Package 'POPInf'

Title: Assumption-Lean and Data-Adaptive Post-Prediction Inference
Description: Implementation of assumption-lean and data-adaptive post-prediction inference (POPInf), for valid and efficient statistical inference based on data predicted by machine learning. See Miao, Miao, Wu, Zhao, and Lu (2023) <arXiv:2311.14220>.
Authors: Jiacheng Miao [aut, cre]
Maintainer: Jiacheng Miao <[email protected]>
License: GPL-3
Version: 1.0.0
Built: 2024-08-19 02:30:28 UTC
Source: https://github.com/qlu-lab/popinf

Help Index


Calculation of the matrix A based on a single dataset

Description

A function for calculating the matrix A based on a single dataset

Usage

A(X, Y, quant = NA, theta, method)

Arguments

X

Array or DataFrame containing covariates

Y

Array or DataFrame of outcomes

quant

quantile for quantile estimation

theta

parameter theta

method

indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson".

Value

matrix A based on a single dataset
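
As an illustration of what such a matrix can look like, the sketch below computes the derivative matrix of the averaged estimating equation for two of the listed methods. It assumes the standard M-estimation forms (X'X / n for "ols", X' diag(p(1 - p)) X / n for "logistic"). `A_sketch` is a hypothetical helper written for this example, not the package's A(), whose scaling and sign conventions may differ.

```r
# Hypothetical helper: derivative matrix of the averaged estimating equation.
A_sketch <- function(X, theta, method = c("ols", "logistic")) {
  method <- match.arg(method)
  X <- as.matrix(X)
  n <- nrow(X)
  if (method == "ols") {
    # psi_i = x_i * (x_i' theta - y_i), so d psi_i / d theta = x_i x_i'
    crossprod(X) / n
  } else {
    # psi_i = x_i * (p_i - y_i) with p_i = plogis(x_i' theta)
    p <- as.vector(plogis(X %*% theta))
    crossprod(X * (p * (1 - p)), X) / n
  }
}

X <- cbind(1, c(-1, 0, 1, 2))   # intercept column plus one covariate
A_ols <- A_sketch(X, theta = c(0, 0), method = "ols")
A_logit <- A_sketch(X, theta = c(0, 0), method = "logistic")
```

At theta = 0 every fitted probability is 0.5, so the logistic matrix in this sketch is exactly one quarter of the OLS matrix.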


Initial estimation

Description

est_ini function for initial estimation

Usage

est_ini(X, Y, quant = NA, method)

Arguments

X

Array or DataFrame containing covariates

Y

Array or DataFrame of outcomes

quant

quantile for quantile estimation

method

indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson".

Value

initial estimator
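
One plausible way to obtain a starting value for each method is sketched below using base R fitting routines. `est_ini_sketch` is a hypothetical helper; the choice of `quantile()`, `lm.fit()`, and `glm.fit()` is an assumption made for illustration and is not taken from the package source.

```r
# Hypothetical helper: starting value computed from one dataset.
est_ini_sketch <- function(X, Y, quant = NA, method) {
  switch(method,
    mean     = mean(Y),
    quantile = unname(quantile(Y, probs = quant)),
    ols      = unname(lm.fit(as.matrix(X), Y)$coefficients),
    logistic = unname(glm.fit(as.matrix(X), Y, family = binomial())$coefficients),
    poisson  = unname(glm.fit(as.matrix(X), Y, family = poisson())$coefficients),
    stop("unknown method: ", method)
  )
}

X <- cbind(1, c(0, 1, 2, 3))   # intercept column plus one covariate
Y <- c(1, 3, 5, 7)             # exactly 1 + 2 * x
theta_mean <- est_ini_sketch(X, Y, method = "mean")
theta_q    <- est_ini_sketch(X, Y, quant = 0.5, method = "quantile")
theta_ols  <- est_ini_sketch(X, Y, method = "ols")
```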


Sample expectation of psi

Description

mean_psi function for sample expectation of psi

Usage

mean_psi(X, Y, theta, quant = NA, method)

Arguments

X

Array or DataFrame containing covariates

Y

Array or DataFrame of outcomes

theta

parameter theta

quant

quantile for quantile estimation

method

indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson".

Value

sample expectation of psi
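
The sample expectation of psi is the quantity an M-estimator drives to zero. The sketch below shows this for the "ols" case, assuming the score x_i (x_i' theta - y_i); `mean_psi_sketch` is a hypothetical helper and may differ from mean_psi() in sign or scaling.

```r
# Hypothetical helper: column means of the per-observation OLS scores.
mean_psi_sketch <- function(X, Y, theta) {
  X <- as.matrix(X)
  resid <- as.vector(X %*% theta) - Y   # fitted minus observed
  colMeans(X * resid)                   # average of x_i * (x_i' theta - y_i)
}

X <- cbind(1, c(0, 1, 2, 3))
Y <- c(1, 2, 2, 5)
theta_hat <- lm.fit(X, Y)$coefficients
at_solution <- mean_psi_sketch(X, Y, theta_hat)   # numerically zero
away <- mean_psi_sketch(X, Y, c(0, 0))            # nonzero away from the solution
```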


Sample expectation of POP-Inf psi

Description

mean_psi_pop function for sample expectation of POP-Inf psi

Usage

mean_psi_pop(
  X_lab,
  X_unlab,
  Y_lab,
  Yhat_lab,
  Yhat_unlab,
  w,
  theta,
  quant = NA,
  method
)

Arguments

X_lab

Array or DataFrame containing observed covariates in labeled data.

X_unlab

Array or DataFrame containing observed or predicted covariates in unlabeled data.

Y_lab

Array or DataFrame of observed outcomes in labeled data.

Yhat_lab

Array or DataFrame of predicted outcomes in labeled data.

Yhat_unlab

Array or DataFrame of predicted outcomes in unlabeled data.

w

Weight vector for POP-Inf linear regression (d-dimensional, where d equals the number of covariates).

theta

parameter theta

quant

quantile for quantile estimation

method

indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson".

Value

sample expectation of POP-Inf psi
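
To make the role of w concrete, the sketch below combines labeled and unlabeled scores for mean estimation. It assumes the POP-Inf equation has the form "labeled score on Y, plus w times (unlabeled score on Yhat minus labeled score on Yhat)", which is our reading of the cited paper rather than something stated in this manual. `psi_mean` and `mean_psi_pop_sketch` are hypothetical helpers.

```r
# Score for mean estimation: psi(y; theta) = theta - y.
psi_mean <- function(Y, theta) theta - Y

# Hypothetical helper: labeled score plus a weighted correction from predictions.
mean_psi_pop_sketch <- function(Y_lab, Yhat_lab, Yhat_unlab, w, theta) {
  mean(psi_mean(Y_lab, theta)) +
    w * (mean(psi_mean(Yhat_unlab, theta)) - mean(psi_mean(Yhat_lab, theta)))
}

Y_lab      <- c(1, 2, 3, 4)
Yhat_lab   <- c(1.5, 2.5, 2.5, 3.5)
Yhat_unlab <- c(2, 3, 4, 5, 6)
w <- 0.5

# For the mean, the root of the combined equation has a closed form.
theta_closed <- mean(Y_lab) + w * (mean(Yhat_unlab) - mean(Yhat_lab))
value_at_root <- mean_psi_pop_sketch(Y_lab, Yhat_lab, Yhat_unlab, w, theta_closed)
```

Under this assumed form, w = 0 ignores the predictions entirely and reduces to the labeled-data estimating equation.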


Gradient descent for obtaining the estimator

Description

optim_est function for obtaining the estimator by gradient descent

Usage

optim_est(
  X_lab,
  X_unlab,
  Y_lab,
  Yhat_lab,
  Yhat_unlab,
  w,
  theta,
  quant = NA,
  method,
  step_size = 0.1,
  max_iterations = 500,
  convergence_threshold = 1e-06
)

Arguments

X_lab

Array or DataFrame containing observed covariates in labeled data.

X_unlab

Array or DataFrame containing observed or predicted covariates in unlabeled data.

Y_lab

Array or DataFrame of observed outcomes in labeled data.

Yhat_lab

Array or DataFrame of predicted outcomes in labeled data.

Yhat_unlab

Array or DataFrame of predicted outcomes in unlabeled data.

w

Weight vector for POP-Inf linear regression (d-dimensional, where d equals the number of covariates).

theta

parameter theta

quant

quantile for quantile estimation

method

indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson".

step_size

step size for gradient descent

max_iterations

maximum number of iterations for gradient descent

convergence_threshold

convergence threshold for gradient descent

Value

estimator
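
The three tuning arguments interact as in the sketch below, which iterates a fixed-step update on the combined score for mean estimation. The score form and the stopping rule (stop once the update is smaller than the threshold) are assumptions; `optim_est_sketch` is a hypothetical helper and is not the package's optim_est().

```r
# Hypothetical helper: fixed-step iteration on the combined mean-estimation score.
optim_est_sketch <- function(Y_lab, Yhat_lab, Yhat_unlab, w, theta,
                             step_size = 0.1, max_iterations = 500,
                             convergence_threshold = 1e-06) {
  for (i in seq_len(max_iterations)) {
    # Combined score for psi(y; theta) = theta - y (an assumed form).
    score <- (theta - mean(Y_lab)) +
      w * ((theta - mean(Yhat_unlab)) - (theta - mean(Yhat_lab)))
    update <- step_size * score
    theta <- theta - update
    if (abs(update) < convergence_threshold) break
  }
  list(theta = theta, iterations = i)
}

fit <- optim_est_sketch(
  Y_lab = c(1, 2, 3, 4), Yhat_lab = c(1.5, 2.5, 2.5, 3.5),
  Yhat_unlab = c(2, 3, 4, 5, 6), w = 0.5, theta = 0
)
```

With these inputs the iteration approaches the closed-form root 3.25 and stops well before max_iterations; a smaller step_size would need more iterations to reach the same threshold.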


Gradient descent for obtaining the weight vector

Description

optim_weights function for obtaining the weight vector by gradient descent

Usage

optim_weights(
  j,
  X_lab,
  X_unlab,
  Y_lab,
  Yhat_lab,
  Yhat_unlab,
  w,
  theta,
  quant = NA,
  method
)

Arguments

j

j-th coordinate of the weight vector

X_lab

Array or DataFrame containing observed covariates in labeled data.

X_unlab

Array or DataFrame containing observed or predicted covariates in unlabeled data.

Y_lab

Array or DataFrame of observed outcomes in labeled data.

Yhat_lab

Array or DataFrame of predicted outcomes in labeled data.

Yhat_unlab

Array or DataFrame of predicted outcomes in unlabeled data.

w

Weight vector for POP-Inf linear regression (d-dimensional, where d equals the number of covariates).

theta

parameter theta

quant

quantile for quantile estimation

method

indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson".

Value

weights
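
The idea of choosing a weight to reduce variance can be sketched for mean estimation, where there is a single coordinate. The example assumes the estimator mean(Y_lab) + w * (mean(Yhat_unlab) - mean(Yhat_lab)), restricts w to [0, 1], and uses `optimize()` as a one-dimensional search in place of gradient descent. `var_mean_pop` is a hypothetical helper and the data are simulated only for this example.

```r
# Hypothetical helper: estimated variance of the mean estimator for a given weight.
var_mean_pop <- function(w, Y_lab, Yhat_lab, Yhat_unlab) {
  n <- length(Y_lab)
  N <- length(Yhat_unlab)
  var_yhat <- var(c(Yhat_lab, Yhat_unlab))
  var(Y_lab) / n + w^2 * var_yhat * (1 / n + 1 / N) -
    2 * w * cov(Y_lab, Yhat_lab) / n
}

set.seed(1)
Y_lab      <- rnorm(200)
Yhat_lab   <- 0.9 * Y_lab + rnorm(200, sd = sqrt(1 - 0.9^2))
Yhat_unlab <- rnorm(2000)

# One-dimensional search for the variance-minimizing weight on [0, 1].
w_opt <- optimize(var_mean_pop, interval = c(0, 1), Y_lab = Y_lab,
                  Yhat_lab = Yhat_lab, Yhat_unlab = Yhat_unlab)$minimum

# Closed-form minimizer of the same quadratic, clipped to [0, 1].
n <- length(Y_lab)
N <- length(Yhat_unlab)
w_closed <- cov(Y_lab, Yhat_lab) / (var(c(Yhat_lab, Yhat_unlab)) * (1 + n / N))
w_closed <- min(max(w_closed, 0), 1)
```

The numerical and closed-form weights should agree up to the search tolerance; more accurate predictions or a larger unlabeled sample push the weight toward 1.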


POP-Inf M-Estimation

Description

pop_M function conducts post-prediction M-Estimation.

Usage

pop_M(
  X_lab = NA,
  X_unlab = NA,
  Y_lab,
  Yhat_lab,
  Yhat_unlab,
  alpha = 0.05,
  weights = NA,
  max_iterations = 100,
  convergence_threshold = 0.05,
  quant = NA,
  intercept = FALSE,
  focal_index = NA,
  method
)

Arguments

X_lab

Array or DataFrame containing observed covariates in labeled data.

X_unlab

Array or DataFrame containing observed or predicted covariates in unlabeled data.

Y_lab

Array or DataFrame of observed outcomes in labeled data.

Yhat_lab

Array or DataFrame of predicted outcomes in labeled data.

Yhat_unlab

Array or DataFrame of predicted outcomes in unlabeled data.

alpha

Specifies the confidence level as 1 - alpha for confidence intervals.

weights

Weight vector for POP-Inf linear regression (d-dimensional, where d equals the number of covariates).

max_iterations

Sets the maximum number of iterations for the optimization process to derive weights.

convergence_threshold

Sets the convergence threshold for the optimization process to derive weights.

quant

quantile for quantile estimation

intercept

Logical indicating whether the input covariate data already contain an intercept column (TRUE if they do).

focal_index

Identifies the focal index for variance reduction.

method

indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson".

Value

A summary table presenting point estimates, standard error, confidence intervals (1 - alpha), P-values, and weights.

Examples

data <- sim_data()
X_lab <- data$X_lab
X_unlab <- data$X_unlab
Y_lab <- data$Y_lab
Yhat_lab <- data$Yhat_lab
Yhat_unlab <- data$Yhat_unlab
pop_M(Y_lab = Y_lab, Yhat_lab = Yhat_lab, Yhat_unlab = Yhat_unlab,
      alpha = 0.05, method = "mean")
pop_M(Y_lab = Y_lab, Yhat_lab = Yhat_lab, Yhat_unlab = Yhat_unlab,
      alpha = 0.05, quant = 0.75, method = "quantile")
pop_M(X_lab = X_lab, X_unlab = X_unlab,
      Y_lab = Y_lab, Yhat_lab = Yhat_lab, Yhat_unlab = Yhat_unlab,
      alpha = 0.05, method = "ols")
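
The relationship between alpha, the standard error, the confidence interval, and the P-value in the summary table can be sketched as follows, assuming a normal-approximation (Wald) interval and a two-sided test. `wald_summary` and its column names are hypothetical and do not necessarily match the table returned by pop_M().

```r
# Hypothetical helper: Wald-type summary from an estimate and its standard error.
wald_summary <- function(estimate, std_error, alpha = 0.05) {
  z_crit <- qnorm(1 - alpha / 2)   # critical value for a (1 - alpha) interval
  z_stat <- estimate / std_error   # test statistic for H0: theta = 0
  data.frame(
    Estimate  = estimate,
    Std.Error = std_error,
    Lower.CI  = estimate - z_crit * std_error,
    Upper.CI  = estimate + z_crit * std_error,
    P.value   = 2 * pnorm(-abs(z_stat))
  )
}

out <- wald_summary(estimate = 1.0, std_error = 0.5, alpha = 0.05)
```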

Estimating equation

Description

psi function for the estimating equation

Usage

psi(X, Y, theta, quant = NA, method)

Arguments

X

Array or DataFrame containing covariates

Y

Array or DataFrame of outcomes

theta

parameter theta

quant

quantile for quantile estimation

method

indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson".

Value

estimating equation
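
For orientation, the sketch below writes out per-observation scores for three of the listed methods using common textbook forms: theta - y for the mean, 1(y <= theta) - quant for a quantile, and x (x' theta - y) for OLS. `psi_sketch` is a hypothetical helper; the package's psi() may use different signs, scaling, or return shapes.

```r
# Hypothetical helper: one score per observation (rows) and parameter (columns).
psi_sketch <- function(X = NULL, Y, theta, quant = NA, method) {
  switch(method,
    mean     = matrix(theta - Y, ncol = 1),
    quantile = matrix(as.numeric(Y <= theta) - quant, ncol = 1),
    ols      = as.matrix(X) * (as.vector(as.matrix(X) %*% theta) - Y),
    stop("method not covered by this sketch: ", method)
  )
}

Y <- c(1, 2, 3, 4)
psi_m <- psi_sketch(Y = Y, theta = 2.5, method = "mean")
psi_q <- psi_sketch(Y = Y, theta = 2.5, quant = 0.5, method = "quantile")
```

Both score vectors average to zero here because 2.5 is simultaneously the sample mean and a sample median of Y.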


Variance-covariance matrix of the estimating equation

Description

Sigma_cal function for the variance-covariance matrix of the estimating equation

Usage

Sigma_cal(
  X_lab,
  X_unlab,
  Y_lab,
  Yhat_lab,
  Yhat_unlab,
  w,
  theta,
  quant = NA,
  A_lab_inv,
  A_unlab_inv,
  method
)

Arguments

X_lab

Array or DataFrame containing observed covariates in labeled data.

X_unlab

Array or DataFrame containing observed or predicted covariates in unlabeled data.

Y_lab

Array or DataFrame of observed outcomes in labeled data.

Yhat_lab

Array or DataFrame of predicted outcomes in labeled data.

Yhat_unlab

Array or DataFrame of predicted outcomes in unlabeled data.

w

Weight vector for POP-Inf linear regression (d-dimensional, where d equals the number of covariates).

theta

parameter theta

quant

quantile for quantile estimation

A_lab_inv

Inverse of matrix A using labeled data

A_unlab_inv

Inverse of matrix A using unlabeled data

method

indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson".

Value

variance-covariance matrix of the estimating equation
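
The way an inverse A matrix enters a covariance calculation can be illustrated with the single-sample sandwich form A^{-1} M A^{-1} / n for OLS scores, where M is the second moment of psi. This is only the building block: the POP-Inf covariance also combines labeled and unlabeled terms through w, which is not reproduced here. `sandwich_sketch` is a hypothetical helper and the data are simulated for the example.

```r
# Hypothetical helper: sandwich covariance A^{-1} M A^{-1} / n for OLS scores.
sandwich_sketch <- function(X, Y, theta) {
  X <- as.matrix(X)
  n <- nrow(X)
  scores <- X * (as.vector(X %*% theta) - Y)  # per-observation psi, n x d
  A_inv <- solve(crossprod(X) / n)            # inverse derivative matrix
  M <- crossprod(scores) / n                  # second moment of psi
  A_inv %*% M %*% A_inv / n
}

set.seed(1)
X <- cbind(1, rnorm(100))
Y <- as.vector(X %*% c(1, 2)) + rnorm(100)
theta_hat <- lm.fit(X, Y)$coefficients
Sigma <- sandwich_sketch(X, Y, theta_hat)
std_error <- sqrt(diag(Sigma))                # standard errors of the coefficients
```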


Simulate the data for testing the functions

Description

sim_data function for simulating data to test the functions

Usage

sim_data(r = 0.9, binary = FALSE)

Arguments

r

imputation correlation

binary

whether to simulate a binary outcome (TRUE) or a continuous outcome (FALSE)

Value

simulated data
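
A minimal generator in the same spirit is sketched below. It assumes that "imputation correlation" means the correlation between the true outcome and its prediction, and it omits covariates and the binary case. `sim_sketch`, its sample-size arguments, and its seed argument are hypothetical; the element names follow those used in the pop_M examples.

```r
# Hypothetical generator: predictions with population correlation r to the outcome.
sim_sketch <- function(n_lab = 500, n_unlab = 5000, r = 0.9, seed = 1) {
  set.seed(seed)
  n <- n_lab + n_unlab
  Y <- rnorm(n)                               # standardized outcome
  Yhat <- r * Y + sqrt(1 - r^2) * rnorm(n)    # unit variance, correlation r with Y
  lab <- seq_len(n_lab)
  list(
    Y_lab = Y[lab], Yhat_lab = Yhat[lab],
    Yhat_unlab = Yhat[-lab]                   # true outcomes are dropped here
  )
}

dat <- sim_sketch(r = 0.9)
cor_lab <- cor(dat$Y_lab, dat$Yhat_lab)       # close to the requested r
```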