Title: | Assumption-Lean and Data-Adaptive Post-Prediction Inference |
---|---|
Description: | Implementation of assumption-lean and data-adaptive post-prediction inference (POPInf), for valid and efficient statistical inference based on data predicted by machine learning. See Miao, Miao, Wu, Zhao, and Lu (2023) <arXiv:2311.14220>. |
Authors: | Jiacheng Miao [aut, cre] |
Maintainer: | Jiacheng Miao <[email protected]> |
License: | GPL-3 |
Version: | 1.0.0 |
Built: | 2024-08-19 02:30:28 UTC |
Source: | https://github.com/qlu-lab/popinf |
A
function for the calculation of the matrix A based on a single dataset
A(X, Y, quant = NA, theta, method)
X |
Array or DataFrame containing covariates |
Y |
Array or DataFrame of outcomes |
quant |
quantile for quantile estimation |
theta |
parameter theta |
method |
indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". |
matrix A based on a single dataset
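As a rough illustration of what such a matrix can look like, the sketch below assumes that A is the sample average of the derivative of the estimating equation with respect to theta (the usual "bread" of a sandwich variance). The helper name `A_sketch` and the formulas for "ols" and "logistic" are assumptions for illustration, not the package's internal code.

```r
# Hypothetical helper: NOT the package's A(); a sketch under stated assumptions.
A_sketch <- function(X, theta, method = c("ols", "logistic")) {
  method <- match.arg(method)
  X <- as.matrix(X)
  n <- nrow(X)
  if (method == "ols") {
    # Derivative of x * (x' theta - y) with respect to theta is x x'.
    return(crossprod(X) / n)
  }
  # Logistic: derivative of x * (p - y) is p (1 - p) x x', with p = plogis(x' theta).
  p <- as.vector(plogis(X %*% theta))
  crossprod(X * (p * (1 - p)), X) / n
}

set.seed(1)
X <- cbind(1, rnorm(200))
A_ols <- A_sketch(X, theta = c(0, 0), method = "ols")
A_log <- A_sketch(X, theta = c(0, 0), method = "logistic")
```

At theta = 0 every fitted probability is 0.5, so under these assumptions the logistic matrix is one quarter of the OLS matrix, which gives a quick sanity check.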
est_ini
function for initial estimation
est_ini(X, Y, quant = NA, method)
X |
Array or DataFrame containing covariates |
Y |
Array or DataFrame of outcomes |
quant |
quantile for quantile estimation |
method |
indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". |
initial estimator
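The sketch below shows one plausible way to obtain an initial estimate for each listed method using base R only. It is an assumption about what a reasonable starting value looks like, not a copy of `est_ini`; the name `est_ini_sketch` is hypothetical.

```r
# Hypothetical helper: a sketch of plausible initial estimators, not est_ini() itself.
est_ini_sketch <- function(X = NULL, Y, quant = NA, method) {
  Y <- as.vector(Y)
  switch(method,
    mean     = mean(Y),
    quantile = unname(quantile(Y, probs = quant)),
    ols      = unname(lm.fit(as.matrix(X), Y)$coefficients),
    logistic = unname(glm.fit(as.matrix(X), Y, family = binomial())$coefficients),
    poisson  = unname(glm.fit(as.matrix(X), Y, family = poisson())$coefficients),
    stop("unknown method: ", method)
  )
}

set.seed(2)
X <- cbind(1, rnorm(100))                 # first column acts as the intercept
Y <- as.vector(X %*% c(1, 2)) + rnorm(100)
init_mean <- est_ini_sketch(Y = Y, method = "mean")
init_ols  <- est_ini_sketch(X = X, Y = Y, method = "ols")
```

Here the labeled outcomes alone are used, which is a natural starting point before predictions are brought in.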
link_grad
function for gradient of the link function
link_grad(t, method)
t |
t |
method |
indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". |
gradient of the link function
link_Hessian
function for Hessians of the link function
link_Hessian(t, method)
t |
t |
method |
indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". |
Hessians of the link function
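For the regression methods, the "link function" in `link_grad` and `link_Hessian` presumably refers to the mean function g(t) that maps the linear predictor t to the expected outcome. Under that assumption, the first and second derivatives have simple closed forms. The names `g`, `g_grad`, and `g_hess` below are hypothetical, and the block checks the formulas against finite differences.

```r
# Hypothetical mean functions and their derivatives (assumed, not taken from the package).
g <- function(t, method) {
  switch(method, ols = t, logistic = plogis(t), poisson = exp(t))
}
g_grad <- function(t, method) {
  switch(method,
    ols      = rep(1, length(t)),
    logistic = plogis(t) * (1 - plogis(t)),
    poisson  = exp(t))
}
g_hess <- function(t, method) {
  switch(method,
    ols      = rep(0, length(t)),
    logistic = plogis(t) * (1 - plogis(t)) * (1 - 2 * plogis(t)),
    poisson  = exp(t))
}

# Central finite differences as an independent check of the closed forms.
t_grid <- c(-1, 0, 0.5)
h <- 1e-5
max_err <- sapply(c("ols", "logistic", "poisson"), function(m) {
  num_grad <- (g(t_grid + h, m) - g(t_grid - h, m)) / (2 * h)
  num_hess <- (g_grad(t_grid + h, m) - g_grad(t_grid - h, m)) / (2 * h)
  max(abs(num_grad - g_grad(t_grid, m)), abs(num_hess - g_hess(t_grid, m)))
})
```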
mean_psi
function for sample expectation of psi
mean_psi(X, Y, theta, quant = NA, method)
X |
Array or DataFrame containing covariates |
Y |
Array or DataFrame of outcomes |
theta |
parameter theta |
quant |
quantile for quantile estimation |
method |
indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". |
sample expectation of psi
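A minimal sketch of a sample expectation of psi for method = "ols", assuming the estimating function is psi_i = x_i (x_i' theta - y_i). The helper name `mean_psi_ols` is hypothetical and the form of psi is an assumption rather than the package's code.

```r
# Hypothetical helper for method = "ols"; an assumption, not the package's mean_psi().
mean_psi_ols <- function(X, Y, theta) {
  X <- as.matrix(X)
  resid <- as.vector(X %*% theta) - as.vector(Y)
  colMeans(X * resid)   # average of x_i * (x_i' theta - y_i) over observations
}

set.seed(3)
X <- cbind(1, rnorm(150))
Y <- as.vector(X %*% c(0.5, -1)) + rnorm(150)
theta_hat <- solve(crossprod(X), crossprod(X, Y))

at_solution <- mean_psi_ols(X, Y, theta_hat)   # numerically zero at least squares
at_zero     <- mean_psi_ols(X, Y, c(0, 0))     # generally nonzero elsewhere
```

The least-squares solution is exactly the value of theta at which this average vanishes, which is what makes it an estimating equation.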
mean_psi_pop
function for sample expectation of POP-Inf psi
mean_psi_pop(X_lab, X_unlab, Y_lab, Yhat_lab, Yhat_unlab, w, theta, quant = NA, method)
X_lab |
Array or DataFrame containing observed covariates in labeled data. |
X_unlab |
Array or DataFrame containing observed or predicted covariates in unlabeled data. |
Y_lab |
Array or DataFrame of observed outcomes in labeled data. |
Yhat_lab |
Array or DataFrame of predicted outcomes in labeled data. |
Yhat_unlab |
Array or DataFrame of predicted outcomes in unlabeled data. |
w |
weights vector for POP-Inf linear regression (d-dimensional, where d equals the number of covariates). |
theta |
parameter theta |
quant |
quantile for quantile estimation |
method |
indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". |
sample expectation of POP-Inf psi
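To make the role of the weights concrete, the sketch below assumes the POP-Inf estimating equation combines three averages: psi on labeled data with observed outcomes, plus w times the difference between psi on unlabeled predictions and psi on labeled predictions. It is written for method = "mean" with psi(y; theta) = theta - y. Both the combination rule and the helper name `mean_psi_pop_mean` are assumptions for illustration.

```r
# Hypothetical helper for method = "mean" with psi(y; theta) = theta - y.
mean_psi_pop_mean <- function(Y_lab, Yhat_lab, Yhat_unlab, w, theta) {
  psi_lab       <- mean(theta - Y_lab)        # labeled data, observed outcomes
  psi_lab_hat   <- mean(theta - Yhat_lab)     # labeled data, predicted outcomes
  psi_unlab_hat <- mean(theta - Yhat_unlab)   # unlabeled data, predicted outcomes
  psi_lab + w * (psi_unlab_hat - psi_lab_hat)
}

set.seed(4)
Y_lab      <- rnorm(100, mean = 1)
Yhat_lab   <- Y_lab + rnorm(100, sd = 0.5)
Yhat_unlab <- rnorm(1000, mean = 1, sd = 1.1)
w <- 0.8

# Under this form, the root in theta has a closed expression.
theta_root <- mean(Y_lab) + w * (mean(Yhat_unlab) - mean(Yhat_lab))
value_at_root <- mean_psi_pop_mean(Y_lab, Yhat_lab, Yhat_unlab, w, theta_root)
```

With w = 0 the equation reduces to the labeled-data-only estimating equation, so the weights control how much the predictions are trusted.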
optim_est
function for gradient descent for obtaining estimator
optim_est(X_lab, X_unlab, Y_lab, Yhat_lab, Yhat_unlab, w, theta, quant = NA, method, step_size = 0.1, max_iterations = 500, convergence_threshold = 1e-06)
X_lab |
Array or DataFrame containing observed covariates in labeled data. |
X_unlab |
Array or DataFrame containing observed or predicted covariates in unlabeled data. |
Y_lab |
Array or DataFrame of observed outcomes in labeled data. |
Yhat_lab |
Array or DataFrame of predicted outcomes in labeled data. |
Yhat_unlab |
Array or DataFrame of predicted outcomes in unlabeled data. |
w |
weights vector for POP-Inf linear regression (d-dimensional, where d equals the number of covariates). |
theta |
parameter theta |
quant |
quantile for quantile estimation |
method |
indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". |
step_size |
step size for gradient descent |
max_iterations |
maximum number of iterations for gradient descent |
convergence_threshold |
convergence threshold for gradient descent |
estimator
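The three tuning arguments suggest a fixed-step iteration with a stopping rule. The sketch below is a generic version of that idea, assuming the update theta <- theta - step_size * mean_psi(theta) and a stop when the largest coordinate change falls below the threshold. The function `gd_solve` is hypothetical and may differ from what `optim_est` does internally.

```r
# Hypothetical sketch: fixed-step iteration on a generic mean estimating function.
gd_solve <- function(mean_psi_fn, theta, step_size = 0.1,
                     max_iterations = 500, convergence_threshold = 1e-06) {
  for (iter in seq_len(max_iterations)) {
    theta_new <- theta - step_size * mean_psi_fn(theta)
    converged <- max(abs(theta_new - theta)) < convergence_threshold
    theta <- theta_new
    if (converged) break
  }
  theta
}

set.seed(5)
Y <- rnorm(100, mean = 2)
# For mean estimation, psi(y; theta) = theta - y is the gradient of 0.5 * (theta - y)^2.
theta_hat <- gd_solve(function(theta) mean(theta - Y), theta = 0)
```

In this toy case the iterate approaches the sample mean; the step size trades off speed against stability, which is why it is exposed as an argument.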
optim_weights
function for gradient descent for obtaining weights
optim_weights(j, X_lab, X_unlab, Y_lab, Yhat_lab, Yhat_unlab, w, theta, quant = NA, method)
j |
j-th coordinate of weights vector |
X_lab |
Array or DataFrame containing observed covariates in labeled data. |
X_unlab |
Array or DataFrame containing observed or predicted covariates in unlabeled data. |
Y_lab |
Array or DataFrame of observed outcomes in labeled data. |
Yhat_lab |
Array or DataFrame of predicted outcomes in labeled data. |
Yhat_unlab |
Array or DataFrame of predicted outcomes in unlabeled data. |
w |
weights vector for POP-Inf linear regression (d-dimensional, where d equals the number of covariates). |
theta |
parameter theta |
quant |
quantile for quantile estimation |
method |
indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". |
weights
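One way to picture a coordinate-wise weight search is the scalar case of mean estimation, where the variance of the estimator mean(Y_lab) + w * (mean(Yhat_unlab) - mean(Yhat_lab)) is a quadratic in w. The sketch below minimizes that variance numerically with `optimize` and compares it with the closed-form minimizer. The search interval [0, 1], the variance expression (which assumes independent labeled and unlabeled samples), and all variable names are assumptions for illustration, not the package's procedure.

```r
set.seed(6)
n <- 200    # labeled sample size
N <- 2000   # unlabeled sample size (only its size enters the formula)
Y_lab    <- rnorm(n)
Yhat_lab <- 0.9 * Y_lab + sqrt(1 - 0.9^2) * rnorm(n)

# Estimated variance of the weighted estimator as a function of w.
var_of_w <- function(w) {
  var(Y_lab) / n +
    w^2 * var(Yhat_lab) * (1 / n + 1 / N) -
    2 * w * cov(Y_lab, Yhat_lab) / n
}

w_numeric <- optimize(var_of_w, interval = c(0, 1))$minimum
w_closed  <- cov(Y_lab, Yhat_lab) / (var(Yhat_lab) * (1 + n / N))
```

The closed form shows the intuition: the weight grows with the correlation between observed and predicted outcomes and shrinks when the unlabeled sample is small relative to the labeled one.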
pop_M
function conducts post-prediction M-Estimation.
pop_M(X_lab = NA, X_unlab = NA, Y_lab, Yhat_lab, Yhat_unlab, alpha = 0.05, weights = NA, max_iterations = 100, convergence_threshold = 0.05, quant = NA, intercept = FALSE, focal_index = NA, method)
X_lab |
Array or DataFrame containing observed covariates in labeled data. |
X_unlab |
Array or DataFrame containing observed or predicted covariates in unlabeled data. |
Y_lab |
Array or DataFrame of observed outcomes in labeled data. |
Yhat_lab |
Array or DataFrame of predicted outcomes in labeled data. |
Yhat_unlab |
Array or DataFrame of predicted outcomes in unlabeled data. |
alpha |
Specifies the confidence level as 1 - alpha for confidence intervals. |
weights |
weights vector for POP-Inf linear regression (d-dimensional, where d equals the number of covariates). |
max_iterations |
Sets the maximum number of iterations for the optimization process to derive weights. |
convergence_threshold |
Sets the convergence threshold for the optimization process to derive weights. |
quant |
quantile for quantile estimation |
intercept |
Boolean indicating whether the input covariate data already contain an intercept column (TRUE if they do) |
focal_index |
Identifies the focal index for variance reduction. |
method |
indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". |
A summary table presenting point estimates, standard errors, confidence intervals (1 - alpha), P-values, and weights.
data <- sim_data()
X_lab <- data$X_lab
X_unlab <- data$X_unlab
Y_lab <- data$Y_lab
Yhat_lab <- data$Yhat_lab
Yhat_unlab <- data$Yhat_unlab
# Mean estimation
pop_M(Y_lab = Y_lab, Yhat_lab = Yhat_lab, Yhat_unlab = Yhat_unlab, alpha = 0.05, method = "mean")
# Quantile estimation (75th percentile)
pop_M(Y_lab = Y_lab, Yhat_lab = Yhat_lab, Yhat_unlab = Yhat_unlab, alpha = 0.05, quant = 0.75, method = "quantile")
# Linear regression
pop_M(X_lab = X_lab, X_unlab = X_unlab, Y_lab = Y_lab, Yhat_lab = Yhat_lab, Yhat_unlab = Yhat_unlab, alpha = 0.05, method = "ols")
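The columns of the summary table relate to one another in the usual way for asymptotically normal estimators. The sketch below rebuilds confidence intervals and two-sided P-values from point estimates and standard errors, assuming a normal approximation. The helper `wald_summary`, its column names, and the input numbers are hypothetical and do not reproduce the table that `pop_M` returns.

```r
# Hypothetical helper: builds a Wald-type summary from estimates and standard errors.
wald_summary <- function(estimate, std_error, alpha = 0.05) {
  z_crit <- qnorm(1 - alpha / 2)   # critical value for a (1 - alpha) interval
  data.frame(
    Estimate  = estimate,
    Std.Error = std_error,
    Lower.CI  = estimate - z_crit * std_error,
    Upper.CI  = estimate + z_crit * std_error,
    P.value   = 2 * pnorm(-abs(estimate / std_error))
  )
}

# Made-up inputs purely to exercise the helper.
tab <- wald_summary(estimate = c(0.5, 0), std_error = c(0.1, 0.2))
```

A smaller alpha widens the intervals, while the P-values are unaffected because they depend only on the ratio of estimate to standard error.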
psi
function for estimating equation
psi(X, Y, theta, quant = NA, method)
X |
Array or DataFrame containing covariates |
Y |
Array or DataFrame of outcomes |
theta |
parameter theta |
quant |
quantile for quantile estimation |
method |
indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". |
estimating equation
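As an example of an estimating equation that is not a smooth function of theta, the sketch below uses the standard quantile form psi(y; theta) = 1{y <= theta} - quant. That this is the form used for method = "quantile" is an assumption, and `psi_quantile` is a hypothetical name.

```r
# Hypothetical per-observation estimating function for a quantile.
psi_quantile <- function(Y, theta, quant) {
  as.numeric(Y <= theta) - quant
}

set.seed(7)
Y <- rnorm(1000)
theta_hat <- unname(quantile(Y, probs = 0.75, type = 1))   # inverse empirical CDF

avg_at_quantile <- mean(psi_quantile(Y, theta_hat, quant = 0.75))  # close to zero
avg_at_median   <- mean(psi_quantile(Y, median(Y), quant = 0.75))  # clearly negative
```

The average is near zero at the sample quantile because about 75% of the observations fall at or below it, and it moves away from zero as theta moves away from that quantile.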
Sigma_cal
function for variance-covariance matrix of the estimating equation
Sigma_cal(X_lab, X_unlab, Y_lab, Yhat_lab, Yhat_unlab, w, theta, quant = NA, A_lab_inv, A_unlab_inv, method)
X_lab |
Array or DataFrame containing observed covariates in labeled data. |
X_unlab |
Array or DataFrame containing observed or predicted covariates in unlabeled data. |
Y_lab |
Array or DataFrame of observed outcomes in labeled data. |
Yhat_lab |
Array or DataFrame of predicted outcomes in labeled data. |
Yhat_unlab |
Array or DataFrame of predicted outcomes in unlabeled data. |
w |
weights vector for POP-Inf linear regression (d-dimensional, where d equals the number of covariates). |
theta |
parameter theta |
quant |
quantile for quantile estimation |
A_lab_inv |
Inverse of matrix A using labeled data |
A_unlab_inv |
Inverse of matrix A using unlabeled data |
method |
indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". |
variance-covariance matrix of the estimating equation
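The arguments `A_lab_inv` and `A_unlab_inv` point to a sandwich-type construction. The sketch below shows the single-dataset version for OLS only, assuming psi_i = x_i (x_i' theta - y_i) and covariance A^{-1} Sigma A^{-1} / n. It does not reproduce how `Sigma_cal` combines labeled and unlabeled data with the weights; all variable names are illustrative.

```r
set.seed(8)
n <- 300
X <- cbind(1, rnorm(n))
Y <- as.vector(X %*% c(1, 2)) + rnorm(n) * (1 + abs(X[, 2]))  # heteroskedastic noise
theta_hat <- as.vector(solve(crossprod(X), crossprod(X, Y)))

resid   <- as.vector(X %*% theta_hat) - Y
psi_mat <- X * resid                        # n x d matrix, row i holds psi_i
A_inv   <- solve(crossprod(X) / n)          # inverse of the derivative matrix A
Sigma   <- crossprod(psi_mat) / n           # second moment of psi (its mean is zero)
V       <- A_inv %*% Sigma %*% A_inv / n    # estimated covariance of theta_hat
se      <- sqrt(diag(V))
```

For OLS this coincides algebraically with the heteroskedasticity-robust (HC0) covariance, which is a convenient cross-check.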
sim_data
function for simulating data
sim_data(r = 0.9, binary = FALSE)
r |
imputation correlation |
binary |
simulate binary outcome or not |
simulated data
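To illustrate what an "imputation correlation" of r could mean, the sketch below generates outcomes and predictions whose population correlation is r. The generating model, the sample sizes, and the function name `sim_sketch` are assumptions; the actual design used by `sim_data` is not described here and may differ.

```r
# Hypothetical generator, NOT the package's sim_data(): names and model are assumptions.
sim_sketch <- function(n_lab = 500, n_unlab = 5000, r = 0.9) {
  n <- n_lab + n_unlab
  Y <- rnorm(n)
  Yhat <- r * Y + sqrt(1 - r^2) * rnorm(n)   # population correlation with Y equals r
  lab <- seq_len(n_lab)
  list(Y_lab = Y[lab], Yhat_lab = Yhat[lab], Yhat_unlab = Yhat[-lab])
}

set.seed(9)
d <- sim_sketch()
r_hat <- cor(d$Y_lab, d$Yhat_lab)   # sample correlation on the labeled part
```

Lower values of r correspond to less accurate predictions, which is the regime where data-adaptive weights matter most.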