ppi_plusplus_logistic: PPI++ Logistic Regression
In ipd: Inference on Predicted Data

ppi_plusplus_logistic

R Documentation

PPI++ Logistic Regression

Description

Helper function for PPI++ logistic regression

Usage

ppi_plusplus_logistic(
  X_l,
  Y_l,
  f_l,
  X_u,
  f_u,
  lhat = NULL,
  coord = NULL,
  opts = NULL,
  w_l = NULL,
  w_u = NULL
)

Arguments

`X_l`	(matrix): n x p matrix of covariates in the labeled data.
`Y_l`	(vector): n-vector of labeled outcomes.
`f_l`	(vector): n-vector of predictions in the labeled data.
`X_u`	(matrix): N x p matrix of covariates in the unlabeled data.
`f_u`	(vector): N-vector of predictions in the unlabeled data.
`lhat`	(float, optional): Power-tuning parameter (see https://arxiv.org/abs/2311.01453). The default value, `NULL`, will estimate the optimal value from the data. Setting `lhat = 1` recovers PPI with no power tuning, and setting `lhat = 0` recovers the classical point estimate.
`coord`	(int, optional): Coordinate for which to optimize `lhat = 1`. If `NULL`, it optimizes the total variance over all coordinates. Must be in (1, ..., d) where d is the dimension of the estimand.
`opts`	(list, optional): Options to pass to the optimizer. See ?optim for details.
`w_l`	(ndarray, optional): Sample weights for the labeled data set. Defaults to a vector of ones.
`w_u`	(ndarray, optional): Sample weights for the unlabeled data set. Defaults to a vector of ones.

Details

PPI++: Efficient Prediction Powered Inference (Angelopoulos et al., 2023) https://arxiv.org/abs/2311.01453

Value

(list): A list containing the following:

est: (vector): vector of PPI++ logistic regression coefficient estimates.
se: (vector): vector of standard errors of the coefficients.
lambda: (float): estimated power-tuning parameter.
rectifier_est: (vector): vector of the rectifier logistic regression coefficient estimates.
var_u: (matrix): covariance matrix for the gradients in the unlabeled data.
var_l: (matrix): covariance matrix for the gradients in the labeled data.
grads: (matrix): matrix of gradients for the labeled data.
grads_hat_unlabeled: (matrix): matrix of predicted gradients for the unlabeled data.
grads_hat: (matrix): matrix of predicted gradients for the labeled data.
inv_hessian: (matrix): inverse Hessian matrix.

Examples


dat <- simdat(model = "logistic")

form <- Y - f ~ X1

X_l <- model.matrix(form, data = dat[dat$set_label == "labeled",])

Y_l <- dat[dat$set_label == "labeled", all.vars(form)[1]] |> matrix(ncol = 1)

f_l <- dat[dat$set_label == "labeled", all.vars(form)[2]] |> matrix(ncol = 1)

X_u <- model.matrix(form, data = dat[dat$set_label == "unlabeled",])

f_u <- dat[dat$set_label == "unlabeled", all.vars(form)[2]] |> matrix(ncol = 1)

ppi_plusplus_logistic(X_l, Y_l, f_l, X_u, f_u)

ipd documentation built on April 4, 2025, 4:41 a.m.