ipd-package: ipd: Inference on Predicted Data
In ipd: Inference on Predicted Data

ipd-package

R Documentation

ipd: Inference on Predicted Data

Description

Performs valid statistical inference on predicted data (IPD) using recent methods, where for a subset of the data, the outcomes have been predicted by an algorithm. Provides a wrapper function with specified defaults for the type of model and method to be used for estimation and inference. Further provides methods for tidying and summarizing results. Salerno et al., (2024) \Sexpr[results=rd]{tools:::Rd_expr_doi("10.48550/arXiv.2410.09665")}.

The ipd package provides tools for statistical modeling and inference when a significant portion of the outcome data is predicted by AI/ML algorithms. It implements several state-of-the-art methods for inference on predicted data (IPD), offering a user-friendly interface to facilitate their use in real-world applications.

Details

This package is particularly useful in scenarios where predicted values (e.g., from machine learning models) are used as proxies for unobserved outcomes, which can introduce biases in estimation and inference. The ipd package integrates methods designed to address these challenges.

Features

Multiple IPD methods: PostPI, PPI, ⁠PPI++⁠, and PSPA currently.
Flexible wrapper functions for ease of use.
Custom methods for model inspection and evaluation.
Seamless integration with common data structures in R.
Comprehensive documentation and examples.

Key Functions

ipd: Main wrapper function which implements various methods for inference on predicted data for a specified model/outcome type (e.g., mean estimation, linear regression).
simdat: Simulates data for demonstrating the use of the various IPD methods.
print.ipd: Prints a brief summary of the IPD method/model combination.
summary.ipd: Summarizes the results of fitted IPD models.
tidy.ipd: Tidies the IPD method/model fit into a data frame.
glance.ipd: Glances at the IPD method/model fit, returning a one-row summary.
augment.ipd: Augments the data used for an IPD method/model fit with additional information about each observation.

Documentation

The package includes detailed documentation for each function, including usage examples. A vignette is also provided to guide users through common workflows and applications of the package.

References

For details on the statistical methods implemented in this package, please refer to the associated manuscripts at the following references:

PostPI: Wang, S., McCormick, T. H., & Leek, J. T. (2020). Methods for correcting inference based on outcomes predicted by machine learning. Proceedings of the National Academy of Sciences, 117(48), 30266-30275.
PPI: Angelopoulos, A. N., Bates, S., Fannjiang, C., Jordan, M. I., & Zrnic, T. (2023). Prediction-powered inference. Science, 382(6671), 669-674.
PPI++: Angelopoulos, A. N., Duchi, J. C., & Zrnic, T. (2023). PPI++: Efficient prediction-powered inference. arXiv preprint arXiv:2311.01453.
PSPA: Miao, J., Miao, X., Wu, Y., Zhao, J., & Lu, Q. (2023). Assumption-lean and data-adaptive post-prediction inference. arXiv preprint arXiv:2311.14220.

Author(s)

Maintainer: Stephen Salerno ssalerno@fredhutch.org (ORCID) [copyright holder]

Authors:

Jiacheng Miao jmiao24@wisc.edu
Awan Afiaz aafiaz@uw.edu
Kentaro Hoffman khoffm3@uw.edu
Anna Neufeld acn2@williams.edu
Qiongshi Lu qlu@biostat.wisc.edu
Tyler H McCormick tylermc@uw.edu
Jeffrey T Leek jtleek@fredhutch.org

Examples

#-- Generate Example Data

set.seed(12345)

dat <- simdat(n = c(300, 300, 300), effect = 1, sigma_Y = 1)

head(dat)

formula <- Y - f ~ X1

#-- PostPI Analytic Correction (Wang et al., 2020)

fit_postpi1 <- ipd(formula, method = "postpi_analytic", model = "ols",

    data = dat, label = "set_label")

#-- PostPI Bootstrap Correction (Wang et al., 2020)

nboot <- 200

fit_postpi2 <- ipd(formula, method = "postpi_boot", model = "ols",

    data = dat, label = "set_label", nboot = nboot)

#-- PPI (Angelopoulos et al., 2023)

fit_ppi <- ipd(formula, method = "ppi", model = "ols",

    data = dat, label = "set_label")

#-- PPI++ (Angelopoulos et al., 2023)

fit_plusplus <- ipd(formula, method = "ppi_plusplus", model = "ols",

    data = dat, label = "set_label")

#-- PSPA (Miao et al., 2023)

fit_pspa <- ipd(formula, method = "pspa", model = "ols",

    data = dat, label = "set_label")

#-- Print the Model

print(fit_postpi1)

#-- Summarize the Model

summ_fit_postpi1 <- summary(fit_postpi1)

#-- Print the Model Summary

print(summ_fit_postpi1)

#-- Tidy the Model Output

tidy(fit_postpi1)

#-- Get a One-Row Summary of the Model

glance(fit_postpi1)

#-- Augment the Original Data with Fitted Values and Residuals

augmented_df <- augment(fit_postpi1)

head(augmented_df)

ipd documentation built on April 4, 2025, 4:41 a.m.