ipd-package | R Documentation
Performs valid statistical inference on predicted data (IPD), that is, on data where the outcomes for a subset of observations have been predicted by an algorithm, using recent methods. Provides a wrapper function with default settings for the model type and for the method used for estimation and inference. Also provides methods for tidying and summarizing results. See Salerno et al. (2024), doi:10.48550/arXiv.2410.09665.
The ipd package provides tools for statistical modeling and inference when a significant portion of the outcome data is predicted by AI/ML algorithms. It implements several state-of-the-art methods for inference on predicted data (IPD) behind a user-friendly interface, making them easier to apply in real-world analyses.
This package is particularly useful when predicted values (e.g., from machine learning models) are used as proxies for unobserved outcomes. Treating such predictions as if they were observed outcomes can bias estimates and understate their uncertainty. The ipd package integrates methods designed to correct for these problems.
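To make the problem concrete, the sketch below is a small hypothetical simulation in base R (it does not use the ipd package, and all variable names are illustrative). The true slope is 2, but the assumed prediction model attenuates it to 1.5. Regressing the predictions as if they were observed outcomes gives a slope near 1.5 with a very small standard error, while a regression on the labeled data alone is centered correctly but uses only a small fraction of the rows.

```r
# Hypothetical simulation in base R (not part of the ipd package).
set.seed(1)
n_lab <- 200    # labeled rows: true outcome Y and prediction f both observed
n_unl <- 2000   # unlabeled rows: only the prediction f is available
x_lab <- rnorm(n_lab)
x_unl <- rnorm(n_unl)

# True data-generating model: intercept 1, slope 2
y_lab <- 1 + 2 * x_lab + rnorm(n_lab)

# Assumed imperfect predictor: attenuated slope (1.5) plus a little noise
f_unl <- 0.5 + 1.5 * x_unl + rnorm(n_unl, sd = 0.3)

# Naive analysis: regress the predictions as if they were observed outcomes
naive <- summary(lm(f_unl ~ x_unl))$coefficients["x_unl", ]

# Classical analysis: labeled data only (unbiased, but ignores the unlabeled rows)
classical <- summary(lm(y_lab ~ x_lab))$coefficients["x_lab", ]

# Compare slope estimates and their standard errors
rbind(naive = naive[c("Estimate", "Std. Error")],
      classical = classical[c("Estimate", "Std. Error")])
```

The naive fit is confidently wrong: it is far from the true slope yet reports the smaller standard error. IPD methods aim to keep the validity of the labeled-only analysis while borrowing strength from the predictions.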
Features:
Multiple IPD methods: currently PostPI, PPI, PPI++, and PSPA.
Flexible wrapper functions for ease of use.
Custom methods for model inspection and evaluation.
Seamless integration with common data structures in R.
Comprehensive documentation and examples.
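As a rough illustration of the idea behind PPI, the method listed above, the sketch below computes the prediction-powered estimate of a population mean and its normal-approximation confidence interval: the mean of the predictions on the unlabeled data plus a "rectifier", the average prediction error measured on the labeled data. This is a base R sketch written for illustration only; the function `ppi_mean()` and the simulated data are hypothetical and are not part of the ipd package, whose `ipd()` wrapper should be used for real analyses.

```r
# Prediction-powered (PPI) estimate of a mean, after Angelopoulos et al. (2023).
# Illustrative base R sketch; ppi_mean() is hypothetical, not an ipd function.
ppi_mean <- function(y_lab, f_lab, f_unl, alpha = 0.05) {
  rectifier <- y_lab - f_lab                # prediction errors on labeled rows
  est <- mean(f_unl) + mean(rectifier)      # prediction mean, bias-corrected
  se <- sqrt(var(f_unl) / length(f_unl) +   # variance from unlabeled predictions
             var(rectifier) / length(rectifier))  # plus variance of the correction
  z <- qnorm(1 - alpha / 2)
  c(estimate = est, se = se, lower = est - z * se, upper = est + z * se)
}

# Simulated example: predictions are informative but shifted upward by 0.5
set.seed(2)
n_lab <- 200
n_unl <- 5000
y <- rnorm(n_lab + n_unl, mean = 3)            # true outcomes (mean 3)
f <- y + 0.5 + rnorm(n_lab + n_unl, sd = 0.5)  # biased, noisy predictions
lab <- seq_len(n_lab)                          # first n_lab rows are labeled

res <- ppi_mean(y[lab], f[lab], f[-lab])
res
naive_mean <- mean(f[-lab])                    # ignores the bias, lands near 3.5
classical_se <- sd(y[lab]) / sqrt(n_lab)       # labeled-only standard error
```

Because the predictions track the outcomes closely, the rectifier has low variance, so the PPI interval is narrower than the labeled-only one while still correcting the 0.5 shift that biases `naive_mean`.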
Main functions:
ipd: Main wrapper function that implements the various methods for inference on predicted data for a specified model or outcome type (e.g., mean estimation, linear regression).
simdat: Simulates data for demonstrating the use of the various IPD methods.
print.ipd: Prints a brief summary of the IPD method/model combination.
summary.ipd: Summarizes the results of fitted IPD models.
tidy.ipd: Tidies the IPD method/model fit into a data frame.
glance.ipd: Glances at the IPD method/model fit, returning a one-row summary.
augment.ipd: Augments the data used for an IPD method/model fit with additional information about each observation.
The package includes detailed documentation for each function, including usage examples. A vignette is also provided to guide users through common workflows and applications of the package.
For details on the statistical methods implemented in this package, please refer to the following manuscripts:
PostPI: Wang, S., McCormick, T. H., & Leek, J. T. (2020). Methods for correcting inference based on outcomes predicted by machine learning. Proceedings of the National Academy of Sciences, 117(48), 30266-30275.
PPI: Angelopoulos, A. N., Bates, S., Fannjiang, C., Jordan, M. I., & Zrnic, T. (2023). Prediction-powered inference. Science, 382(6671), 669-674.
PPI++: Angelopoulos, A. N., Duchi, J. C., & Zrnic, T. (2023). PPI++: Efficient prediction-powered inference. arXiv preprint arXiv:2311.01453.
PSPA: Miao, J., Miao, X., Wu, Y., Zhao, J., & Lu, Q. (2023). Assumption-lean and data-adaptive post-prediction inference. arXiv preprint arXiv:2311.14220.
Maintainer: Stephen Salerno <ssalerno@fredhutch.org> [copyright holder]
Authors:
Jiacheng Miao <jmiao24@wisc.edu>
Awan Afiaz <aafiaz@uw.edu>
Kentaro Hoffman <khoffm3@uw.edu>
Anna Neufeld <acn2@williams.edu>
Qiongshi Lu <qlu@biostat.wisc.edu>
Tyler H McCormick <tylermc@uw.edu>
Jeffrey T Leek <jtleek@fredhutch.org>
Useful links:
Report bugs at https://github.com/ipd-tools/ipd/issues
Examples:
library(ipd)
#-- Generate Example Data
set.seed(12345)
dat <- simdat(n = c(300, 300, 300), effect = 1, sigma_Y = 1)
head(dat)
#-- Model formula: observed outcome Y minus predicted outcome f, regressed on X1
formula <- Y - f ~ X1
#-- PostPI Analytic Correction (Wang et al., 2020)
fit_postpi1 <- ipd(formula, method = "postpi_analytic", model = "ols",
                   data = dat, label = "set_label")
#-- PostPI Bootstrap Correction (Wang et al., 2020)
nboot <- 200
fit_postpi2 <- ipd(formula, method = "postpi_boot", model = "ols",
                   data = dat, label = "set_label", nboot = nboot)
#-- PPI (Angelopoulos, Bates, et al., 2023)
fit_ppi <- ipd(formula, method = "ppi", model = "ols",
               data = dat, label = "set_label")
#-- PPI++ (Angelopoulos, Duchi, et al., 2023)
fit_plusplus <- ipd(formula, method = "ppi_plusplus", model = "ols",
                    data = dat, label = "set_label")
#-- PSPA (Miao et al., 2023)
fit_pspa <- ipd(formula, method = "pspa", model = "ols",
                data = dat, label = "set_label")
#-- Print the Model
print(fit_postpi1)
#-- Summarize the Model
summ_fit_postpi1 <- summary(fit_postpi1)
#-- Print the Model Summary
print(summ_fit_postpi1)
#-- Tidy the Model Output
tidy(fit_postpi1)
#-- Get a One-Row Summary of the Model
glance(fit_postpi1)
#-- Augment the Original Data with Fitted Values and Residuals
augmented_df <- augment(fit_postpi1)
head(augmented_df)
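The examples above fit both method = "ppi" and method = "ppi_plusplus". For intuition about how they differ in the simplest case, a population mean, the base R sketch below implements the PPI++ idea of a tuning weight lambda on the predictions, chosen to minimize the estimator's variance: lambda = 1 reproduces the PPI estimate and lambda = 0 reproduces the labeled-only sample mean. The function `ppi_plusplus_mean()` and the simulated data are hypothetical illustrations, not the ipd package's implementation.

```r
# PPI++ power-tuned estimate of a mean, after Angelopoulos, Duchi & Zrnic (2023).
# Illustrative base R sketch; ppi_plusplus_mean() is hypothetical, not an ipd function.
ppi_plusplus_mean <- function(y_lab, f_lab, f_unl, alpha = 0.05) {
  n <- length(y_lab)
  N <- length(f_unl)
  # Variance-minimizing weight on the predictions (lambda = 1 gives PPI,
  # lambda = 0 gives the labeled-only sample mean)
  lambda <- cov(y_lab, f_lab) / ((1 + n / N) * var(c(f_lab, f_unl)))
  est <- mean(y_lab) + lambda * (mean(f_unl) - mean(f_lab))
  se <- sqrt(var(y_lab - lambda * f_lab) / n + lambda^2 * var(f_unl) / N)
  z <- qnorm(1 - alpha / 2)
  c(estimate = est, lambda = lambda, se = se,
    lower = est - z * se, upper = est + z * se)
}

set.seed(3)
n_lab <- 150
n_unl <- 3000
y_lab <- rnorm(n_lab, mean = 3)
y_unl <- rnorm(n_unl, mean = 3)  # unobserved in practice; used only to build predictions

# Informative predictions (shifted and noisy) versus pure-noise predictions
good <- ppi_plusplus_mean(y_lab,
                          y_lab + 0.5 + rnorm(n_lab, sd = 0.5),
                          y_unl + 0.5 + rnorm(n_unl, sd = 0.5))
junk <- ppi_plusplus_mean(y_lab, rnorm(n_lab), rnorm(n_unl))
rbind(good, junk)
```

With informative predictions the estimated weight is well above zero and the interval is narrower than the labeled-only one; with pure-noise predictions the weight shrinks toward zero, so the estimate falls back to roughly the labeled-only mean rather than being degraded by the predictions.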