| ipd-package | R Documentation |
Performs valid statistical inference on predicted data (IPD) using recent methods, where the outcomes for a subset of the data have been predicted by an algorithm. Provides a wrapper function with specified defaults for the type of model and method to be used for estimation and inference, along with methods for tidying and summarizing results. See Salerno et al. (2025) <doi:10.1093/bioinformatics/btaf055>.
The ipd package provides tools for statistical modeling and inference when
a significant portion of the outcome data is predicted by AI/ML algorithms.
It implements several state-of-the-art methods for inference on predicted
data (IPD), offering a user-friendly interface to facilitate their use in
real-world applications.
This package is particularly useful in scenarios where predicted values
(e.g., from machine learning models) are used as proxies for unobserved
outcomes, which can introduce biases in estimation and inference. The ipd
package integrates methods designed to address these challenges.
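The bias described above, and the general idea behind correcting it, can be sketched in base R for the simplest case of estimating a mean. This is an illustrative simulation only, not the package's implementation: the data-generating process, the helper `simulate_toy()`, and all variable names are assumptions made for the example. The correction shown is the prediction-powered style estimator for a mean, which adds the average prediction error measured on the labeled subset back onto the naive estimate.

```r
# Illustrative base-R sketch; NOT the ipd package's implementation.
set.seed(1)

n_lab   <- 200   # observations with both the true outcome y and a prediction f
n_unlab <- 5000  # observations with a prediction f only

# Hypothetical data-generating process: predictions are systematically biased.
simulate_toy <- function(n) {
  y <- rnorm(n, mean = 2, sd = 1)
  f <- 0.5 + y + rnorm(n, sd = 0.5)  # prediction = truth + bias + noise
  data.frame(y = y, f = f)
}
lab   <- simulate_toy(n_lab)
unlab <- simulate_toy(n_unlab)  # y is generated here but treated as unobserved

# Naive estimate: treat predictions as if they were observed outcomes.
naive_mean <- mean(unlab$f)

# Rectified estimate: shift the naive mean by the average prediction
# error measured on the labeled subset.
rectifier <- mean(lab$y - lab$f)
ppi_mean  <- naive_mean + rectifier

# The standard error combines the uncertainty from both samples.
ppi_se <- sqrt(var(unlab$f) / n_unlab + var(lab$y - lab$f) / n_lab)
ppi_ci <- ppi_mean + c(-1, 1) * qnorm(0.975) * ppi_se

print(c(naive = naive_mean, rectified = ppi_mean))
print(ppi_ci)
```

In this toy setup the naive estimate targets the mean of the predictions rather than the mean of the outcomes, while the rectified estimate targets the true outcome mean at the cost of a wider interval.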
Multiple IPD methods: currently Chen and Chen, PDC, PostPI, PPI, PPI++, and PSPA.
Flexible wrapper functions for ease of use.
Custom methods for model inspection and evaluation.
Seamless integration with common data structures in R.
Comprehensive documentation and examples.
ipd: Main wrapper function which implements various methods
for inference on predicted data for a specified model/outcome type
(e.g., mean estimation, linear regression).
simdat: Simulates data for demonstrating the use of the
various IPD methods.
print.ipd: Prints a brief summary of the IPD method/model
combination.
summary.ipd: Summarizes the results of fitted IPD models.
tidy.ipd: Tidies the IPD method/model fit into a data frame.
glance.ipd: Glances at the IPD method/model fit, returning a
one-row summary.
augment.ipd: Augments the data used for an IPD method/model
fit with additional information about each observation.
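The names above follow R's S3 convention, in which a call such as `print(fit)` or `summary(fit)` is dispatched to `print.ipd` or `summary.ipd` based on the class of `fit`. The sketch below shows that mechanism with a toy class; the class name `toy_fit`, its fields, and the constructor `new_toy_fit()` are hypothetical and do not reflect the internal structure of a real `ipd` object.

```r
# Toy S3 class for illustration; a real "ipd" object has its own structure.
new_toy_fit <- function(estimate, se) {
  structure(list(estimate = estimate, se = se), class = "toy_fit")
}

# print() dispatches here for objects of class "toy_fit".
print.toy_fit <- function(x, ...) {
  cat("Toy fit with", length(x$estimate), "coefficient(s)\n")
  invisible(x)
}

# summary() dispatches here and returns one row per coefficient
# with a normal-approximation 95% confidence interval.
summary.toy_fit <- function(object, ...) {
  z <- qnorm(0.975)
  data.frame(
    term      = names(object$estimate),
    estimate  = unname(object$estimate),
    std.error = unname(object$se),
    conf.low  = unname(object$estimate - z * object$se),
    conf.high = unname(object$estimate + z * object$se)
  )
}

fit <- new_toy_fit(c("(Intercept)" = 0.8, X1 = 1.1), c(0.10, 0.08))
print(fit)
toy_summary <- summary(fit)
print(toy_summary)
```

Because dispatch depends only on the class attribute, the same generic calls work unchanged across every method/model combination that returns an object of that class.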
The package includes detailed documentation for each function, including usage examples. A vignette is also provided to guide users through common workflows and applications of the package.
For details on the statistical methods implemented in this package, please refer to the vignette.
Maintainer: Stephen Salerno ssalerno@fredhutch.org (ORCID) [copyright holder]
Authors:
Jiacheng Miao jmiao24@wisc.edu
Awan Afiaz aafiaz@uw.edu
Kentaro Hoffman khoffm3@uw.edu
Jesse Gronsbell j.gronsbell@utoronto.ca
Jianhui Gao jianhui.gao@mail.utoronto.ca
David Cheng dcheng@mgh.harvard.edu
Anna Neufeld acn2@williams.edu
Qiongshi Lu qlu@biostat.wisc.edu
Tyler H McCormick tylermc@uw.edu
Jeffrey T Leek jtleek@fredhutch.org
Useful links:
Report bugs at https://github.com/ipd-tools/ipd/issues
#-- Load the Package
library(ipd)
#-- Generate Example Data
set.seed(12345)
dat <- simdat(n = c(300, 300, 300), effect = 1, sigma_Y = 1)
head(dat)
formula <- Y - f ~ X1
#-- PostPI Analytic Correction (Wang et al., 2020)
fit_postpi1 <- ipd(formula,
  method = "postpi_analytic", model = "ols",
  data = dat, label = "set_label"
)
#-- PostPI Bootstrap Correction (Wang et al., 2020)
nboot <- 200
fit_postpi2 <- ipd(formula,
  method = "postpi_boot", model = "ols",
  data = dat, label = "set_label", nboot = nboot
)
#-- PPI (Angelopoulos et al., 2023)
fit_ppi <- ipd(formula,
  method = "ppi", model = "ols",
  data = dat, label = "set_label"
)
#-- PPI++ (Angelopoulos et al., 2023)
fit_plusplus <- ipd(formula,
  method = "ppi_plusplus", model = "ols",
  data = dat, label = "set_label"
)
#-- PSPA (Miao et al., 2023)
fit_pspa <- ipd(formula,
  method = "pspa", model = "ols",
  data = dat, label = "set_label"
)
#-- Chen and Chen (Gronsbell et al., 2026)
fit_chen <- ipd(formula,
  method = "chen", model = "ols",
  data = dat, label = "set_label"
)
#-- Prediction Decorrelated Inference (Gan et al., 2024)
fit_pdc <- ipd(formula,
  method = "pdc", model = "ols",
  data = dat, label = "set_label"
)
#-- Print the Model
print(fit_postpi1)
#-- Summarize the Model
summ_fit_postpi1 <- summary(fit_postpi1)
#-- Print the Model Summary
print(summ_fit_postpi1)
#-- Tidy the Model Output
tidy(fit_postpi1)
#-- Get a One-Row Summary of the Model
glance(fit_postpi1)
#-- Augment the Original Data with Fitted Values and Residuals
augmented_df <- augment(fit_postpi1)
head(augmented_df)