The goal of mipred is to calibrate a prediction rule using generalized linear models or Cox regression, with multiple imputation to account for missing values in the predictors, as described by Mertens, Banzato and de Wreede (2018) (https://arxiv.org/abs/1810.05099). Imputations are generated with the R package ‘mice’, without using the outcomes of the observations for which predictions are generated.

Two options are provided to generate predictions. The first is prediction-averaging: a model is fitted on each imputed dataset within a set of multiple imputations, and the resulting predictions are averaged. The second applies the model pooled with Rubin’s rules. In both implementations, unobserved values in the predictor data of the new observations are imputed automatically.

The package contains two workhorse functions. The first, mipred(), generates predictions of the outcome for new observations, whose outcomes will by definition usually not be available when the prediction rule is calibrated. The second, mipred.cv(), generates cross-validated predictions with the same methodology on existing data for which outcomes have already been observed. This allows users to assess the predictive potential of the calibrated prediction models.

The present version of the package is a preliminary development version and has so far only been thoroughly checked for binary-outcome logistic regression. The included vignette documents application of the functions to binary outcome data. Although we have not checked this extensively, the package should also work for continuous and count outcomes. We are working to expand the functionality to censored survival outcomes.
You can install the released version of mipred from CRAN with install.packages("mipred").
Alternatively, you can install the current development version into R from GitHub by passing the package's repository path to devtools::install_github().
For installation from GitHub, you may need to install and load the devtools package before running the install command. See the chapter “Git and GitHub” in the book “R Packages” by Hadley Wickham (available online).
There are currently two key functions:
mipred()     # prediction calibration with multiple imputation for missing predictors
mipred.cv()  # cross-validation for prediction calibration with multiple imputation for missing predictors
The first function calibrates predictions for new observations and accounts for missing values in the predictor data (of either the calibration or new validation sample) through multiple imputation. The second function implements cross-validation of the same approach.
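The difference between prediction-averaging and the Rubin's rules pooled model can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the data are simulated, impute_once() is a hypothetical and deliberately crude stand-in for the imputations that mice would generate, and averaging the pooled-model predictions over the imputed copies of the new data is an assumption about how missing new predictors are handled.

```r
set.seed(1)

# Simulated calibration data (hypothetical): binary outcome, two predictors.
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
outcome <- rbinom(n, 1, plogis(0.5 * x1 - 0.5 * x2))
calib <- data.frame(outcome, x1, x2)
calib$x1[sample(n, 30)] <- NA  # introduce missing predictor values

# New observations without outcomes, also with missing predictor values.
new <- data.frame(x1 = c(0.2, NA, -1), x2 = c(NA, 0.5, 1))

# Hypothetical stand-in for mice: replace each missing value with a random
# draw from the observed calibration values of that predictor.
# No outcome information is used for the imputation.
impute_once <- function(d, pool) {
  for (v in c("x1", "x2")) {
    miss <- is.na(d[[v]])
    obs <- pool[[v]][!is.na(pool[[v]])]
    d[[v]][miss] <- sample(obs, sum(miss), replace = TRUE)
  }
  d
}

nimp <- 20
preds_by_imp <- matrix(NA_real_, nrow(new), nimp)  # one column per imputation
coefs <- matrix(NA_real_, 3, nimp)                 # intercept, x1, x2
new_imps <- vector("list", nimp)

for (m in seq_len(nimp)) {
  calib_m <- impute_once(calib, calib)
  new_m <- impute_once(new, calib)
  fit <- glm(outcome ~ x1 + x2, family = binomial, data = calib_m)
  preds_by_imp[, m] <- predict(fit, newdata = new_m, type = "response")
  coefs[, m] <- coef(fit)
  new_imps[[m]] <- new_m
}

# Option 1: prediction-averaging. Average the per-imputation predictions.
pred_averaging <- rowMeans(preds_by_imp)

# Option 2: pooled model. The Rubin's rules point estimate is the mean of
# the per-imputation coefficients; apply it to each imputed copy of the
# new data and average the resulting probabilities (an assumption here).
beta_pooled <- rowMeans(coefs)
pred_rubin <- rowMeans(sapply(new_imps, function(d) {
  plogis(drop(cbind(1, d$x1, d$x2) %*% beta_pooled))
}))

print(cbind(pred_averaging, pred_rubin))
```

The two columns will generally be close but not identical, because averaging probabilities and averaging coefficients are not the same operation under a nonlinear link.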
Let dataset be a data.frame consisting of a vector of binary outcomes outcome and two predictors x1 and x2. The outcome must be fully observed. Likewise, let newdataset be a data.frame with new observations for which the same predictors x1 and x2 are observed and for which we want to predict outcome, using a model fitted to the old data in dataset. Either or both of these predictors may contain missing values in the calibration data, but this is also allowed in the new data for which we want to generate predictions.
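For readers who want to try this without data of their own, a pair of data frames matching the description can be simulated in base R. The sample sizes, coefficients and numbers of missing values below are arbitrary choices made for illustration only.

```r
set.seed(42)

# Calibration data: two predictors and a fully observed binary outcome.
n <- 100
dataset <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dataset$outcome <- rbinom(n, 1, plogis(dataset$x1 - dataset$x2))

# Set some predictor values to missing after the outcome has been generated,
# so that the outcome itself stays complete.
dataset$x1[sample(n, 10)] <- NA
dataset$x2[sample(n, 10)] <- NA

# New observations: same predictors, no outcome, some values missing.
newdataset <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
newdataset$x2[sample(20, 3)] <- NA

summary(dataset)
summary(newdataset)
```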
We can generate predictions using the command
preds <- mipred(outcome ~ x1 + x2, family=binomial, data=dataset, newdata=newdataset, nimp=100)
This will use the logistic regression model and 100 imputations.
If we want cross-validated predictions within dataset itself, we can generate them with the same model by calling mipred.cv() on dataset instead of mipred(). With 100 imputations and 10 folds, this generates cross-validated predictions from the same model, using 100 imputations for each predicted observation.
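A cross-validation call along these lines might look as follows. This is a sketch that assumes mipred.cv() accepts the same formula, family, data and nimp arguments as mipred(), plus a folds argument for the number of folds; consult ?mipred.cv for the actual interface. The data are simulated for illustration, nimp is reduced to 5 only to keep the run short, and the call is skipped when the package is not installed.

```r
set.seed(7)

# Simulated stand-in for `dataset`: complete binary outcome, two predictors
# with some missing values.
n <- 60
dataset <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dataset$outcome <- rbinom(n, 1, plogis(dataset$x1 - dataset$x2))
dataset$x1[sample(n, 6)] <- NA

if (requireNamespace("mipred", quietly = TRUE)) {
  # Assumed arguments: `nimp` (number of imputations) and `folds`
  # (number of cross-validation folds). Use nimp = 100 to match the text.
  predcv <- mipred::mipred.cv(outcome ~ x1 + x2, family = binomial,
                              data = dataset, nimp = 5, folds = 10)
  str(predcv, max.level = 1)  # inspect the returned object
} else {
  message("Package 'mipred' is not installed; skipping the call.")
}
```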
Please refer to the example included with the package. The package also includes a vignette which documents use for binary outcome data.