mipred.cv: Cross-validation prediction using multiple imputation
In BartJAMertens/mipred: Prediction using Multiple Imputation



Calculates cross-validated predictions based on within-sample assessment and calibration using generalized linear models with multiple imputations to account for missing values in predictor data.

mipred.cv(formula, family, data, nimp, folds = NULL, method = "averaging", mice.options = NULL)

`formula`	A formula object providing a symbolic description of the prediction model to be fitted.
`family`	Specification of an appropriate error distribution and link function.
`data`	A data.frame containing calibration data on `n` samples. Variables declared in `formula` must be found in `data`.
`nimp`	Number of imputations used in the prediction of each observation.
`folds`	Number of fold-partitions of `data` used in cross-validation. An integer from 2 to `n`. Defaults to NULL, which internally sets `folds=n` so that each observation in `data` forms its own singleton fold (leave-one-out cross-validation).
`method`	Imputation combination method. Defaults to `"averaging"` for the prediction-averaging approach. The alternative `"rubin"` predicts from the model pooled using Rubin's rules.
`mice.options`	Optional list of arguments to be passed to `mice`. The following options may be specified: `method`, `predictorMatrix`, `blocks`, `visitSequence`, `formulas`, `blots`, `post`, `defaultMethod`, `maxit`, `printFlag`, `seed`, `data.init`; refer to the `mice` documentation for their descriptions. The number of imputations should be set through `nimp`, not through `mice.options`. `seed` may be a single numeric value, in which case every call to `mice` uses that same seed, or a numeric vector, in which case each successive call to `mice` uses the next value in the vector. The required vector length is `nimp*folds` when `method` is `"averaging"` and `folds` when `method` is `"rubin"`; a shorter vector is recycled. Defaults: `defaultMethod` is `c("pmm", "logreg", "polyreg", "polr")`, `printFlag` is FALSE, `maxit` is 50, and all other options are `NULL`.
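
The seed rules for `mice.options` can be sketched in base R as follows. This is a minimal illustration, assuming the required seed-vector length is the product of `folds` and `nimp` for `"averaging"` and `folds` for `"rubin"`. The helper `seed_length()` and the base value 1000 are hypothetical and not part of mipred.

```r
# Hypothetical helper (not part of mipred): number of seeds needed so that
# each call to mice receives its own seed value.
seed_length <- function(folds, nimp, method = c("averaging", "rubin")) {
  method <- match.arg(method)
  if (method == "averaging") folds * nimp else folds
}

folds <- 5
nimp  <- 5

# One distinct seed per mice call; the base value 1000 is arbitrary.
seeds_avg   <- 1000 + seq_len(seed_length(folds, nimp, "averaging"))
seeds_rubin <- 1000 + seq_len(seed_length(folds, nimp, "rubin"))

# Options list in the shape expected by the mice.options argument.
opts <- list(maxit = 5, seed = seeds_avg)
```

The list `opts` could then be supplied as `mice.options = opts` together with `nimp = 5` and `folds = 5`.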

A list containing the following prediction matrices:

pred: Matrix of predictions on the scale of the response variable of dimension n by nimp.
linpred: Matrix of predictions on the scale of the linear predictor of dimension n by nimp.
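
As a sketch of how matrices of this shape might be summarized, the following base R code collapses an n by nimp matrix into one prediction per observation by averaging across imputations. The matrices here are simulated stand-ins, not output from mipred, and the logistic link and row-averaging are assumptions chosen for illustration rather than something this page prescribes.

```r
set.seed(1)
n    <- 6
nimp <- 5

# Simulated stand-in for a linpred matrix: one row per observation,
# one column per imputation.
linpred <- matrix(rnorm(n * nimp), nrow = n, ncol = nimp)

# For a logistic model, predictions on the response scale are the
# inverse-logit of the linear predictor.
pred <- plogis(linpred)

# One summary prediction per observation: mean over the nimp columns.
pred_mean <- rowMeans(pred)

# Spread across imputations indicates how much each prediction
# varies with the imputed values.
pred_sd <- apply(pred, 1, sd)
```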

Bart J A Mertens, b.mertens@lumc.nl

https://arxiv.org/abs/1810.05099

mice

library(mipred)  # provides mipred.cv() and the cll data

# Generate a copy of the cll data and construct a binary outcome from survival information
cll_bin<-cll
cll_bin$srv5y_s[cll_bin$srv5y>12] <- 0  # Apply administrative censoring at t=12 months
cll_bin$srv5y[cll_bin$srv5y>12]  <- 12
cll_bin$Status[cll_bin$srv5y_s==1]<- 1  # Define the new binary "Status" outcome variable
cll_bin$Status[cll_bin$srv5y_s==0] <- 0  # As numeric -> 1:Dead, 0:Alive
cll_bin$Censor <- NULL # Remove survival outcomes
cll_bin$srv5y <- NULL
cll_bin$srv5y_s <- NULL

# Cross-validate prediction using logistic regression in the first 100 samples
# Apply prediction-averaging using 5 imputations, 5 folds and maxit=5.
# Note these settings are only for illustration and should be set to higher values for
# practical use, particularly for nimp.
output <- mipred.cv(Status ~ age10 + cyto, family = binomial, data = cll_bin[1:100, -1],
                    nimp = 5, folds = 5, mice.options = list(maxit = 5))