mipred.cv: Cross-validation prediction using multiple imputation



Description

Calculates cross-validated predictions based on within-sample assessment and calibration using generalized linear models with multiple imputations to account for missing values in predictor data.

Usage

mipred.cv(formula, family, data, nimp, folds = NULL,
  method = "averaging", mice.options = NULL)

Arguments

formula

A formula object providing a symbolic description of the prediction model to be fitted.

family

Specification of an appropriate error distribution and link function.

data

A data.frame containing calibration data on n samples. Variables declared in formula must be found in data.

nimp

Number of imputations used in the prediction of each observation.

folds

Number of folds into which data is partitioned for cross-validation; an integer from 2 to n. The default NULL sets folds=n internally, so that each observation in data forms its own singleton fold (leave-one-out cross-validation).
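The relation between folds and leave-one-out cross-validation can be illustrated with a small base R sketch. The helper assign_folds is hypothetical and is not part of mipred; the package's internal fold assignment may differ. It only shows that folds=n places every observation in its own fold.

```r
# Hypothetical helper (not part of mipred): randomly assign n observations
# to `folds` roughly equal-sized folds; NULL falls back to leave-one-out.
assign_folds <- function(n, folds = NULL) {
  if (is.null(folds)) folds <- n          # mirrors the documented default
  stopifnot(folds >= 2, folds <= n)
  sample(rep(seq_len(folds), length.out = n))
}

set.seed(42)
fold_id <- assign_folds(10, folds = 5)    # 5 folds with 2 observations each
loo_id  <- assign_folds(10)               # 10 singleton folds
table(fold_id)
```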

method

Method used to combine results across imputations. Defaults to "averaging", the prediction-averaging approach. The alternative, "rubin", bases predictions on a single model pooled using Rubin's rules.
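The two combination ideas can be contrasted in a generic base R sketch. This is not the mipred implementation: the data are simulated, impute_once is a crude hypothetical stand-in for mice, and the new observation is assumed to be fully observed. It only illustrates averaging predictions across imputation-specific models versus predicting from one model with pooled coefficients.

```r
set.seed(1)
n <- 60
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.5 * x))
x_obs <- x
x_obs[sample(n, 10)] <- NA                # introduce missing predictor values

# Hypothetical stand-in for mice: fill each NA with a random observed value
impute_once <- function(v) {
  v[is.na(v)] <- sample(v[!is.na(v)], sum(is.na(v)), replace = TRUE)
  v
}

m <- 5                                    # number of imputations
completed <- lapply(seq_len(m), function(i) data.frame(y = y, x = impute_once(x_obs)))
fits <- lapply(completed, function(d) glm(y ~ x, family = binomial, data = d))

new_x <- data.frame(x = 0.3)              # assumed fully observed new case

# Prediction averaging: one prediction per fitted model, then average
p_each <- sapply(fits, function(f) predict(f, newdata = new_x, type = "response"))
p_avg  <- mean(p_each)

# Rubin-style pooling: average the coefficients, then predict once
beta_pooled <- rowMeans(sapply(fits, coef))
p_rubin <- plogis(beta_pooled[["(Intercept)"]] + beta_pooled[["x"]] * new_x$x)
```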

mice.options

Optional list of arguments to be passed to mice. The following options may be specified: method, predictorMatrix, blocks, visitSequence, formulas, blots, post, defaultMethod, maxit, printFlag, seed, data.init. Refer to the mice documentation for a description of each option. To set the number of imputations, use the nimp argument.

seed may be a single numeric value or a numeric vector. A single value causes every call to mice to use that same seed. A vector causes each successive call to mice to use the next value in the vector. The required vector length is nimp*folds when method is "averaging" and folds when method is "rubin"; a shorter vector is recycled.

By default, defaultMethod is c("pmm", "logreg", "polyreg", "polr"), printFlag is FALSE and maxit is 50. All other options default to NULL.
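As a minimal sketch of the seed rule described for this argument, the required seed vector length can be computed before building the options list. The object names seed_len, seeds and opts are illustrative only, and the seed values themselves are arbitrary.

```r
nimp   <- 5
folds  <- 5
method <- "averaging"

# Required length: nimp*folds for "averaging", folds for "rubin"
seed_len <- if (method == "averaging") nimp * folds else folds

set.seed(2019)
seeds <- sample.int(1e6, seed_len)        # one arbitrary seed per call to mice

# Options list intended for the mice.options argument
opts <- list(maxit = 5, seed = seeds)
```

The list could then be supplied as mice.options=opts in a call to mipred.cv, together with matching nimp, folds and method values.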

Value

A list containing predictions, with the following components:

pred

Matrix of predictions on the scale of the response variable of dimension n by nimp.

linpred

Matrix of predictions on the scale of the linear predictor of dimension n by nimp.
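Because both components hold one column per imputation, a single prediction per observation can be obtained by averaging across columns. The sketch below uses a simulated stand-in for the pred matrix rather than real mipred.cv output, and the 0.5 classification threshold is an arbitrary assumption.

```r
set.seed(7)
n    <- 6
nimp <- 5

# Stand-in for the n by nimp pred matrix (probabilities for a binomial model)
pred <- matrix(runif(n * nimp), nrow = n, ncol = nimp)

pred_mean  <- rowMeans(pred)              # one averaged prediction per observation
pred_class <- as.integer(pred_mean > 0.5) # assumed threshold, for illustration only
```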

Author(s)

Bart J A Mertens, b.mertens@lumc.nl

References

https://arxiv.org/abs/1810.05099

See Also

mice

Examples

library(mipred)  # provides mipred.cv and the cll data

# Generate a copy of the cll data and construct a binary outcome from survival information
cll_bin <- cll
cll_bin$srv5y_s[cll_bin$srv5y > 12] <- 0   # Apply administrative censoring at t=12 months
cll_bin$srv5y[cll_bin$srv5y > 12] <- 12
cll_bin$Status[cll_bin$srv5y_s == 1] <- 1  # Define the new binary "Status" outcome variable
cll_bin$Status[cll_bin$srv5y_s == 0] <- 0  # As numeric -> 1:Dead, 0:Alive
cll_bin$Censor <- NULL                     # Remove survival outcomes
cll_bin$srv5y <- NULL
cll_bin$srv5y_s <- NULL

# Cross-validate prediction using logistic regression in the first 100 samples.
# Apply prediction-averaging using 5 imputations, 5 folds and maxit=5.
# Note these settings are only for illustration and should be set to higher values
# for practical use, particularly nimp.
output <- mipred.cv(Status ~ age10 + cyto, family = binomial,
                    data = cll_bin[1:100, -1],
                    nimp = 5, folds = 5, mice.options = list(maxit = 5))

BartJAMertens/mipred documentation built on Sept. 4, 2019, 5:32 p.m.