metapred    R Documentation

Generalized Stepwise Regression for Prediction Models in Clustered Data

Description

Generalized stepwise regression for obtaining a prediction model that is validated with (stepwise) internal-external cross-validation, in order to obtain adequate performance across data sets. Requires data from individuals in multiple studies.

Usage

metapred(
  data,
  strata,
  formula,
  estFUN = "glm",
  scope = NULL,
  retest = FALSE,
  max.steps = 1000,
  center = FALSE,
  recal.int = FALSE,
  cvFUN = NULL,
  cv.k = NULL,
  metaFUN = NULL,
  meta.method = NULL,
  predFUN = NULL,
  perfFUN = NULL,
  genFUN = NULL,
  selFUN = "which.min",
  ...
)

Arguments

data

data.frame containing the data. Note that metapred removes observations with missing data listwise for all variables in formula and scope, to ensure that the same data is used in each model in each step. The outcome variable should be numeric or coercible to numeric by as.numeric().
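As a rough illustration of the listwise deletion described above, the following base R sketch keeps only the rows that are complete on every variable named in formula or scope. The data and variable names are hypothetical, and this is not metapred's internal code.

```r
# Toy data with missing values (hypothetical variables).
dat <- data.frame(
  y     = c(0, 1, NA, 1, 0),
  x1    = c(1.2, NA, 0.4, 0.9, 1.1),
  x2    = c(3, 4, 5, NA, 6),
  extra = c(NA, 1, 1, 1, 1)  # in neither formula nor scope, so it is ignored
)

formula <- y ~ x1
scope   <- y ~ x1 + x2

# All variables that appear in either formula.
vars <- union(all.vars(formula), all.vars(scope))

# Keep rows that are complete on those variables only.
complete <- dat[complete.cases(dat[, vars]), ]
complete
```

Because the rows are filtered once, up front, every candidate model in the search would be fitted to the same observations.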

strata

Character. Name of the strata variable (e.g. studies or clusters).

formula

formula of the first model to be evaluated. metapred will start at formula and update it using terms of scope. Defaults to full main effects model, where the first column in data is assumed to be the outcome and all remaining columns (except strata) predictors. See formula for formulas in general.

estFUN

Function for estimating the model in the first stage. Currently "lm", "glm" and "logistfirth" are supported.

scope

formula. The difference between formula and scope defines the range of models examined in the stepwise search. Defaults to NULL, which leads to the intercept-only model. If scope is nested in formula (as with the default), backward selection is applied. If formula is nested in scope, forward selection is applied. If the two are equal, no stepwise selection is applied.
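The relation between formula and scope can be checked by comparing their term labels. The sketch below follows the Examples on this page, where formula = dvt ~ 1 with scope = f is labelled forward selection and the default scope simplifies f. The helper search_direction() is hypothetical and not part of metamisc.

```r
# Extract the right-hand-side term labels of a formula.
term_labels <- function(f) attr(terms(f), "term.labels")

# Hypothetical helper: classify the search implied by a formula/scope pair.
search_direction <- function(formula, scope) {
  start  <- term_labels(formula)
  target <- term_labels(scope)
  if (setequal(start, target)) return("none")      # nothing to add or remove
  if (all(target %in% start))  return("backward")  # scope nested in formula
  if (all(start %in% target))  return("forward")   # formula nested in scope
  "mixed"                                          # neither is nested
}

f <- dvt ~ histdvt + ddimdich + sex + notraum
dir_default <- search_direction(f, dvt ~ 1)  # intercept-only scope
dir_forward <- search_direction(dvt ~ 1, f)
dir_none    <- search_direction(f, f)
```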

retest

Logical. Should added (removed) terms be retested for removal (addition)? TRUE implies bi-directional stepwise search.

max.steps

Integer. Maximum number of steps (additions or removals of terms) to take. Defaults to 1000, which is effectively unlimited. 0 implies no stepwise selection.

center

Logical. Should numeric predictors be centered around the cluster mean?
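Centering around the cluster mean means subtracting, from each value, the mean of that predictor within the same study. A minimal base R sketch with made-up data (the column names are assumptions, and metapred's own implementation may differ):

```r
dat <- data.frame(
  study = c("a", "a", "b", "b", "b"),
  age   = c(50, 60, 30, 40, 50)
)

# ave() returns, for each row, the mean of 'age' within that row's study.
dat$age_c <- dat$age - ave(dat$age, dat$study)
dat
```

After centering, the predictor has mean zero within every study, so each study-specific intercept refers to a participant with an average value for that study.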

recal.int

Logical. Should the intercept be recalibrated in each validation?

cvFUN

Cross-validation method, applied at the study (i.e. cluster or stratum) level. Options are "l1o" for leave-one-out cross-validation (default), "bootstrap" for bootstrap, and "fixed" for one or more data sets that are used only for validation. A user-written function may be supplied as well.
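To make the leave-one-out scheme concrete, the sketch below builds one development/validation split per study: each study is held out once for validation while the model is developed on all the others. It uses base R only and a made-up strata vector; it is not the function metapred uses internally.

```r
# Study membership of six hypothetical participants.
strata  <- c("a", "a", "b", "b", "b", "c")
studies <- sort(unique(strata))

# One fold per study: 'val' holds the row indices of the held-out study,
# 'dev' the row indices of all remaining studies.
folds <- lapply(studies, function(s) {
  list(val = which(strata == s), dev = which(strata != s))
})
names(folds) <- studies

folds$b
```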

cv.k

Parameter for cvFUN. For cvFUN="bootstrap", this is the number of bootstraps. For cvFUN="fixed", this is a vector of the indices of the (sorted) data sets. Not used for cvFUN="l1o".

metaFUN

Function for computing the meta-analytic coefficient estimates in two-stage meta-analysis (MA). By default, rma.uni from the metafor package is used, with univariate random effects estimated with "DL". The method can be changed through the meta.method argument.

meta.method

Name of method for meta-analysis. Default is "DL". For more options see rma.uni.
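For intuition about the default "DL" setting, the following is a hand-written base R sketch of univariate random-effects pooling with the DerSimonian-Laird moment estimator of the between-study variance. The function dl_pool() and the input values are illustrative assumptions; in practice metafor's rma.uni does this work.

```r
# yi: per-study coefficient estimates; vi: their sampling variances.
dl_pool <- function(yi, vi) {
  k    <- length(yi)
  w    <- 1 / vi                          # fixed-effect weights
  mu_f <- sum(w * yi) / sum(w)            # fixed-effect pooled estimate
  Q    <- sum(w * (yi - mu_f)^2)          # Cochran's Q
  C    <- sum(w) - sum(w^2) / sum(w)
  tau2 <- max(0, (Q - (k - 1)) / C)       # between-study variance, truncated at 0
  w_r  <- 1 / (vi + tau2)                 # random-effects weights
  list(
    estimate = sum(w_r * yi) / sum(w_r),
    se       = sqrt(1 / sum(w_r)),
    tau2     = tau2
  )
}

# Made-up estimates of one coefficient from three studies.
pooled <- dl_pool(yi = c(0.5, 0.8, 0.3), vi = c(0.04, 0.09, 0.05))
pooled
```

In a two-stage analysis this pooling would be repeated for each coefficient of the model.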

predFUN

Function for predicting new values. Defaults to the predicted probability of the outcome, using the link function of glm() or lm().

perfFUN

Function for computing the performance of the prediction models. Default: mean squared error (perfFUN="mse"). Other options are "var.e" (variance of prediction error), "auc" (area under the curve), "cal.int" (calibration intercept), "cal.slope" (multiplicative calibration slope), and "cal.add.slope" (additive calibration slope).
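As an illustration of the two simplest measures, the sketch below computes the mean squared error and the variance of the prediction error for one validation set. The function names and argument order (predictions p, observed outcomes y) are assumptions for this example and are not taken from the package.

```r
# Mean squared error of predictions p against observed outcomes y.
mse_sketch <- function(p, y) mean((y - p)^2)

# Variance of the prediction error.
var_e_sketch <- function(p, y) var(y - p)

# Made-up predicted probabilities and binary outcomes.
p <- c(0.1, 0.4, 0.8, 0.6)
y <- c(0, 0, 1, 1)

mse_val   <- mse_sketch(p, y)
var_e_val <- var_e_sketch(p, y)
```

For a binary outcome with predicted probabilities, the mean squared error is the Brier score, so lower values indicate better predictions.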

genFUN

Function or list of named functions for computing generalizability of the performance. Default: (absolute) mean (genFUN="abs.mean"). Choose coef.var for the coefficient of variation. If a list, only the first is used for model selection.
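The generalizability measure summarizes the per-study performance values into one number per candidate model. The sketch below uses made-up performance values for two candidate models. The definitions of abs_mean() and coef_var() are plausible readings of the option names (the coefficient of variation is taken as sd divided by the absolute mean) and are assumptions, not the package's code.

```r
abs_mean <- function(x) abs(mean(x))
coef_var <- function(x) sd(x) / abs(mean(x))  # assumed definition

# Hypothetical mean squared errors in three validation studies.
perf_model1 <- c(a = 0.10, b = 0.14, c = 0.12)
perf_model2 <- c(a = 0.09, b = 0.20, c = 0.13)

gen <- c(model1 = abs_mean(perf_model1), model2 = abs_mean(perf_model2))

# With a "lower is better" measure, which.min picks the preferred model.
best <- which.min(gen)
names(best)
```

Here model2 performs best in study a but worse on average across studies, so model1 is selected.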

selFUN

Function for selecting the best model. Default: lowest value for genFUN. Should be set to "which.max" if high values for genFUN indicate a good model.

...

To pass arguments to estFUN (e.g. family = "binomial"), or to other FUNctions.

Details

Use subset.metapred to obtain an individual prediction model from a metapred object.

Note that formula.changes is currently unordered; it does not represent the order of changes in the stepwise procedure.

metapred is still under development; use with care.

Value

A list of class metapred, containing the final model in global.model, and the stepwise tree of coefficient estimates, performance measures, and generalizability measures in stepwise.

Author(s)

Valentijn de Jong <Valentijn.M.T.de.Jong@gmail.com>

References

Debray TPA, Moons KGM, Ahmed I, Koffijberg H, Riley RD. A framework for developing, implementing, and evaluating clinical prediction models in an individual participant data meta-analysis. Stat Med. 2013;32(18):3158-80.

de Jong VMT, Moons KGM, Eijkemans MJC, Riley RD, Debray TPA. Developing more generalizable prediction models from pooled studies and large clustered data sets. Stat Med. 2021;40(15):3533–59.

Riley RD, Tierney JF, Stewart LA. Individual participant data meta-analysis: a handbook for healthcare research. Hoboken, NJ: Wiley; 2021. ISBN: 978-1-119-33372-2.

Schmid CH, Stijnen T, White IR. Handbook of meta-analysis. First edition. Boca Raton: Taylor and Francis; 2020. ISBN: 978-1-315-11940-3.

See Also

forest.metapred for generating a forest plot of prediction model performance

Examples

data(DVTipd)

## Not run: 
# Explore heterogeneity in intercept and association of 'ddimdich' (glmer is from the lme4 package)
glmer(dvt ~ 0 + cluster + (ddimdich|study), family = binomial(), data = DVTipd)

## End(Not run)

# Scope
f <- dvt ~ histdvt + ddimdich + sex + notraum

# Internal-external cross-validation of a pre-specified model 'f'
fit <- metapred(DVTipd, strata = "study", formula = f, scope = f, family = binomial)
fit

# Let's try to simplify model 'f' in order to improve its external validity
metapred(DVTipd, strata = "study", formula = f, family = binomial)

# We can also try to build a generalizable model from scratch

## Not run: 
# Some additional examples:
metapred(DVTipd, strata = "study", formula = dvt ~ 1, scope = f, family = binomial) # forward selection
metapred(DVTipd, strata = "study", formula = f, scope = f, family = binomial) # no selection
metapred(DVTipd, strata = "study", formula = f, max.steps = 0, family = binomial) # no selection
metapred(DVTipd, strata = "study", formula = f, recal.int = TRUE, family = binomial) # recalibrate intercept
metapred(DVTipd, strata = "study", formula = f, meta.method = "REML", family = binomial) # REML meta-analysis

## End(Not run)
# By default, metapred assumes the first column is the outcome.
newdat <- data.frame(dvt=0, histdvt=0, ddimdich=0, sex=1, notraum=0)
fitted <- predict(fit, newdata = newdat)


metamisc documentation built on Sept. 25, 2022, 5:05 p.m.