stackingWeights: Stacking model weights

View source: R/weights-stacking.R

stackingWeights {MuMIn}    R Documentation

Stacking model weights

Description

Computes model weights based on a cross-validation-like procedure.

Usage

stackingWeights(object, ..., data, R, p = 0.5)

Arguments

object, ...

two or more fitted glm objects, or a list of such, or an "averaging" object.

data

a data frame containing the variables in the model, used for fitting and prediction.

R

the number of replicates.

p

the proportion of the data to be used as training set. Defaults to 0.5.

Details

Each model in the set is fitted to a training set: a randomly drawn subset of p * N observations in data. The fitted models then produce predictions for the remaining observations (the test, or hold-out, data). The weights with which the models are combined are optimised so that the weighted combination of hold-out predictions best fits the hold-out observations. This process is repeated R times, yielding a distribution of weights for each model (which Smyth & Wolpert (1998) referred to as an ‘empirical Bayesian estimate of posterior model probability’). The mean or median weight for each model is then taken and re-scaled to sum to one.
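The procedure above can be sketched in a few lines of base R. This is an illustrative simplification, not MuMIn's actual implementation: it runs a single replicate (R = 1) on a hypothetical simulated dataset and parameterises the non-negative weights via a softmax so that they sum to one.

```r
set.seed(1)
n <- 100
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + d$x1 + rnorm(n)  # x1 is the true predictor

# two candidate models
models <- list(lm(y ~ x1, data = d), lm(y ~ x2, data = d))

# split into training and hold-out halves (p = 0.5)
train <- sample(n, floor(0.5 * n))
test <- setdiff(seq_len(n), train)

# refit each model on the training half; predict on the hold-out half
pred <- sapply(models, function(m)
    predict(update(m, data = d[train, ]), newdata = d[test, ]))

# optimise weights (softmax-parameterised, hence non-negative and
# summing to one) so the combined prediction fits the hold-out y
obj <- function(par) {
    w <- exp(par) / sum(exp(par))
    sum((d$y[test] - pred %*% w)^2)
}
opt <- optim(rep(0, length(models)), obj)
w <- exp(opt$par) / sum(exp(opt$par))
round(w, 3)  # most weight should fall on the first (correct) model
```

Repeating the split-fit-optimise step R times and averaging the resulting weight vectors (by mean or median, then re-scaling) gives the stacking weights returned by stackingWeights.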

Value

A matrix with two rows, containing model weights calculated using mean and median.

Note

This approach requires a sample size of at least twice the number of models.

Author(s)

Carsten Dormann, Kamil Bartoń

References

Wolpert, D. H. (1992) Stacked generalization. Neural Networks, 5: 241-259.

Smyth, P. & Wolpert, D. (1998) An Evaluation of Linearly Combining Density Estimators via Stacking. Technical Report No. 98-25. Information and Computer Science Department, University of California, Irvine, CA.

Dormann, C. et al. (2018) Model averaging in ecology: a review of Bayesian, information-theoretic, and tactical approaches for predictive inference. Ecological Monographs, 88, 485–504.

See Also

Weights, model.avg

Other model weights: BGWeights(), bootWeights(), cos2Weights(), jackknifeWeights()

Examples

# simulate a larger dataset based on 'Cement', to provide enough
# observations for the training and hold-out sets
fm0 <- glm(y ~ X1 + X2 + X3 + X4, data = Cement, na.action = na.fail)
dat <- as.data.frame(apply(Cement[, -1], 2, sample, 50, replace = TRUE))
dat$y <- rnorm(nrow(dat), predict(fm0, newdata = dat), sigma(fm0))

# global model fitted to training data:
fm <- glm(y ~ X1 + X2 + X3 + X4, data = dat, na.action = na.fail)

# generate a list of *some* subsets of the global model
models <- lapply(dredge(fm, evaluate = FALSE, fixed = "X1", m.lim = c(1, 3)), eval)

wts <- stackingWeights(models, data = dat, R = 10)

ma <- model.avg(models)
Weights(ma) <- wts["mean", ]

predict(ma)


MuMIn documentation built on March 18, 2022, 5:28 p.m.