View source: R/weights-stacking.R
stackingWeights    R Documentation

Description:

Compute model weights based on a cross-validation-like procedure.

Usage:

stackingWeights(object, ..., data, R, p = 0.5)
Arguments:

object, ...: two or more fitted glm objects, or a list of such.

data: a data frame containing the variables in the model, used for fitting and prediction.

R: the number of replicates.

p: the proportion of data to be used as the training set; defaults to 0.5.
Details:

Each model in a set is fitted to the training data: a subset of p * N observations in data. From these models, predictions are produced for the remaining part of data (the test or hold-out data). These hold-out predictions are fitted to the hold-out observations by optimising the weights by which the models are combined. This process is repeated R times, yielding a distribution of weights for each model (which Smyth & Wolpert (1998) referred to as an 'empirical Bayesian estimate of posterior model probability'). A mean or median of model weights for each model is taken and re-scaled to sum to one.
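The mechanics of a single replicate can be sketched in a few lines of base R. This is a minimal illustration, not MuMIn's internal implementation: the helper name stacking_replicate and the softmax parametrisation of the weights are assumptions made here for clarity.

## One stacking replicate (illustrative): split the data, refit each
## candidate model on the training part, then optimise non-negative
## weights that combine the hold-out predictions.
stacking_replicate <- function(models, data, p = 0.5) {
    n <- nrow(data)
    train <- sample.int(n, floor(p * n))
    ## refit every candidate model on the training subset
    fits <- lapply(models, update, data = data[train, ])
    ## hold-out predictions, one column per model
    preds <- sapply(fits, predict, newdata = data[-train, ], type = "response")
    y_hold <- model.response(model.frame(formula(models[[1L]]), data[-train, ]))
    ## residual sum of squares of the weighted prediction; the softmax
    ## parametrisation keeps weights non-negative and summing to one
    rss <- function(par) {
        w <- exp(par) / sum(exp(par))
        sum((y_hold - preds %*% w)^2)
    }
    par <- optim(numeric(length(models)), rss)$par
    exp(par) / sum(exp(par))
}

Repeating such a replicate R times and taking the mean or median of the resulting weight vectors (re-scaled to sum to one) gives the two rows returned by stackingWeights.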
Value:

A matrix with two rows, containing model weights calculated using the mean and the median.
Note:

This approach requires a sample size of at least twice the number of models.
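A one-line guard before calling the function can make this requirement explicit (an illustrative check, not part of the package):

stopifnot(nrow(data) >= 2 * length(models))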
Author(s):

Carsten Dormann, Kamil Bartoń
References:

Wolpert, D. H. 1992. Stacked generalization. Neural Networks 5, 241–259.

Smyth, P. and Wolpert, D. 1998. An Evaluation of Linearly Combining Density Estimators via Stacking. Technical Report No. 98-25. Information and Computer Science Department, University of California, Irvine, CA.

Dormann, C. et al. 2018. Model averaging in ecology: a review of Bayesian, information-theoretic, and tactical approaches for predictive inference. Ecological Monographs 88, 485–504.
See Also:

Weights, model.avg

Other model weights: BGWeights(), bootWeights(), cos2Weights(), jackknifeWeights()
Examples:

library(MuMIn)

# Simulate a larger version of the Cement dataset, to provide enough
# observations for the training/hold-out splits:
fm0 <- glm(y ~ X1 + X2 + X3 + X4, data = Cement, na.action = na.fail)
dat <- as.data.frame(apply(Cement[, -1], 2, sample, 50, replace = TRUE))
dat$y <- rnorm(nrow(dat), predict(fm0, newdata = dat), sigma(fm0))

# Global model fitted to the simulated data:
fm <- glm(y ~ X1 + X2 + X3 + X4, data = dat, na.action = na.fail)

# Generate a list of *some* subsets of the global model:
models <- lapply(dredge(fm, evaluate = FALSE, fixed = "X1", m.lim = c(1, 3)), eval)

# Stacking weights from R = 10 random training/hold-out splits:
wts <- stackingWeights(models, data = dat, R = 10)

ma <- model.avg(models)
Weights(ma) <- wts["mean", ]
predict(ma)
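As a follow-up, the "median" row of the returned matrix can be substituted in exactly the same way (a usage sketch; whether mean or median performs better is situation-dependent):

# Use the median-based weights instead of the mean-based ones:
Weights(ma) <- wts["median", ]
predict(ma)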