kfold.stanreg | R Documentation |
The kfold method performs exact K-fold cross-validation. First the data are randomly partitioned into K subsets of equal size (or as close to equal as possible), or the user can specify the folds argument to determine the partitioning. Then the model is refit K times, each time leaving out one of the K subsets. If K is equal to the total number of observations in the data then K-fold cross-validation is equivalent to exact leave-one-out cross-validation (to which loo is an efficient approximation).
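The random partitioning described above can be sketched in base R. This is only an illustration under stated assumptions: make_folds is a hypothetical helper, not a function from rstanarm or loo, and the actual partitioning code used by kfold may differ.

```r
# Hypothetical helper (not part of rstanarm): assign N observations to K folds.
make_folds <- function(N, K) {
  stopifnot(K >= 2, K <= N)
  # rep_len() cycles through 1..K so fold sizes differ by at most one;
  # sample() then shuffles which observation lands in which fold.
  sample(rep_len(seq_len(K), N))
}

set.seed(123)
folds <- make_folds(N = nrow(mtcars), K = 10)
table(folds)  # each fold holds 3 or 4 of the 32 observations

# With K equal to N, every fold holds a single observation,
# which corresponds to leave-one-out cross-validation.
loo_folds <- make_folds(N = nrow(mtcars), K = nrow(mtcars))
```

An integer vector of this shape (one fold index per observation) is the kind of object the folds argument is meant to receive.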
## S3 method for class 'stanreg'
kfold(
x,
K = 10,
...,
folds = NULL,
save_fits = FALSE,
cores = getOption("mc.cores", 1)
)
x |
A fitted model object returned by one of the rstanarm modeling functions. See stanreg-objects. |
K |
For |
... |
Currently ignored. |
folds |
For |
save_fits |
For |
cores |
The number of cores to use for parallelization. Instead of fitting
separate Markov chains for the same model on different cores, by default
|
An object with classes 'kfold' and 'loo' that has a similar structure to the objects returned by the loo and waic methods and is compatible with the loo_compare function for comparing models.
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4. arXiv preprint: https://arxiv.org/abs/1507.04544
Yao, Y., Vehtari, A., Simpson, D., and Gelman, A. (2018). Using stacking to average Bayesian predictive distributions. Bayesian Analysis, advance publication. doi:10.1214/17-BA1091.
if (.Platform$OS.type != "windows" || .Platform$r_arch != "i386") {
fit1 <- stan_glm(mpg ~ wt, data = mtcars, refresh = 0)
fit2 <- stan_glm(mpg ~ wt + cyl, data = mtcars, refresh = 0)
fit3 <- stan_glm(mpg ~ disp * as.factor(cyl), data = mtcars, refresh = 0)
# 10-fold cross-validation
# (if possible also specify the 'cores' argument to use multiple cores)
(kfold1 <- kfold(fit1, K = 10))
kfold2 <- kfold(fit2, K = 10)
kfold3 <- kfold(fit3, K = 10)
loo_compare(kfold1, kfold2, kfold3)
# stratifying by a grouping variable
# (note: might get some divergence warnings with this model but
# this is just intended as a quick example of how to code this)
fit4 <- stan_lmer(mpg ~ disp + (1|cyl), data = mtcars, refresh = 0)
table(mtcars$cyl)
folds_cyl <- loo::kfold_split_stratified(K = 3, x = mtcars$cyl)
table(cyl = mtcars$cyl, fold = folds_cyl)
kfold4 <- kfold(fit4, folds = folds_cyl, cores = 2)
print(kfold4)
}
# Example code demonstrating the different ways to specify the number
# of cores and how the cores are used
#
# options(mc.cores = NULL)
#
# # spread the K models over N_CORES cores (method 1)
# kfold(fit, K, cores = N_CORES)
#
# # spread the K models over N_CORES cores (method 2)
# options(mc.cores = N_CORES)
# kfold(fit, K)
#
# # fit K models sequentially using N_CORES cores for the Markov chains each time
# options(mc.cores = N_CORES)
# kfold(fit, K, cores = 1)
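The way the default cores = getOption("mc.cores", 1) resolves in the cases above can be checked without fitting any model. The sketch below uses resolve_cores, a hypothetical stand-in that mimics only the default of the cores argument; it does not call kfold and says nothing about how the cores are used internally.

```r
# Hypothetical stand-in that mimics only the default of kfold()'s 'cores' argument.
resolve_cores <- function(cores = getOption("mc.cores", 1)) cores

old <- options(mc.cores = NULL)             # start with the option unset
unset_default <- resolve_cores()            # falls back to 1

options(mc.cores = 4)
option_default <- resolve_cores()           # picks up the option: 4
explicit_value <- resolve_cores(cores = 1)  # an explicit argument wins: 1

options(old)                                # restore the previous setting
```

Because default arguments are evaluated at call time, changing options(mc.cores = ...) between calls changes the value that an unspecified cores argument receives.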