kfold | R Documentation |
Runs k-fold or Leave One Out Cross Validation for a specified component of a JAGS data object, for a specified JAGS model.
JAGS is run internally k times (or alternatively, as many times as the size of the dataset), withholding each of the k "folds" of the input data in turn and drawing posterior predictive samples corresponding to the withheld data, which can then be compared to the input data to assess model predictive power.
Global measures of predictive power are provided in output: Root Mean Square (Prediction) Error and Mean Absolute (Prediction) Error. However, these measures are unlikely to be meaningful by themselves; rather, they are intended as a metric for scoring a set of candidate models.
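As a rough illustration of the two global measures, the sketch below computes them from a pair of short vectors. The vectors are hypothetical stand-ins for the observed and predicted values, and the formulas are the conventional definitions of RMSE and MAE, which is an assumption about how the function computes them internally.

```r
# Hypothetical stand-ins for observed values and posterior predictive medians
data_y <- c(2.1, 3.4, 1.8, 4.0, 2.9)
pred_y <- c(2.0, 3.0, 2.2, 4.5, 2.9)

# Prediction error: predicted value minus observed value
err <- pred_y - data_y

# Conventional definitions (assumed, not taken from the package source)
rmse_pred <- sqrt(mean(err^2, na.rm = TRUE))  # Root Mean Square (Prediction) Error
mae_pred  <- mean(abs(err), na.rm = TRUE)     # Mean Absolute (Prediction) Error

rmse_pred
mae_pred
```

Because both measures are on the scale of the data, they are most useful when compared across candidate models fit to the same data with the same folds.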
kfold(
model.file,
data,
p,
addl_p = NULL,
save_postpred = FALSE,
k = 10,
loocv = FALSE,
fold_dims = NULL,
...
)
model.file |
Path to file containing the model written in BUGS code, passed directly to jags. |
data |
The named list of data objects, passed directly to jags. |
p |
The name of the data object to use for k-fold or LOO CV. |
addl_p |
Names of additional parameters to save from JAGS output, if a metric such as Log Pointwise Predictive Density is to be calculated from cross-validation results. Defaults to NULL. |
save_postpred |
Whether to save all posterior predictive samples, in addition to posterior medians. Defaults to FALSE. |
k |
How many folds to use for cross-validation. Defaults to 10. |
loocv |
Whether to perform Leave One Out (rather than k-fold) Cross
Validation. Setting this to |
fold_dims |
A vector of margins to use for selecting folds, if the data
object used for cross validation is a matrix or array. For example, if the
data consists of a two-dimensional matrix, setting |
... |
additional arguments to jags. These may (or must)
include |
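To make the idea of selecting folds by margin concrete, the sketch below contrasts element-wise fold assignment with row-wise assignment for a matrix, in the spirit of the fold_dims argument. It is a base R illustration under assumed conventions: the object names (fold_cells, fold_rows, y_train) are hypothetical, and the function's actual internals may differ.

```r
set.seed(1)  # for a reproducible illustration only
y <- matrix(rnorm(60), nrow = 20, ncol = 3)
k <- 5

# Element-wise folds: every cell is randomly assigned to one of k folds
fold_cells <- matrix(sample(rep_len(1:k, length(y))),
                     nrow = nrow(y), ncol = ncol(y))

# Row-wise folds: one fold label per row, shared by all columns of that row
row_labels <- sample(rep_len(1:k, nrow(y)))
fold_rows  <- matrix(row_labels, nrow = nrow(y), ncol = ncol(y))

# Withholding fold 1 by setting its cells to NA; JAGS treats NA data as
# unobserved nodes (that kfold() withholds data this way is an assumption)
y_train <- y
y_train[fold_rows == 1] <- NA
```

With row-wise folds, whole rows are withheld together, which may be preferable when elements within a row are not independent.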
A named list, which may consist of the following:
$pred_y
: Point estimates of predicted values corresponding to each data
element, calculated as the posterior predictive median value
$data_y
: Original data used for cross validation
$postpred_y
: All posterior predictive samples corresponding to each data
element, if save_postpred=TRUE
$rmse_pred
: Root Mean Square (Prediction) Error
$mae_pred
: Mean Absolute (Prediction) Error
$addl_p
: A list with length equal to k (or the number of folds), with each list
element containing all posterior samples for additional parameters,
if these are supplied in argument addl_p
$fold
: A vector, matrix, or array corresponding to the original data,
giving the numerical values of the corresponding fold used
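The elements above can be combined after the fact, for example to see whether prediction error is concentrated in particular folds. The sketch below uses a hypothetical mock list in place of real output: only the element names ($pred_y, $data_y, $fold) come from this page, and it assumes the three elements share the same dimensions.

```r
# Hypothetical mock of the returned list; values are made up for illustration
set.seed(42)
data_y <- matrix(rnorm(12), nrow = 4, ncol = 3)
res <- list(
  data_y = data_y,
  pred_y = data_y + 0.5,                                    # every prediction off by 0.5
  fold   = matrix(rep(1:4, times = 3), nrow = 4, ncol = 3)  # folds assigned by row
)

# Squared prediction error for every data element
sq_err <- (res$pred_y - res$data_y)^2

# RMSE within each fold, and over all elements
rmse_by_fold <- sqrt(tapply(as.vector(sq_err), as.vector(res$fold), mean))
rmse_overall <- sqrt(mean(sq_err))

rmse_by_fold
rmse_overall
```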
Matt Tyers
qq_postpred, plot_postpred, plotRhats, traceworstRhat
#### test case where y is a matrix
asdf_jags <- tempfile()
cat('model {
  for(i in 1:n) {
    for(j in 1:ngrp) {
      y[i,j] ~ dnorm(mu[i,j], tau)
      mu[i,j] <- b0 + b1*x[i,j] + a[j]
    }
  }

  for(j in 1:ngrp) {
    a[j] ~ dnorm(0, tau_a)
  }

  tau <- pow(sig, -2)
  sig ~ dunif(0, 10)
  b0 ~ dnorm(0, 0.001)
  b1 ~ dnorm(0, 0.001)

  tau_a <- pow(sig_a, -2)
  sig_a ~ dunif(0, 10)
}', file=asdf_jags)
# simulate data to go with the example model
n <- 60  # 20 rows x 3 groups, so the draws fill the matrices exactly
x <- matrix(rnorm(n, sd=3),
            nrow=20, ncol=3)
y <- matrix(rnorm(n, mean=rep(1:3, each=20)-x),
            nrow=20, ncol=3)
asdf_data <- list(x=x,
y=y,
n=nrow(x),
ngrp=ncol(x))
# JAGS controls
niter <- 1000
ncores <- 2
# ncores <- min(10, parallel::detectCores()-1)
## random assignment of folds
kfold1 <- kfold(p="y",
                k=5,
                model.file=asdf_jags, data=asdf_data,
                n.chains=ncores, n.iter=niter,
                n.burnin=niter/2, n.thin=niter/1000,
                parallel=FALSE)
str(kfold1)
kfold1$fold
## Performing LOOCV, but assigning folds by row of input data
kfold2 <- kfold(p="y",
                loocv=TRUE, fold_dims=1,
                model.file=asdf_jags, data=asdf_data,
                n.chains=ncores, n.iter=niter,
                n.burnin=niter/2, n.thin=niter/1000,
                parallel=FALSE)
str(kfold2)
kfold2$fold