kfold: Automated K-fold or Leave One Out Cross Validation

kfold    R Documentation

Description

Runs k-fold or Leave One Out Cross Validation for a specified component of a JAGS data object, for a specified JAGS model.

JAGS is run internally k times (or, for Leave One Out Cross Validation, as many times as the size of the dataset), each time withholding one of k "folds" of the input data and drawing posterior predictive samples corresponding to the withheld data. These samples can then be compared to the input data to assess model predictive power.

Global measures of predictive power are provided in output: Root Mean Square (Prediction) Error and Mean Absolute (Prediction) Error. However, these measures are unlikely to be meaningful by themselves; rather, they are intended as a metric for scoring a set of candidate models.
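As a rough illustration of the two global measures, the sketch below computes them from a vector of observed values and a vector of point predictions. The numbers are made up, and the formulas are the conventional definitions of RMSE and MAE; this is an assumption about what the function reports, not a copy of its internals.

```r
# Made-up observed values and out-of-fold point predictions
data_y <- c(1, 2, 3, 4)
pred_y <- c(1.5, 2, 2, 6)

# Prediction errors for each data element
err <- pred_y - data_y

# Root Mean Square (Prediction) Error: penalizes large misses more heavily
rmse_pred <- sqrt(mean(err^2, na.rm = TRUE))

# Mean Absolute (Prediction) Error: average size of a miss
mae_pred <- mean(abs(err), na.rm = TRUE)

c(rmse_pred = rmse_pred, mae_pred = mae_pred)
```

Because both values are on the scale of the data, they are most useful when compared between candidate models fit to the same data with the same folds.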

Usage

kfold(
  model.file,
  data,
  p,
  addl_p = NULL,
  save_postpred = FALSE,
  k = 10,
  loocv = FALSE,
  fold_dims = NULL,
  ...
)

Arguments

model.file

Path to file containing the model written in BUGS code, passed directly to jags.

data

The named list of data objects, passed directly to jags.

p

The name of the data object to use for K-fold or LOO CV.

addl_p

Names of additional parameters to save from JAGS output, if a metric such as Log Pointwise Predictive Density is to be calculated from cross-validation results. Defaults to NULL, indicating no additional parameters.

save_postpred

Whether to save all posterior predictive samples, in addition to posterior medians. Defaults to FALSE.

k

How many folds to use for cross-validation. Defaults to 10. If this is set to a number equal to (or greater than) the sample size, LOOCV behavior will result.

loocv

Whether to perform Leave One Out (rather than k-fold) Cross Validation. Setting this to TRUE will override the input to k=. Defaults to FALSE.

fold_dims

A vector of margins to use for selecting folds, if the data object used for cross validation is a matrix or array. For example, if the data consists of a two-dimensional matrix, setting fold_dims=1 will result in whole rows being selected in each fold, or setting fold_dims=2 will result in whole columns. However, this is generalized to accept vectors of multiple fold_dims and higher-dimensional arrays of data.

...

Additional arguments passed to jags. These may (or must) include n.chains, n.iter, n.burnin, n.thin, parallel, etc.
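To make the interaction of k, loocv, and fold_dims more concrete, the following sketch assigns a fold number to every element of a data array. The helper assign_folds() and its margins argument are hypothetical names invented for this illustration; the package's own fold assignment may differ in its details.

```r
set.seed(42)
y <- matrix(rnorm(60), nrow = 20, ncol = 3)

# Hypothetical helper: returns an array shaped like y holding fold numbers.
# margins = NULL assigns folds element by element; otherwise all elements
# sharing the same index along the given margins share one fold.
assign_folds <- function(y, k = 10, loocv = FALSE, margins = NULL) {
  y <- as.array(y)
  if (is.null(margins)) margins <- seq_along(dim(y))
  units_dim <- dim(y)[margins]              # units along each chosen margin
  n_units <- prod(units_dim)
  if (loocv || k >= n_units) k <- n_units   # one unit per fold
  # balanced random assignment of units to folds
  unit_fold <- sample(rep_len(seq_len(k), n_units))
  # map every element of y to the linear index of its unit
  ind <- arrayInd(seq_along(y), dim(y))[, margins, drop = FALSE]
  mult <- cumprod(c(1, units_dim))[seq_along(units_dim)]
  unit_id <- as.vector((ind - 1) %*% mult) + 1
  array(unit_fold[unit_id], dim = dim(y))
}

folds_elem <- assign_folds(y, k = 5)                      # element-wise folds
folds_rows <- assign_folds(y, loocv = TRUE, margins = 1)  # one fold per row
folds_cols <- assign_folds(y, k = 10, margins = 2)        # k exceeds 3 columns
```

In this sketch, folds_rows holds 20 distinct fold numbers, each constant across a row, and folds_cols falls back to one fold per column because k is larger than the number of columns.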

Value

A named list, which may consist of the following:

  • $pred_y: Point estimates of predicted values corresponding to each data element, calculated as the posterior predictive median value

  • $data_y: Original data used for cross validation

  • $postpred_y: All posterior predictive samples corresponding to each data element, if save_postpred=TRUE

  • $rmse_pred: Root Mean Square (Prediction) Error

  • $mae_pred: Mean Absolute (Prediction) Error

  • $addl_p: A list with length equal to k (or the number of folds), with each list element containing all posterior samples for additional parameters, if these are supplied in argument addl_p=.

  • $fold: A vector, matrix, or array corresponding to the original data, giving the numerical values of the corresponding fold used
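The addl_p argument mentions Log Pointwise Predictive Density as one metric that could be computed from saved parameters. A minimal sketch of that calculation for a normal likelihood follows. The posterior draws here are simulated stand-ins, and the names mu_draws and sig_draws are hypothetical; how the corresponding samples are laid out within each element of $addl_p is not described here, so extracting them is left out.

```r
set.seed(7)
n_draws <- 2000
y_withheld <- c(1.2, -0.4, 3.1)   # made-up withheld observations

# Simulated posterior draws (assumption): one column of means per withheld
# observation, plus a shared residual standard deviation
mu_draws <- cbind(rnorm(n_draws, 1.0, 0.2),
                  rnorm(n_draws, 0.0, 0.2),
                  rnorm(n_draws, 2.5, 0.2))
sig_draws <- runif(n_draws, 0.8, 1.2)

# For each withheld observation: log of the likelihood averaged over draws
lpd_point <- sapply(seq_along(y_withheld), function(i) {
  log(mean(dnorm(y_withheld[i], mean = mu_draws[, i], sd = sig_draws)))
})

# Summing over observations gives the log pointwise predictive density
lppd <- sum(lpd_point)
lppd
```

Higher (less negative) values indicate that the withheld data were more probable under the fitted model.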

Author(s)

Matt Tyers

See Also

qq_postpred, plot_postpred, plotRhats, traceworstRhat

Examples

#### test case where y is a matrix
asdf_jags <- tempfile()
cat('model {
  for(i in 1:n) {
    for(j in 1:ngrp) {
      y[i,j] ~ dnorm(mu[i,j], tau)
      mu[i,j] <- b0 + b1*x[i,j] + a[j]
    }
  }

  for(j in 1:ngrp) {
    a[j] ~ dnorm(0, tau_a)
  }

  tau <- pow(sig, -2)
  sig ~ dunif(0, 10)
  b0 ~ dnorm(0, 0.001)
  b1 ~ dnorm(0, 0.001)

  tau_a <- pow(sig_a, -2)
  sig_a ~ dunif(0, 10)
}', file=asdf_jags)


# simulate data to go with the example model
n <- 60  # 20 rows x 3 groups, matching the matrix dimensions below
x <- matrix(rnorm(n, sd=3),
            nrow=20, ncol=3)
y <- matrix(rnorm(n, mean=rep(1:3, each=20)-x),
            nrow=20, ncol=3)

asdf_data <- list(x=x,
                  y=y,
                  n=nrow(x),
                  ngrp=ncol(x))

# JAGS controls
niter <- 1000
ncores <- 2
# ncores <- min(10, parallel::detectCores()-1)

## random assignment of folds
kfold1 <- kfold(p="y",
                k=5,
                model.file=asdf_jags, data=asdf_data,
                n.chains=ncores, n.iter=niter,
                n.burnin=niter/2, n.thin=niter/1000,
                parallel=FALSE)
str(kfold1)
kfold1$fold

## Performing LOOCV, but assigning folds by row of input data
kfold2 <- kfold(p="y",
                loocv=TRUE, fold_dims=1,
                model.file=asdf_jags, data=asdf_data,
                n.chains=ncores, n.iter=niter,
                n.burnin=niter/2, n.thin=niter/1000,
                parallel=FALSE)
str(kfold2)
kfold2$fold

jagshelper documentation built on Oct. 22, 2024, 1:06 a.m.