View source: R/crossValidation.R
runCrossValidate  R Documentation 
Takes a NIMBLE model MCMC configuration and conducts kfold crossvalidation of the MCMC fit, returning a measure of the model's predictive performance.
runCrossValidate( MCMCconfiguration, k, foldFunction = "random", lossFunction = "MSE", MCMCcontrol = list(), returnSamples = FALSE, nCores = 1, nBootReps = 200, silent = FALSE )
MCMCconfiguration 
a NIMBLE MCMC configuration object, returned by a
call to 
k 
number of folds that should be used for crossvalidation. 
foldFunction 
one of (1) an R function taking a single integer argument

lossFunction 
one of (1) an R function taking a set of simulated data
and a set of observed data, and calculating the loss from those, or (2) a
character string naming one of NIMBLE's builtin loss functions. If a
character string, must be one of 
MCMCcontrol 
(optional) an R list with parameters governing the MCMC algorithm, See ‘Details’ for specific parameters. 
returnSamples 
logical indicating whether to return all
posterior samples from all MCMC runs. This can result in a very large returned
object (there will be 
nCores 
number of cpu cores to use in parallelizing the CV calculation. Only MacOS and Linux operating systems support multiple cores at this time. Defaults to 1. 
nBootReps 
number of bootstrap samples
to use when estimating the Monte Carlo error of the crossvalidation metric. Defaults to 200. If no Monte Carlo error estimate is desired,

silent 
Boolean specifying whether to show output from the algorithm as it's running (default = FALSE). 
kfold CV in NIMBLE proceeds by separating the data in a nimbleModel
into k
folds, as determined by the
foldFunction
argument. For each fold, the corresponding data are held out of the model,
and MCMC is run to estimate the posterior distribution and simultaneously impute
posterior predictive values for the heldout data.
Then, the posterior predictive values are compared to the
known, heldout data values via the specified lossFunction
. The loss values are
averaged over the posterior samples for each fold, and these averaged values for each
fold are then averaged over all folds to produce a single outofsample
loss estimate. Additionally, estimates of the Monte Carlo error for each fold are returned.
an R list with four elements:
CVvalue
The CV value, measuring the model's ability to predict new data. Smaller relative values indicate better model performance.
CVstandardError
The standard error of the CV value, giving an indication of the total Monte Carlo error in the CV estimate.
foldCVInfo
An list of fold CV values and standard errors for each fold.
samples
An R list, only returned when returnSamples = TRUE
. The i'th element of this list will be a matrix of posterior samples from the model with the i'th fold of data left out. There will be k
sets of samples.
foldFunction
ArgumentIf the default 'random'
method is not used, the foldFunction
argument
must be an R function that takes a single integervalued argument i
. i
is guaranteed to be within the range [1, k]. For each integer value i
,
the function should return a character vector of node names corresponding to the data
nodes that will be left out of the model for that fold. The returned node names can be expanded,
but don't need to be. For example, if fold i
is inteded to leave out the model nodes
x[1]
, x[2]
and x[3]
then the function could return either
c('x[1]', 'x[2]', 'x[3]')
or 'x[1:3]'
.
lossFunction
ArgumentIf you don't wish to use NIMBLE's builtin "MSE"
or "predictive"
loss
functions, you may provide your own R function as the lossFunction
argument to
runCrossValidate
. A usersupplied lossFunction
must be an R function
that takes two arguments: the first, named simulatedDataValues
, will be a vector
of simulated data values. The second, named actualDataValues
, will be a vector of
observed data values corresponding to the simulated data values in simulatedDataValues
.
The loss function should return a single scalar number.
See ‘Examples’ for an example of a userdefined loss function.
MCMCcontrol
ArgumentThe MCMCcontrol
argument is a list with the following elements:
niter
an integer argument determining how many MCMC iterations should be run for each loss value calculation. Defaults to 10000, but should probably be manually set.
nburnin
the number of samples from the start of the MCMC chain to discard as burnin for each loss value calculation. Must be between 0 and niter
. Defaults to 10
Nicholas Michaud and Lauren Ponisio
## Not run: ## We conduct CV on the classic "dyes" BUGS model. dyesCode < nimbleCode({ for (i in 1:BATCHES) { for (j in 1:SAMPLES) { y[i,j] ~ dnorm(mu[i], tau.within); } mu[i] ~ dnorm(theta, tau.between); } theta ~ dnorm(0.0, 1.0E10); tau.within ~ dgamma(0.001, 0.001); sigma2.within < 1/tau.within; tau.between ~ dgamma(0.001, 0.001); sigma2.between < 1/tau.between; }) dyesData < list(y = matrix(c(1545, 1540, 1595, 1445, 1595, 1520, 1440, 1555, 1550, 1440, 1630, 1455, 1440, 1490, 1605, 1595, 1515, 1450, 1520, 1560, 1510, 1465, 1635, 1480, 1580, 1495, 1560, 1545, 1625, 1445), nrow = 6, ncol = 5)) dyesConsts < list(BATCHES = 6, SAMPLES = 5) dyesInits < list(theta = 1500, tau.within = 1, tau.between = 1) dyesModel < nimbleModel(code = dyesCode, constants = dyesConsts, data = dyesData, inits = dyesInits) # Define the fold function. # This function defines the data to leave out for the i'th fold # as the i'th row of the data matrix y. This implies we will have # 6 folds. dyesFoldFunction < function(i){ foldNodes_i < paste0('y[', i, ', ]') # will return 'y[1,]' for i = 1 e.g. return(foldNodes_i) } # We define our own loss function as well. # The function below will compute the root mean squared error. RMSElossFunction < function(simulatedDataValues, actualDataValues){ dataLength < length(simulatedDataValues) # simulatedDataValues is a vector SSE < 0 for(i in 1:dataLength){ SSE < SSE + (simulatedDataValues[i]  actualDataValues[i])^2 } MSE < SSE / dataLength RMSE < sqrt(MSE) return(RMSE) } dyesMCMCconfiguration < configureMCMC(dyesModel) crossValOutput < runCrossValidate(MCMCconfiguration = dyesMCMCconfiguration, k = 6, foldFunction = dyesFoldFunction, lossFunction = RMSElossFunction, MCMCcontrol = list(niter = 5000, nburnin = 500)) ## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.