runCrossValidate: Perform k-fold cross-validation on a NIMBLE model fit by MCMC
In nimble: MCMC, Particle Filtering, and Programmable Hierarchical Modeling

runCrossValidate

R Documentation

Perform k-fold cross-validation on a NIMBLE model fit by MCMC

Description

Takes a NIMBLE model MCMC configuration and conducts k-fold cross-validation of the MCMC fit, returning a measure of the model's predictive performance.

Usage

runCrossValidate(
  MCMCconfiguration,
  k,
  foldFunction = "random",
  lossFunction = "MSE",
  MCMCcontrol = list(),
  returnSamples = FALSE,
  nCores = 1,
  nBootReps = 200,
  silent = FALSE
)

Arguments

`MCMCconfiguration`	a NIMBLE MCMC configuration object, returned by a call to `configureMCMC`.
`k`	number of folds that should be used for cross-validation.
`foldFunction`	one of (1) an R function taking a single integer argument `i`, and returning a character vector with the names of the data nodes to leave out of the model for fold `i`, or (2) the character string `"random"`, indicating that data nodes will be randomly partitioned into `k` folds. Note that choosing "random" and setting `k` equal to the total number of data nodes in the model will perform leave-one-out cross-validation. Defaults to `"random"`. See ‘Details’.
`lossFunction`	one of (1) an R function taking a set of simulated data and a set of observed data, and calculating the loss from those, or (2) a character string naming one of NIMBLE's built-in loss functions. If a character string, must be one of `"predictive"` to use the log predictive density as a loss function or `"MSE"` to use the mean squared error as a loss function. Defaults to `"MSE"`. See ‘Details’ for information on creating a user-defined loss function.
`MCMCcontrol`	(optional) an R list with parameters governing the MCMC algorithm, See ‘Details’ for specific parameters.
`returnSamples`	logical indicating whether to return all posterior samples from all MCMC runs. This can result in a very large returned object (there will be `k` sets of posterior samples returned). Defaults to `FALSE`.
`nCores`	number of cpu cores to use in parallelizing the CV calculation. Only MacOS and Linux operating systems support multiple cores at this time. Defaults to 1.
`nBootReps`	number of bootstrap samples to use when estimating the Monte Carlo error of the cross-validation metric. Defaults to 200. If no Monte Carlo error estimate is desired, `nBootReps` can be set to `NA`, which can potentially save significant computation time.
`silent`	Boolean specifying whether to show output from the algorithm as it's running (default = FALSE).

Details

k-fold CV in NIMBLE proceeds by separating the data in a nimbleModel into k folds, as determined by the foldFunction argument. For each fold, the corresponding data are held out of the model, and MCMC is run to estimate the posterior distribution and simultaneously impute posterior predictive values for the held-out data. Then, the posterior predictive values are compared to the known, held-out data values via the specified lossFunction. The loss values are averaged over the posterior samples for each fold, and these averaged values for each fold are then averaged over all folds to produce a single out-of-sample loss estimate. Additionally, estimates of the Monte Carlo error for each fold are returned.

Value

an R list with four elements:

CVvalue. The CV value, measuring the model's ability to predict new data. Smaller relative values indicate better model performance.
CVstandardError. The standard error of the CV value, giving an indication of the total Monte Carlo error in the CV estimate.
foldCVInfo. An list of fold CV values and standard errors for each fold.
samples. An R list, only returned when returnSamples = TRUE. The i'th element of this list will be a matrix of posterior samples from the model with the i'th fold of data left out. There will be k sets of samples.

The `foldFunction` Argument

If the default 'random' method is not used, the foldFunction argument must be an R function that takes a single integer-valued argument i. i is guaranteed to be within the range [1, k]. For each integer value i, the function should return a character vector of node names corresponding to the data nodes that will be left out of the model for that fold. The returned node names can be expanded, but don't need to be. For example, if fold i is inteded to leave out the model nodes x[1], x[2] and x[3] then the function could return either c('x[1]', 'x[2]', 'x[3]') or 'x[1:3]'.

The `lossFunction` Argument

If you don't wish to use NIMBLE's built-in "MSE" or "predictive" loss functions, you may provide your own R function as the lossFunction argument to runCrossValidate. A user-supplied lossFunction must be an R function that takes two arguments: the first, named simulatedDataValues, will be a vector of simulated data values. The second, named actualDataValues, will be a vector of observed data values corresponding to the simulated data values in simulatedDataValues. The loss function should return a single scalar number. See ‘Examples’ for an example of a user-defined loss function.

The `MCMCcontrol` Argument

The MCMCcontrol argument is a list with the following elements:

niter. An integer argument determining how many MCMC iterations should be run for each loss value calculation. Defaults to 10000, but should probably be manually set.
nburnin. The number of samples from the start of the MCMC chain to discard as burn-in for each loss value calculation. Must be between 0 and niter. Defaults to 10

Author(s)

Nicholas Michaud and Lauren Ponisio

Examples

## Not run: 

## We conduct CV on the classic "dyes" BUGS model.

dyesCode <- nimbleCode({
  for (i in 1:BATCHES) {
    for (j in 1:SAMPLES) {
      y[i,j] ~ dnorm(mu[i], tau.within);
    }
    mu[i] ~ dnorm(theta, tau.between);
  }

  theta ~ dnorm(0.0, 1.0E-10);
  tau.within ~ dgamma(0.001, 0.001);  sigma2.within <- 1/tau.within;
  tau.between ~ dgamma(0.001, 0.001);  sigma2.between <- 1/tau.between;
})

dyesData <- list(y = matrix(c(1545, 1540, 1595, 1445, 1595,
                              1520, 1440, 1555, 1550, 1440,
                              1630, 1455, 1440, 1490, 1605,
                              1595, 1515, 1450, 1520, 1560, 
                              1510, 1465, 1635, 1480, 1580,
                              1495, 1560, 1545, 1625, 1445), 
                              nrow = 6, ncol = 5))
                              
dyesConsts <- list(BATCHES = 6,
                   SAMPLES = 5)
                   
dyesInits <- list(theta = 1500, tau.within = 1, tau.between =  1)

dyesModel <- nimbleModel(code = dyesCode,
                         constants = dyesConsts,
                         data = dyesData,
                         inits = dyesInits)

# Define the fold function.
# This function defines the data to leave out for the i'th fold
# as the i'th row of the data matrix y.  This implies we will have
# 6 folds.

dyesFoldFunction <- function(i){
  foldNodes_i <- paste0('y[', i, ', ]')  # will return 'y[1,]' for i = 1 e.g.
  return(foldNodes_i)
}

# We define our own loss function as well.
# The function below will compute the root mean squared error.

RMSElossFunction <- function(simulatedDataValues, actualDataValues){
  dataLength <- length(simulatedDataValues) # simulatedDataValues is a vector
  SSE <- 0
  for(i in 1:dataLength){
    SSE <- SSE + (simulatedDataValues[i] - actualDataValues[i])^2
  }
  MSE <- SSE / dataLength
  RMSE <- sqrt(MSE)
  return(RMSE)
}

dyesMCMCconfiguration <- configureMCMC(dyesModel)
  
crossValOutput <- runCrossValidate(MCMCconfiguration = dyesMCMCconfiguration,
                                   k = 6,
                                   foldFunction = dyesFoldFunction,
                                   lossFunction = RMSElossFunction,
                                   MCMCcontrol = list(niter = 5000,
                                                      nburnin = 500))

## End(Not run)

nimble documentation built on April 12, 2025, 2:28 a.m.

nimble index

Package overview README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

nimble
MCMC, Particle Filtering, and Programmable Hierarchical Modeling

runCrossValidate: Perform k-fold cross-validation on a NIMBLE model fit by MCMC
In nimble: MCMC, Particle Filtering, and Programmable Hierarchical Modeling

Perform k-fold cross-validation on a NIMBLE model fit by MCMC

Description

Usage

Arguments

Details

Value

The `foldFunction` Argument

The `lossFunction` Argument

The `MCMCcontrol` Argument

Author(s)

Examples

Related to runCrossValidate in nimble...

R Package Documentation

Browse R Packages

We want your feedback!

nimble MCMC, Particle Filtering, and Programmable Hierarchical Modeling

runCrossValidate: Perform k-fold cross-validation on a NIMBLE model fit by MCMC In nimble: MCMC, Particle Filtering, and Programmable Hierarchical Modeling

Perform k-fold cross-validation on a NIMBLE model fit by MCMC

Description

Usage

Arguments

Details

Value

The foldFunction Argument

The lossFunction Argument

The MCMCcontrol Argument

Author(s)

Examples

Related to runCrossValidate in nimble...

R Package Documentation

Browse R Packages

We want your feedback!

nimble
MCMC, Particle Filtering, and Programmable Hierarchical Modeling

runCrossValidate: Perform k-fold cross-validation on a NIMBLE model fit by MCMC
In nimble: MCMC, Particle Filtering, and Programmable Hierarchical Modeling

The `foldFunction` Argument

The `lossFunction` Argument

The `MCMCcontrol` Argument