Drop-k and simulated dataset studies using JAGS

Description

These functions fit a user-specified JAGS model to multiple datasets, with automatic control of run length and convergence, over a distributed computing cluster such as that provided by snow. The results for monitored variables are compared to the target values provided, and a summary of the model performance is returned. This can be used to validate a model against simulated data, or to assess model fit using a 'drop-k' type cross validation study, in which one or more datapoints are removed in turn and the model's ability to predict the removed datapoints is assessed.

Usage

drop.k(runjags.object, dropvars, k = 1, simulations = NA, ...)

run.jags.study(simulations, model, datafunction, targets = list(),
  confidence = 0.95, record.chains = FALSE, max.time = "15m",
  silent.jags = TRUE, parallel.method = parLapply, n.cores = NA,
  export.cluster = character(0), inits = list(), ...)

Arguments

runjags.object

an object of class runjags-class (a fitted JAGS model) on which to perform the drop-k analysis

dropvars

the variable(s) to be removed from the data so that the ability of the model to predict these datapoints can be assessed. The variables can be specified as a character vector, or as a single character string for which partial matching will be done. Array indices can be used, but every index must be given as a complete range: for example, variable[2:5,2] is permitted, but variable[,2] is not because the first index is empty.

k

the number of datapoints to be dropped from each individual simulation. The default of 1 is a drop-1 study (also called a leave-one-out cross validation study).

simulations

the number of datasets to run the model on. For drop.k the default is to use the number of unique datapoints, resulting in a drop-1 study. If the specified number of simulations is different to the number of unique datapoints, the datapoints are dropped randomly between simulations.

...

optional arguments to be passed to autorun.jags, or to the parallel method function (such as 'cl').

model

the JAGS model to use, in the same format as would be specified to run.jags.

datafunction

a function that will be used to specify the data. This must take either zero arguments, or one argument representing the simulation number, and return either a named list or character vector in the R dump format containing the data specific to that simulation. It is possible to specify any data that does not change for each simulation using a #data# <variable> tag in the model code.

targets

a named list of variables (which can include vectors/arrays) with values to which the model outputs are compared (if stochastic). The target variable names are also automatically included as monitored variables.

confidence

a probability (or vector of probabilities) to use when calculating the proportion of credible intervals containing the true target value. Default 95% CI.

record.chains

option to return the full runjags objects returned from each simulation as a list item named 'runjags'.

max.time

the maximum time for which each individual simulation is allowed to run by the underlying autorun.jags function. Acceptable units include 'seconds', 'minutes', 'hours', 'days', 'weeks', or the first letter(s) of each. Default is 15 minutes.

silent.jags

option to suppress all JAGS output, even for simulations run locally. If set to FALSE, there is no guarantee that the output will be displayed in sequential order between the parallel simulations. Default TRUE.

parallel.method

a function that will be used to call the repeated simulations. This must take the first two arguments 'X' and 'FUN' as for lapply, with other optional arguments passed through from the parent function call. Default uses parLapply, but lapply or mclapply could also be used.

n.cores

the maximum number of cores to use for parallel simulations. Default value uses detectCores, or a minimum of 2. Ignored if cl is supplied, or if parallel.method does not take a cl argument.

export.cluster

a character vector naming objects to be retrieved from the parent frame of the function call and made available to the cluster nodes. This may be useful if the initial values specified for the model need to be extracted from the working environment; however, it may be preferable to specify a function for inits instead.

inits

as for run.jags, except that it is not permitted to be an environment. It is recommended to use a function that returns appropriate initial values (which may depend on the data visible when the function is evaluated).

Details

The drop.k function is a wrapper to run.jags.study for the common application of drop-k cross validation studies on fitted JAGS models. The run.jags.study function is more flexible, and can be used for validating the performance of a model against simulated data with known parameters. For the latter, a user-specified function to generate suitable datasets to analyse is required.
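
As a minimal sketch of such a simulation study, assuming a simple linear regression with a known slope and intercept: the names make_data, lm_model, true_m, true_c and run_study are hypothetical, and the call to run.jags.study is wrapped in a function so that nothing is sent to JAGS until run_study() is invoked. parallel.method = lapply is chosen here so that no cluster is required.

```r
# Known parameter values used to simulate the data (illustrative assumptions).
true_m <- 2
true_c <- 10

# Hypothetical data function: takes the simulation number and returns a
# named list holding the data for that simulation.
make_data <- function(i) {
  N <- 20
  X <- seq_len(N)
  Y <- rnorm(N, mean = true_m * X + true_c, sd = 1)
  list(N = N, X = X, Y = Y)
}

# JAGS model as a character string, in the format accepted by run.jags.
lm_model <- "model {
  for (i in 1:N) {
    Y[i] ~ dnorm(m * X[i] + c, precision)
  }
  m ~ dnorm(0, 0.001)
  c ~ dnorm(0, 0.001)
  precision ~ dgamma(0.01, 0.01)
}"

# Wrapper that runs the study; requires the runjags package and JAGS.
# The targets are the true values, and are monitored automatically.
run_study <- function(simulations = 10) {
  runjags::run.jags.study(
    simulations = simulations,
    model = lm_model,
    datafunction = make_data,
    targets = list(m = true_m, c = true_c),
    parallel.method = lapply
  )
}
```

Calling run_study() would then return the study summary for m and c; the number of simulations and the priors above are arbitrary choices for illustration.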

Value

An object of class runjagsstudy-class, containing a summary of the performance of the model with regards to the target variables specified. If record.chains=TRUE, an element named 'runjags' containing a list of all the runjags objects returned will also be present. Any error messages given by individual simulations will be contained in the $errors element of the returned list.
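
The kind of summary controlled by the confidence argument can be illustrated in plain R. The sketch below does not use runjags or JAGS: the "posterior draws" are stand-in normal samples, all object names are hypothetical, and equal-tailed quantile intervals are assumed for simplicity, which may differ from the interval calculation used by the package.

```r
set.seed(42)
true_value <- 5      # the known target value
confidence <- 0.95   # nominal interval coverage
alpha <- 1 - confidence

# For each of 200 pretend simulations, build an interval from stand-in
# draws and record whether it contains the true value.
contains_truth <- vapply(seq_len(200), function(i) {
  estimate <- rnorm(1, mean = true_value, sd = 0.5)
  draws <- rnorm(1000, mean = estimate, sd = 0.5)
  ci <- quantile(draws, probs = c(alpha / 2, 1 - alpha / 2))
  unname(ci[1] <= true_value && true_value <= ci[2])
}, logical(1))

# Proportion of intervals that contain the target value.
coverage <- mean(contains_truth)
```

For a well-behaved model, this proportion should be close to the nominal confidence level; a markedly lower value suggests the intervals are too narrow or the estimates are biased.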

References

M. J. Denwood, "runjags: An R Package Providing Interface Utilities, Distributed Computing Methods and Additional Distributions For MCMC Models in JAGS," Journal of Statistical Software, [Under review].

See Also

autorun.jags for the underlying methods used to run simulations to convergence, and runjagsstudy-class for details of the returned object.

Examples

# For examples of usage see the following vignette:
## Not run: 
vignette('userguide', package='runjags')

## End(Not run)
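
A drop-1 study on a fitted model might be set up along the following lines. This is a sketch under stated assumptions rather than an example taken from the package: the model, the data and the names drop_model, drop_data and run_drop_study are hypothetical, and the code is wrapped in a function so that JAGS is only called when run_drop_study() is invoked.

```r
# Hypothetical data for a simple linear regression.
set.seed(1)
drop_data <- list(N = 10, X = 1:10, Y = rnorm(10, mean = 2 * (1:10) + 10, sd = 1))

# JAGS model as a character string, in the format accepted by run.jags.
drop_model <- "model {
  for (i in 1:N) {
    Y[i] ~ dnorm(m * X[i] + c, precision)
  }
  m ~ dnorm(0, 0.001)
  c ~ dnorm(0, 0.001)
  precision ~ dgamma(0.01, 0.01)
}"

# Fit the model once, then drop each element of Y in turn (k = 1) and
# assess how well the model predicts the removed value.
# Requires the runjags package and JAGS.
run_drop_study <- function() {
  fit <- runjags::run.jags(
    model = drop_model,
    monitor = c("m", "c"),
    data = drop_data,
    n.chains = 2
  )
  runjags::drop.k(fit, dropvars = "Y", k = 1)
}
```

To restrict the study to part of the response, dropvars could instead be given as a complete index range such as "Y[2:5]".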