Run or extend a user-specified Bayesian MCMC model in JAGS from within R

Share:

Description

Runs or extends a user specified JAGS model from within R, returning an object of class runjags-class.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
run.jags(model, monitor = NA, data = NA, n.chains = NA, inits = NA,
  burnin = 4000, sample = 10000, adapt = 1000, noread.monitor = NULL,
  datalist = NA, initlist = NA, jags = runjags.getOption("jagspath"),
  silent.jags = runjags.getOption("silent.jags"),
  modules = runjags.getOption("modules"),
  factories = runjags.getOption("factories"), summarise = TRUE,
  mutate = NA, thin = 1, keep.jags.files = FALSE,
  tempdir = runjags.getOption("tempdir"), jags.refresh = 0.1,
  batch.jags = silent.jags, method = runjags.getOption("method"),
  method.options = list(), ...)

extend.jags(runjags.object, add.monitor = character(0),
  drop.monitor = character(0), drop.chain = numeric(0),
  combine = length(c(add.monitor, drop.monitor, drop.chain)) == 0,
  burnin = 0, sample = 10000, adapt = 1000, noread.monitor = NA,
  jags = NA, silent.jags = NA, summarise = sample >= 100, thin = NA,
  keep.jags.files = FALSE, tempdir = runjags.getOption("tempdir"),
  jags.refresh = NA, batch.jags = silent.jags, method = NA,
  method.options = NA, ...)

Arguments

model

either a relative or absolute path to a textfile (including the file extension) containing a model in the JAGS language and possibly monitored variable names, data and/or initial values, or a character string of the same. No default. See read.jagsfile for more details.

monitor

a character vector of the names of variables to monitor. No default. The special node names 'deviance', 'pd', 'popt', 'dic', 'ped' and 'full.pd' are used to monitor the deviance, mean pD, mean pOpt, DIC, PED and full distribution of sum(pD) respectively. Note that these monitored nodes (with the exception of 'deviance') require multiple chains within the same simulation, and won't appear as variables in the summary statistics or plots (but see extract for a way of extracting these from the returned object).

data

a named list, data frame, environment, character string in the R dump format (see dump.format), or a function (with no arguments) returning one of these types. If the model text contains inline #data# comments, then this argument specifies the list, data frame or environment in which to search first for these variables (the global environment is always searched last). If the model text does not contain #data# comments, then the full list or data frame (but not environment) is included as data. If the data argument is a character string, then any #data# comments in the model are ignored (with a warning). The default specifies the parent environment of the function call.

n.chains

the number of chains to use with the simulation. More chains will improve the sensitivity of the convergence diagnostic, but will cause the simulation to run more slowly (although this may be improved by using a method such as 'parallel', 'rjparallel' or 'snow'). The minimum (and default) number of chains is 2.

inits

either a character vector with length equal to the number of chains the model will be run using, or a list of named lists representing names and corresponding values of inits for each chain, or a function with either 1 argument representing the chain or no arguments. If a vector, each element of the vector must be a character string in the R dump format representing the initial values for that chain, or NA. If not all initialising variables are specified, the unspecified variables are taken deterministically from the mean or mode of the prior distribution by JAGS. Values left as NA result in all initial values for that chain being taken from the prior distribution. The special variables '.RNG.seed', '.RNG.name', and '.RNG.state' are allowed for explicit control over random number generators in JAGS. If a function is provided, the data is available inside the function as a named list 'data' - this may be useful for setting initial values that depend on the data. Default NA. Note that the dimensions of any variables used for initial values must match the dimensions of the same parameter in the model - recycling is not performed. If any elements of the initial values have deterministic values in the model, the corresponding elements must be defined as NA in the initial values.

burnin

the number of burnin iterations, NOT including the adaptive iterations to use for the simulation. Note that the default is 4000 plus 1000 adaptive iterations, with a total of 5000.

sample

the total number of (additional) samples to take. Default 10000 iterations. If specified as 0, then the model will be created and returned without any MCMC samples (burnin and adapt will be ignored). Note that a minimum of 100 samples is required to generate summary statistics.

adapt

the number of adaptive iterations to use at the start of the simulation. If the adaptive phase is not long enough, the sampling efficiency of the MCMC chains will be compromised. If the model does not require adaptation (either because a compiled rjags model is already available or because the model contains no data) then this will be ignored, with a warning that the model is not in adaptive mode. Default 1000 iterations.

noread.monitor

an optional character vector of variables to monitor in JAGS and output to coda files, but that should not be read back into R. This may be useful (in conjunction with keep.jags.files=TRUE) for looking at large numbers of variables a few at a time using the read.monitor argument to results.jags. This argument is ignored for the rjags and rjparallel methods, and if keep.jags.files=FALSE.

datalist

deprecated argument.

initlist

deprecated argument.

jags

the system call or path for activating JAGS. Default uses the option given in runjags.options.

silent.jags

option to suppress output of the JAGS simulations. Default uses the option given in runjags.options.

modules

a character vector of external modules to be loaded into JAGS, either as the module name on its own or as the module name and status separated by a space, for example 'glm on'.

factories

a character vector of factory modules to be loaded into JAGS. Factories should be provided in the format '<facname> <factype> <status>' (where status is optional), for example: factories='mix::TemperedMix sampler on'. You must also ensure that any required modules are also specified (in this case 'mix').

summarise

should summary statistics be automatically calculated for the output chains? Default TRUE (but see also ?runjags.options -> force.summary).

mutate

either a function or a list with first element a function and remaining elements arguments to this function. This can be used to add new variables to the posterior chains that are derived from the directly monitored variables in JAGS. This allows the variables to be summarised or extracted as part of the MCMC objects as if they had been calculated in JAGS, but without the computational or storage overheads associated with calculating them in JAGS directly. The plot, summary and as.mcmc methods for runjags objects will automatically extract the mutated variables along with the directly monitored variables. For an application to pairwise comparisons of different levels within fixed effects see contrasts.mcmc.

thin

the thinning interval to be used in JAGS. Increasing the thinning interval may reduce autocorrelation, and therefore reduce the number of samples required, but will increase the time required to run the simulation. Using this option thinning is performed directly in JAGS, rather than on an existing MCMC object as with thin.sample. Default 1.

keep.jags.files

option to keep the folder with files needed to call JAGS, rather than deleting it. This allows the simulation results to be re-read using results.jags(path-to-folder), even from another R session, and may also be useful for attempting to bug fix models. A character string can also provided, in which case this folder name will be used instead of the default (existing folders will NOT be over-written). Default FALSE. See also the cleanup.jags function.

tempdir

option to use the temporary directory as specified by the system rather than creating files in the working directory. If keep.jags.files=TRUE then the folder is copied to the working directory after the job has finished (with a unique folder name based on 'runjagsfiles'). Any files created in the temporary directory are removed when the function exits for any reason. It is not possible to use a temporary directory with the background methods, so tempdir will be set to FALSE if not done so by the user (possibly with a warning depending on the settings in runjags.options). Default TRUE.

jags.refresh

the refresh interval (in seconds) for monitoring JAGS output using the 'interactive' and 'parallel' methods (see the 'method' argument). Longer refresh intervals will use slightly less processor time, but will make the simulation updates to be shown on the screen less frequently. Reducing the refresh rate to every 10 or 30 seconds may be worthwhile for simulations taking several days to run. Note that this will have no effect on the processor use of the simulations themselves. Default 0.1 seconds.

batch.jags

option to call JAGS in batch mode, rather than using input redirection. On JAGS >= 3.0.0, this suppresses output of the status which may be useful in some situations. Default TRUE if silent.jags is TRUE, or FALSE otherwise.

method

the method with which to call JAGS; probably a character vector specifying one of 'rjags', 'simple', 'interruptible', 'parallel', 'rjparallel', 'background', 'bgparallel' or 'snow'. The 'rjags' and 'rjparallel' methods run JAGS using the rjags package, whereas other options do not require the rjags package and call JAGS as an external executable. The advantage of the 'rjags' method is that the model will not need to be recompiled between successive calls to extend.jags, all other methods require a re-compilation (and adaptation if necessary) step at every call to extend.jags. Note that the 'rjparallel' and 'snow' methods may leave behind zombie JAGS processes if the user interrupts the R session used to start the simulations - for this reason the 'parallel' method is recommended for interactive use with parallel chains. The 'background' and 'bgparallel' return a filename for the started simulation, which can be read using results.jags. The 'parallel' and 'interruptible' methods for Windows require XP Professional, Vista or later (or any Unix-alike). For more information refer to the userguide vignette.

method.options

a deprecated argument currently permitted for backwards compatibility, but this will be removed from a future version of runjags. Pass these arguments directly to run.jags or extend.jags.

...

summary parameters to be passed to add.summary, and/or additional options to control some methods including n.sims for parallel methods, cl for rjparallel and snow methods, remote.jags for snow, and by and progress.bar for the rjags method.

runjags.object

the model to be extended - the output of a run.jags (or autorun.jags or extend.jags etc) function, with class 'runjags'. No default.

add.monitor

a character vector of variables to add to the monitored variable list. All previously monitored variables are automatically included - although see the 'drop.monitor' argument. Default no additional monitors.

drop.monitor

a character vector of previously monitored variables to remove from the monitored variable list for the extended model. Default none.

drop.chain

a numeric vector of chains to remove from the extended model. Default none.

combine

a logical flag indicating if results from the new JAGS run should be combined with the previous chains. Default TRUE if not adding or removing variables or chains, and FALSE otherwise.

Details

The run.jags function reads, compiles, and updates a JAGS model based on a model representation (plus data, monitors and initial values) input by the user. The model can be contained in an external text file, or a character vector within R. The autorun.jags function takes an existing runjags-class object and extends the simulation. Running a JAGS model using these functions has two main advantages:

1) The method used to call or extend the simulation can be changed simply using the method option. The methods most likely to be used are 'interruptible' and 'rjags' which use one simulation per model, or 'parallel', 'bgparallel' and 'rjparallel' which run a separate simulation for each chain to speed up the model run. For more details see below under the 'method' argument.

2) All information required to re-run the simulations is stored within the runjags-class object returned. This complete representation can be written to a text file using write.jagsfile, then modified as necessary and re-run using only the file path as input.

3) Summary statistics for the returned simulations are automatically calculated and displayed using associated S3 methods intended to facilitate checking model convergence and run length. Additional methods are available for plot functions, as well as conversion to and from MCMC and rjags objects. See the help file for runjags-class for more details.

Value

Usually an object of class 'runjags', or an object of class 'runjagsbginfo' for background methods (see runjags-class).

References

Matthew J. Denwood (2016). runjags: An R Package Providing Interface Utilities, Model Templates, Parallel Computing Methods and Additional Distributions for MCMC Models in JAGS. Journal of Statistical Software, 71(9), 1-25. doi:10.18637/jss.v071.i09

See Also

results.jags to import completed simulations (or partially successful simulations) from saved JAGS files, runjags-class for details of available methods for the returned object, read.jagsfile for more details on the permitted format of the model file, write.jagsfile for a way to write an existing runjags object to file, and runjags.options for user options regarding warning messages etc.

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
# run a model to calculate the intercept and slope of the expression 
# y = m x + c, assuming normal observation errors for y:

# Simulate the data
X <- 1:100
Y <- rnorm(length(X), 2*X + 10, 1)

# Model in the JAGS format
model <- "model { 
for(i in 1 : N){ 
	Y[i] ~ dnorm(true.y[i], precision);
	true.y[i] <- (m * X[i]) + c
} 
m ~ dunif(-1000,1000)
c ~ dunif(-1000,1000) 
precision ~ dexp(1)
}"

# Data and initial values in a named list format, 
# with explicit control over the random number
# generator used for each chain (optional): 
data <- list(X=X, Y=Y, N=length(X))
inits1 <- list(m=1, c=1, precision=1,
.RNG.name="base::Super-Duper", .RNG.seed=1)
inits2 <- list(m=0.1, c=10, precision=1,
.RNG.name="base::Wichmann-Hill", .RNG.seed=2)

## Not run: 
# Run the model and produce plots 
results <- run.jags(model=model, monitor=c("m", "c", "precision"), 
data=data, n.chains=2, method="rjags", inits=list(inits1,inits2))

# Standard plots of the monitored variables:
plot(results)

# Look at the summary statistics:
print(results)

# Extract only the coefficient as an mcmc.list object:
library('coda')
coeff <- as.mcmc.list(results, vars="m")

## End(Not run)


# The same model but using embedded shortcuts to specify data, inits and monitors,
# and using parallel chains:

# Model in the JAGS format

model <- "model {
for(i in 1 : N){ #data# N
	Y[i] ~ dnorm(true.y[i], precision) #data# Y
	true.y[i] <- (m * X[i]) + c #data# X
}
m ~ dunif(-1000,1000) #inits# m
c ~ dunif(-1000,1000)
precision ~ dexp(1)
#monitor# m, c, precision
}"

# Simulate the data
X <- 1:100
Y <- rnorm(length(X), 2*X + 10, 1)
N <- length(X)

initfunction <- function(chain) return(switch(chain, 
	"1"=list(m=-10), "2"=list(m=10)))

## Not run: 
# Run the 2 chains in parallel (allowing the run.jags function
# to control the number of parallel chains). We also use a
# mutate function to convert the precision to standard deviation:
results <- run.jags(model, n.chains=2, inits=initfunction,
method="parallel", mutate=list("prec2sd", vars="precision"))

# View the results using the standard print method:
results

# Look at some plots of the intercept and slope on a 3x3 grid:
plot(results, c('trace','histogram','ecdf','crosscorr','key'),
vars=c("m","^c"),layout=c(3,3))

# Write the current model representation to file:
write.jagsfile(results, file='mymod.txt')
# And re-run the simulation from this point:
newresults <- run.jags('mymod.txt')

## End(Not run)
# Run the same model using 8 chains in parallel:
# distributed computing cluster:
## Not run: 

# A list of 8 randomly generated starting values for m:
initlist <- replicate(8,list(m=runif(1,-20,20)),simplify=FALSE)

# Run the chains in parallel using JAGS (2 models 
# with 4 chains each):
results <- run.jags(model, n.chains=8, inits=initlist,
method="parallel", n.sims=2)

# Set up a distributed computing cluster with 2 nodes:
library(parallel)
cl <- makeCluster(4)

# Run the chains in parallel rjags models (4 models 
# with 2 chains each) on this cluster:
results <- run.jags(model, n.chains=8, inits=initlist,
method="rjparallel", cl=cl)

stopCluster(cl)

# For more examples see the quick-start vignette:
vignette('quickjags', package='runjags')

# And for more details about possible methods see:
vignette('userguide', package='runjags')

## End(Not run)

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.