ABC_mcmc: Coupled to MCMC schemes for ABC In EasyABC: Efficient Approximate Bayesian Computation Sampling Schemes

Description

This function implements three different algorithms to perform coupled to MCMC ABC.

Usage

 1 2 3 ABC_mcmc(method, model, prior, summary_stat_target, prior_test=NULL, n_rec=100, n_between_sampling=10, n_cluster = 1, use_seed = FALSE, verbose = FALSE, dist_weights=NULL, ...)

Arguments

 method a character string indicating the ABC-MCMC algorithm to be used. Possible values are "Marjoram_original", "Marjoram" and "Wegmann". Note that the method "Marjoram_original" cannot be used with multiple cores. model a R function implementing the model to be simulated. It must take as arguments a vector of model parameter values and it must return a vector of summary statistics. When using the option use_seed=TRUE, model must take as arguments a vector containing a seed value and the model parameter values. A tutorial is provided in the package's vignette to dynamically link a binary code to a R function. Users may alternatively wish to wrap their binary executables using the provided functions binary_model and binary_model_cluster. The use of these functions is associated with slightly different constraints on the design of the binary code (see binary_model and binary_model_cluster). prior a list of prior information. Each element of the list corresponds to a model parameter. The list element must be a vector whose first argument determines the type of prior distribution: possible values are "unif" for a uniform distribution on a segment, "normal" for a normal distribution, "lognormal" for a lognormal distribution or "exponential" for an exponential distribution. The following arguments of the list elements contain the characteritiscs of the prior distribution chosen: for "unif", two numbers must be given: the minimum and maximum values of the uniform distribution; for "normal", two numbers must be given: the mean and standard deviation of the normal distribution; for "lognormal", two numbers must be given: the mean and standard deviation on the log scale of the lognormal distribution; for "exponential", one number must be given: the rate of the exponential distribution. User-defined prior distributions can also be provided. See the vignette for additional information on this topic. summary_stat_target a vector containing the targeted (observed) summary statistics. prior_test a string expressing the constraints between model parameters. This expression will be evaluated as a logical expression, you can use all the logical operators including "<", ">", ... Each parameter should be designated with "X1", "X2", ... in the same order as in the prior definition. If not provided, no constraint will be applied. n_rec a positive integer equal to the desired number of sampled points along the MCMC. n_between_sampling a positive integer equal to the desired spacing between sampled points along the MCMC. n_cluster a positive integer. If larger than 1 (the default value), ABC_mcmc will launch model simulations in parallel on n_cluster cores of the computer. use_seed logical. If FALSE (default), ABC_mcmc provides as input to the function model a vector containing the model parameters used for the simulation. If TRUE, ABC_mcmc provides as input to the function model a vector containing an integer seed value and the model parameters used for the simulation. In this last case, the seed value should be used by model to initialize its pseudo-random number generators (if model is stochastic). verbose logical. FALSE by default. If TRUE, ABC_mcmc writes in the current directory intermediary results at the end of each step of the algorithm in the file "output_mcmc". These outputs have a matrix format, in wich each raw is a different simulation, the first columns are the parameters used for this simulation, the following columns are the summary statistics of this simulation, and the last column is the distance between the simulation and the data. dist_weights a vector containing the weights to apply to the distance between the computed and the targeted statistics. These weights can be used to give more importance to a summary statistisc for example. The weights will be normalized before applying them. If not provided, no weights will be applied. ... Additional arguments can be passed depending on the choosen method (see below)

Details

See the package's vignette for details on ABC-MCMC.

Value

The returned value is a list containing the following components:

 param The model parameters used in the model simulations. stats The summary statistics obtained at the end of the model simulations. dist The distance of the simulations to the data. stats_normalization The standard deviation of the summary statistics across the model simulations of the initial step. These values are used to normalize the summary statistics before the computation of the Euclidean distance between simulations and data. If method is "Marjoram_original", this is equal to tab_normalization. If method is "Wegmann", this is not provided. epsilon The final maximal distance between simulations and data in the retained sample of particles. nsim The number of model simulations performed. n_between_sampling The spacing between two sampled points in the MCMC. computime The computing time to perform the simulations. min_stats The minimal values of each summary statistics during the calibration step, given when method is "Wegmann". max_stats The maximal values of each summary statistics during the calibration step, given when method is "Wegmann". lambda The lambda values of the Box-Cox transformation, given when method is "Wegmann". geometric_mean The geometric means, given when method is "Wegmann". boxcox_mean The means of Box-Cox transforms, given when method is "Wegmann". boxcox_sd The standard deviations of Box-Cox transforms, given when method is "Wegmann". pls_transform The matrix of PLS transformation, given when method is "Wegmann". numcomp The number of used components for the PLS transformation, given when method is "Wegmann".

Depending on the choosen method, you can specify the following arguments:

dist_max

a positive number, used when method is "Marjoram_original". This is the tolerance threshold used during the MCMC. If not provided by the user, it is automatically computed as half the distance between the first simulation and the target summary statistics and a warning is printed.

tab_normalization

a vector of the same length as summary_stat_target, used when method is "Marjoram_original". Each element contains a positive number by which each summary statistics must be divided before the computation of the Euclidean distance between simulations and data. If not provided by the user, the simulated summary statistics are divided by the target summary statistics and a warning is printed.

proposal_range

a vector of the same length as the number of model parameters, used when method is "Marjoram_original". Each element contains a positive number defining the range of MCMC jumps for each model parameter. If not provided by the user, a default value is used for each parameter and a warning is printed. The default value is 1/50 of the prior range for uniform distributions, 1/20 of the standard deviation of the prior distribution for normal distributions, 1/20 * exp ( sigma * sigma

for lognormal distributions where sigma is the standard deviation of the prior distribution in the log scale, and 1/20 of the inverse of the rate for exponential distributions.

n_calibration

a positive integer, used when method is "Marjoram" or "Wegmann". This is the number of simulations performed during the calibration step. Default value is 10000.

tolerance_quantile

a positive number between 0 and 1 (strictly), used when method is "Marjoram" or "Wegmann". This is the percentage of simulations retained during the calibration step to determine the tolerance threshold to be used during the MCMC. Default value is 0.01.

proposal_phi

a positive number, used when method is "Marjoram" or "Wegmann". This is a scaling factor defining the range of MCMC jumps. Default value is 1.

numcomp

a positive integer, used when method is "Wegmann". This is the number of components to be used for PLS transformations. Default value is 0 which encodes that this number is equal to the number of summary statistics.

seed_count

a positive integer, the initial seed value provided to the function model (if use_seed=TRUE). This value is incremented by 1 at each call of the function model.

progress_bar

logical, FALSE by default. If TRUE, ABC_mcmc will output a bar of progression with the estimated remaining computing time. Option not available with multiple cores.

max_pick

a positive number, the max number of fails when moving particle inside the prior. Enabled only if inside_prior is to TRUE. 10000 by default.

Author(s)

Franck Jabot, Thierry Faure and Nicolas Dumoulin

References

Marjoram, P., Molitor, J., Plagnol, V. and Tavar\'e, S. (2003) Markov chain Monte Carlo without likelihoods. PNAS, 100, 15324–15328.

Wegmann, D., Leuenberger, C. and Excoffier, L. (2009) Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihood. Genetics, 182, 1207-1218.