add.reftable: Create or augment a list of simulated distributions of...

add_reftable    R Documentation

Create or augment a list of simulated distributions of summary statistics

Description

add_reftable creates or augments a reference table of simulations, and formats the results appropriately for further use. The user need not worry about this return format; only the much simpler return format of the function given as the Simulate argument matters. The primary role of this function is to wrap the call(s) of the function specified by Simulate. Depending on the arguments, parallel or serial computation is performed.

When parallelization is implied, it is performed by default on a “socket” cluster, which is available on all operating systems. Special care is then needed to ensure that all required packages are loaded in the worker processes, and that all required variables and functions are passed to them: see the packages and env arguments. For socket clusters, foreach or pbapply is called depending on whether the doSNOW package is attached (doSNOW allows more efficient load balancing than pbapply).
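As a rough illustration of why this care is needed, the sketch below uses only base R's parallel package; it is not Infusion's internal code. The names my_simulate, sim_env and n_obs are hypothetical. Socket workers start as fresh R sessions, so n_obs must be exported to them from an environment (the role the env argument plays) and required packages must be loaded on each node (the role of the packages argument).

```r
library(parallel)

# Hypothetical simulation function: one argument per parameter,
# returning a named vector of summary statistics.
# It relies on n_obs, which does not exist on a fresh worker.
my_simulate <- function(mu, s2) {
  x <- rnorm(n_obs, mean = mu, sd = sqrt(s2))
  c(mean = mean(x), var = var(x))
}

# Environment holding objects the workers need (hypothetical).
sim_env <- new.env()
assign("n_obs", 40L, envir = sim_env)

pars <- data.frame(mu = c(0, 1, 2, 3), s2 = c(1, 1, 2, 2))

cl <- makeCluster(2L, type = "PSOCK")  # socket cluster: fresh R sessions
sims <- tryCatch({
  # Load required packages on every node.
  clusterEvalQ(cl, library(stats))
  # Copy every object of sim_env into each worker's global environment.
  clusterExport(cl, varlist = ls(sim_env), envir = sim_env)
  # One call of the simulation function per row of pars.
  rows <- split(pars, seq_len(nrow(pars)))
  res <- parLapply(cl, rows, function(p, f) f(p$mu, p$s2), f = my_simulate)
  do.call(rbind, res)
}, finally = stopCluster(cl))
```

Without the clusterExport() call, each worker would fail with an "object 'n_obs' not found" error. The resulting matrix can then be bound to pars with cbind() to form a reference table.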

Alternatively, if the simulation function cannot be called directly from R, simulated samples can be added using the newsimuls argument. Finally, a generic data frame of simulated samples can be reformatted as a reference table by providing only the reftable argument.

add_simulation is a wrapper for add_reftable, suitable when nRealizations>1. It is documented separately: the distinctive features of add_simulation were conceived for the first workflow implemented in Infusion and are now somewhat obsolete.

Usage

add_reftable(reftable=NULL, Simulate, parsTable=par.grid, par.grid=NULL, 
               nRealizations = 1L, newsimuls = NULL, 
               verbose = interactive(), nb_cores = NULL, packages = NULL, env = NULL,
               control.Simulate=NULL, cluster_args=list(), cl_seed=NULL, ...)

Arguments

reftable

Data frame: a reference table. Each row contains the parameter values of a simulated realization of the data-generating process, and the corresponding simulated summary statistics. Since Infusion functions must be able to tell parameters apart from statistics, information about parameter names should be attached to the reftable *if* it is not available otherwise. Thus, if no parsTable is provided, the reftable should have an attribute "LOWER" (a named vector giving lower bounds for the parameters that vary in the analysis, as in the return value of the function).
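A minimal sketch, with hypothetical column names and bounds, of attaching the "LOWER" attribute to a reference table built outside Infusion. The last two lines illustrate the principle that the names of this attribute suffice to separate parameter columns from statistic columns; they are not Infusion's own code.

```r
set.seed(1)
n <- 5L

# Hypothetical reference table: parameters mu, s2; statistics mean, var.
reftable <- data.frame(mu = runif(n, -2, 2), s2 = runif(n, 0.5, 3))
reftable$mean <- rnorm(n, mean = reftable$mu)
reftable$var  <- reftable$s2 * rchisq(n, df = 9) / 9

# Attach lower bounds; the names identify the parameter columns.
attr(reftable, "LOWER") <- c(mu = -2, s2 = 0.5)

# Parameter columns versus summary-statistic columns.
par_names  <- names(attr(reftable, "LOWER"))
stat_names <- setdiff(names(reftable), par_names)
```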

Simulate

An *R* function, or the name (as a character string) of an *R* function, used to generate summary statistics for samples from a data-generating process. When an external simulation program is called, Simulate must therefore be an R function wrapping the call to the external program. Two function APIs are handled:
* If the function has a parsTable argument, it must return a data frame of summary statistics, each row of which contains the vector of summary statistics for one realization of the data-generating process. The parsTable argument of add_reftable will be passed to Simulate, and the rows of the output data frame must be in the same order as those of the input parsTable, as these two data frames will be bound together.
* Otherwise, the Simulate function must have one argument for each element of the parameter vector (i.e., of each row of parsTable). It must return a named vector of summary statistics.
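The two APIs can be sketched with hypothetical functions sim_per_row and sim_table (a normal model with parameters mu and s2, and an assumed extra argument n_obs). The stacking code at the end only illustrates what a wrapper has to do with each API; it is not how add_reftable is implemented.

```r
# API 1: one argument per parameter; returns a named vector of statistics.
sim_per_row <- function(mu, s2, n_obs = 30L) {
  x <- rnorm(n_obs, mean = mu, sd = sqrt(s2))
  c(mean = mean(x), var = var(x))
}

# API 2: a parsTable argument; returns a data frame with one row per
# row of parsTable, in the same order.
sim_table <- function(parsTable, n_obs = 30L) {
  stats <- t(apply(parsTable, 1L, function(p) {
    x <- rnorm(n_obs, mean = p[["mu"]], sd = sqrt(p[["s2"]]))
    c(mean = mean(x), var = var(x))
  }))
  as.data.frame(stats)
}

set.seed(42)
parsTable <- data.frame(mu = c(0, 1, 2), s2 = c(1, 2, 3))

# Per-row API: one call per row of parsTable, then stack the vectors.
out1 <- do.call(rbind, lapply(seq_len(nrow(parsTable)), function(i) {
  do.call(sim_per_row, as.list(parsTable[i, ]))
}))

# Table API: a single call whose rows are bound to parsTable in order.
out2 <- cbind(parsTable, sim_table(parsTable))
```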

parsTable, par.grid

A data frame of which each line is the vector of parameters needed by Simulate for each simulation of the data-generating process. par.grid is an alias for parsTable; the latter argument may be preferred in order not to suggest that the parameter values should form a regular grid.
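Because parsTable need not be a regular grid, one common choice is to draw parameter vectors at random within bounds. The sketch below assumes hypothetical parameters mu and s2 with arbitrary bounds, and contrasts the result with a regular grid built by expand.grid().

```r
set.seed(123)
n_sim <- 200L

# Hypothetical bounds for two parameters.
lower <- c(mu = -2, s2 = 0.5)
upper <- c(mu =  2, s2 = 3)

# Uniform random draws within the bounds: one column per parameter.
parsTable <- as.data.frame(
  mapply(function(lo, up) runif(n_sim, lo, up), lower, upper)
)

# For comparison, a regular 5 x 5 grid over the same box.
grid <- expand.grid(mu = seq(-2, 2, length.out = 5),
                    s2 = seq(0.5, 3, length.out = 5))
```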

nRealizations

The number of simulated samples of summary statistics for each parameter vector (each row of parsTable). If not 1, the old workflow is assumed and add_simulation is called.

newsimuls

If the function used to generate empirical distributions cannot be called by R, then newsimuls can be used to provide these distributions. See Details for the structure of this argument.

nb_cores

Number of cores for parallel simulation; NULL or an integer value, acting as a shortcut for cluster_args$spec. This is effective only if the simulation function is called separately for each row of parsTable. Otherwise, if the simulation function is called once on the whole parsTable, parallelisation can be controlled only through that function's own arguments.

cluster_args

A list of arguments passed to makeCluster. It may contain a non-null spec element, in which case the distinct nb_cores argument and the global Infusion option nb_cores are ignored. A typical usage would thus be cluster_args=list(spec=<number of 'children'>). Additional elements such as outfile="log.txt" may be useful to collect output from the nodes, and type="FORK" may be used to force a fork cluster on Linux and similar systems (otherwise a socket cluster is set up, as this is the default of parallel::makeCluster). Do *not* use a structured list with an add_reftable element as is possible for refine (see Details of the refine documentation).
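The precedence described above can be sketched with two hypothetical helpers, resolve_cluster_args and make_cluster_from_args; they mirror the documented behaviour but are not Infusion functions. The fallback to a single core when both spec and nb_cores are NULL is an assumption made only for this illustration, since Infusion also consults its global nb_cores option.

```r
library(parallel)

# Hypothetical helper: a non-NULL cluster_args$spec takes precedence
# over nb_cores.
resolve_cluster_args <- function(nb_cores = NULL, cluster_args = list()) {
  if (is.null(cluster_args$spec)) {
    # Illustrative fallback only (assumption, see lead-in).
    cluster_args$spec <- if (is.null(nb_cores)) 1L else as.integer(nb_cores)
  }
  cluster_args
}

# Hypothetical helper: forward every element to parallel::makeCluster().
make_cluster_from_args <- function(cluster_args) {
  do.call(makeCluster, cluster_args)
}

args1 <- resolve_cluster_args(nb_cores = 4L)
args2 <- resolve_cluster_args(
  nb_cores = 4L,
  cluster_args = list(spec = 2L, outfile = "log.txt")
)
```

Here args2 keeps spec = 2 despite nb_cores = 4. Calling make_cluster_from_args(args2) would then start two socket workers writing their output to log.txt; such a cluster should be released with stopCluster() when done.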

verbose

Whether to print some information or not.

...

Additional arguments passed to Simulate, beyond the parameter vector. These arguments should remain constant throughout the simulation workflow.

control.Simulate

A list, used as an exclusive alternative to “...” to pass additional arguments to Simulate, beyond the parameter vector. The list must contain the same elements as would otherwise go in the “...” (if control.Simulate is left NULL, a default value is constructed from the ...).
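The equivalence between passing extra arguments through "..." and through control.Simulate can be sketched with a hypothetical wrapper call_simulate (not an Infusion function) and a hypothetical Simulate function that needs a constant argument n_obs. With the same seed, both routes give identical statistics.

```r
# Hypothetical Simulate function with an extra constant argument n_obs.
my_simulate <- function(mu, s2, n_obs) {
  x <- rnorm(n_obs, mean = mu, sd = sqrt(s2))
  c(mean = mean(x), var = var(x))
}

# Hypothetical wrapper: if control.Simulate is NULL, it is built from `...`.
call_simulate <- function(Simulate, pars, control.Simulate = NULL, ...) {
  if (is.null(control.Simulate)) control.Simulate <- list(...)
  do.call(Simulate, c(as.list(pars), control.Simulate))
}

pars <- c(mu = 1, s2 = 2)
set.seed(7)
a <- call_simulate(my_simulate, pars, n_obs = 25L)
set.seed(7)
b <- call_simulate(my_simulate, pars, control.Simulate = list(n_obs = 25L))
```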

packages

For parallel evaluation: Names of additional libraries to be loaded on the cores, necessary for Simulate evaluation.

env

For parallel evaluation: an environment containing additional objects to be exported to the cores, necessary for Simulate evaluation.

cl_seed

(all parallel contexts:) Integer, or NULL. If an integer, it is used to initialize the "L'Ecuyer-CMRG" random-number generator. If cl_seed is NULL, the default generator is selected on each node, and its seed is not controlled. Providing the seed allows repeatable results for given parallelization settings, but may not give identical results across different settings.
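To show why a single integer seed can make parallel runs repeatable, the sketch below derives one "L'Ecuyer-CMRG" stream per node from one seed, in the same spirit as base R's parallel::clusterSetRNGStream(cl, iseed). It runs serially, and the helpers node_streams and draw_from_stream are hypothetical. Whether Infusion calls clusterSetRNGStream internally is not stated here.

```r
library(parallel)

# Derive one RNG stream per (hypothetical) node from a single seed.
node_streams <- function(cl_seed, n_nodes) {
  old_kind <- RNGkind("L'Ecuyer-CMRG")  # returns the previous kinds
  on.exit(RNGkind(old_kind[1L]))        # restore the previous generator
  set.seed(cl_seed)
  s <- get(".Random.seed", envir = globalenv())
  streams <- vector("list", n_nodes)
  for (i in seq_len(n_nodes)) {
    streams[[i]] <- s
    s <- nextRNGStream(s)               # next independent stream
  }
  streams
}

# Draw from a given stream (side effect: sets the global RNG state).
draw_from_stream <- function(seed, n) {
  assign(".Random.seed", seed, envir = globalenv())
  runif(n)
}
```

Calling node_streams() twice with the same cl_seed yields identical streams, whereas a different number of nodes changes how the simulations are spread over the streams. This is one reason why results may differ across parallelization settings.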

Details

The newsimuls argument should have the same structure as the return value of the function itself, except that newsimuls may include only a subset of the attributes returned by the function. It is thus a data frame; its required attributes are LOWER and UPPER, which are named vectors giving bounds for the parameters that vary in the whole analysis (the names identify these parameters when this information is not otherwise available from the arguments). These values need not actually bound the parameter values in newsimuls, as the actual bounds are then corrected using the parameter values in newsimuls and the attributes of reftable.
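A minimal sketch of a newsimuls data frame, with hypothetical columns and values (for instance read from an external program's output), together with a hypothetical helper widen_bounds that widens LOWER and UPPER to cover the parameter values actually present. The helper only illustrates the idea of the correction: it is not Infusion's code and ignores any existing reftable.

```r
# Hypothetical external simulations: parameters mu, s2; statistics mean, var.
newsimuls <- data.frame(
  mu   = c(-1.5, 0.3, 2.4),
  s2   = c(0.8, 1.9, 2.7),
  mean = c(-1.4, 0.1, 2.6),
  var  = c(0.7, 2.1, 2.5)
)

# Required attributes; the names identify the parameter columns.
# These bounds deliberately fail to contain s2 = 0.8 and mu = 2.4.
attr(newsimuls, "LOWER") <- c(mu = -2, s2 = 1)
attr(newsimuls, "UPPER") <- c(mu =  2, s2 = 3)

# Widen the bounds so they contain the observed parameter values.
widen_bounds <- function(sims) {
  lower <- attr(sims, "LOWER")
  upper <- attr(sims, "UPPER")
  p <- names(lower)
  list(LOWER = pmin(lower, sapply(sims[p], min)),
       UPPER = pmax(upper, sapply(sims[p], max)))
}
bounds <- widen_bounds(newsimuls)
```

Here bounds$LOWER becomes c(mu = -2, s2 = 0.8) and bounds$UPPER becomes c(mu = 2.4, s2 = 3).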

Value

A data.frame (with additional attributes) is returned.

The value has the following attributes: LOWER and UPPER which are each a vector of per-parameter minima and maxima deduced from any newsimuls argument, and optionally any of the arguments Simulate, control.Simulate, packages, env, parsTable and reftable (all corresponding to input arguments when provided, except that the actual Simulate function is returned even if it was input as a name).

Examples

## see main documentation page for the package for other typical usage

Infusion documentation built on May 3, 2023, 5:10 p.m.