simulateWithPriors: Simulate data for initial TreEvo analysis
In bomeara/treevo: Using ABC to Understand Trait Evolution

Description

The simulateWithPriors function pulls parameters from prior distributions and conducts a single simulation of continuous trait evolution (using the doSimulation function), returning useful summary statistics for ABC. parallelSimulateWithPriors is a wrapper function for simulateWithPriors that allows for multithreading and checkpointing. This family of functions is mostly used as internal components, generating simulations within ABC analyses using the doRun functions. See Note below.
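
The prior-drawing step described above can be sketched in base R. This is only an illustration: `drawFromPrior` is a hypothetical helper written for this example, not a documented TreEvo function, and the parameter ordering for each distribution (for example, shape then scale for `"gamma"`) is an assumption borrowed from base R's random-number functions.

```r
# Hypothetical helper (illustration only): draw one value from a prior that is
# named the same way as the entries of the *PriorsFns arguments.
# ASSUMPTION: parameter order follows base R's r* functions.
drawFromPrior <- function(priorFn, priorValues) {
  priorFn <- match.arg(
    priorFn,
    c("fixed", "uniform", "normal", "lognormal", "gamma", "exponential")
  )
  switch(priorFn,
    fixed       = priorValues[1],
    uniform     = runif(1, min = min(priorValues), max = max(priorValues)),
    normal      = rnorm(1, mean = priorValues[1], sd = priorValues[2]),
    lognormal   = rlnorm(1, meanlog = priorValues[1], sdlog = priorValues[2]),
    gamma       = rgamma(1, shape = priorValues[1], scale = priorValues[2]),
    exponential = rexp(1, rate = priorValues[1])
  )
}

set.seed(1)
# One prior name per parameter, and one vector of prior parameters per name.
priorsFns    <- c("normal", "exponential", "fixed")
priorsValues <- list(c(10, 2), 10, 0)

# Draw a single value for each parameter; a "fixed" prior returns its value.
draws <- mapply(drawFromPrior, priorsFns, priorsValues)
draws
```

Pairing a character vector of prior names with a list of parameter vectors mirrors how the `*PriorsFns` and `*PriorsValues` arguments are specified.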

Usage

simulateWithPriors(
  phy = NULL,
  intrinsicFn,
  extrinsicFn,
  startingPriorsFns,
  startingPriorsValues,
  intrinsicPriorsFns,
  intrinsicPriorsValues,
  extrinsicPriorsFns,
  extrinsicPriorsValues,
  generation.time = 1000,
  TreeYears = max(branching.times(phy)) * 1e+06,
  timeStep = NULL,
  giveUpAttempts = 10,
  verbose = FALSE,
  checks = TRUE,
  taxonDF = NULL,
  freevector = NULL
)

parallelSimulateWithPriors(
  nrepSim,
  multicore,
  coreLimit,
  phy,
  intrinsicFn,
  extrinsicFn,
  startingPriorsFns,
  startingPriorsValues,
  intrinsicPriorsFns,
  intrinsicPriorsValues,
  extrinsicPriorsFns,
  extrinsicPriorsValues,
  generation.time = 1000,
  TreeYears = max(branching.times(phy)) * 1e+06,
  timeStep = NULL,
  checkpointFile = NULL,
  checkpointFreq = 24,
  verbose = TRUE,
  checkTimeStep = TRUE,
  verboseNested = FALSE,
  freevector = NULL,
  taxonDF = NULL,
  giveUpAttempts = 10
)

Arguments

`phy`	A phylogenetic tree, in package `ape`'s `phylo` format.
`intrinsicFn`	Name of (previously-defined) function that governs how traits evolve within a lineage, regardless of the trait values of other taxa.
`extrinsicFn`	Name of (previously-defined) function that governs how traits evolve within a lineage, based on their own ('internal') trait value and the trait values of other taxa.
`startingPriorsFns`	Vector containing names of prior distributions to use for root states: can be one of `"fixed"`, `"uniform"`, `"normal"`, `"lognormal"`, `"gamma"`, `"exponential"`.
`startingPriorsValues`	A list of the same length as the number of prior distributions specified in `startingPriorsFns` (for starting values, this should be one prior function specified for each trait - thus one for most univariate trait analyses), with each element of the list a vector the same length as the appropriate number of parameters for that prior distribution (1 for `"fixed"`, 2 for `"uniform"`, 2 for `"normal"`, 2 for `"lognormal"`, 2 for `"gamma"`, 1 for `"exponential"`).
`intrinsicPriorsFns`	Vector containing names of prior distributions to use for intrinsic function parameters: can be one of `"fixed"`, `"uniform"`, `"normal"`, `"lognormal"`, `"gamma"`, `"exponential"`.
`intrinsicPriorsValues`	A list of the same length as the number of prior distributions specified in `intrinsicPriorsFns` (one prior function specified for each parameter in the intrinsic model), with each element of the list a vector the same length as the appropriate number of parameters for that prior distribution (1 for `"fixed"`, 2 for `"uniform"`, 2 for `"normal"`, 2 for `"lognormal"`, 2 for `"gamma"`, 1 for `"exponential"`).
`extrinsicPriorsFns`	Vector containing names of prior distributions to use for extrinsic function parameters: can be one of `"fixed"`, `"uniform"`, `"normal"`, `"lognormal"`, `"gamma"`, `"exponential"`.
`extrinsicPriorsValues`	A list of the same length as the number of prior distributions specified in `extrinsicPriorsFns` (one prior function specified for each parameter in the extrinsic model), with each element of the list a vector the same length as the appropriate number of parameters for that prior distribution (1 for `"fixed"`, 2 for `"uniform"`, 2 for `"normal"`, 2 for `"lognormal"`, 2 for `"gamma"`, 1 for `"exponential"`).
`generation.time`	The number of years per generation. This sets the coarseness of the simulation; if it's set to 1000, for example, the population's trait values change every 1000 calendar years. Note that this is in calendar years (see description for argument `TreeYears`), and not in millions of years (as is typical for dated trees in macroevolutionary studies). Thus, if a branch is 1 million-year time-unit long, and a user applies the default `generation.time = 1000`, then 1000 evolutionary changes will be simulated along that branch. See documentation for `doSimulation` for further details.
`TreeYears`	The amount of calendar time from the root to the furthest tip. Most trees in macroevolutionary studies are dated with branch lengths in units of millions of years, and thus the default for this argument is `max(branching.times(phy)) * 1e6`. If your tree has the most recent tip at time zero (i.e., the modern day), this would be the same as the root age of the tree. If your branch lengths are not in millions of years, you should alter this argument. Otherwise, leave this argument alone. See documentation for `doSimulation` for further details.
`timeStep`	This value corresponds to the length of intervals between discrete evolutionary events ('generations') simulated along branches, relative to a rescaled tree where the root to furthest tip distance is 1. For example, `timeStep = 0.01` would mean 100 (i.e., 1 / 0.01) evolutionary changes would be expected to occur from the root to the furthest tip. (Note that the real number simulated will be much less, because simulations start over at each branching node.) Ideally, `timeStep` (or its effective value, via other arguments) should be as short as is computationally possible. `NULL` by default, in which case it is determined internally as follows: `timeStep = generation.time / TreeYears`. A value can be supplied as an alternative to using arguments `generation.time` and `TreeYears`, which are then overridden. See documentation for `doSimulation` for further details.
`giveUpAttempts`	Value for when to stop the analysis if `NA` values are present.
`verbose`	If `TRUE`, gives messages about how the simulation is progressing via `message`.
`checks`	If `TRUE`, checks inputs for consistency. These checks are skipped (`checks = FALSE`) for the individual simulations run in parallel by `parallelSimulateWithPriors`, which instead checks the inputs only once. This argument also controls whether `simulateWithPriors` assigns `freevector` as an attribute to the output produced.
`taxonDF`	A data.frame containing data on nodes (both tips and internal nodes) output by various internal functions. Can be supplied as input to speed up repeated calculations, but by default is `NULL`, which instead forces a calculation from input `phy`.
`freevector`	A logical vector (with length equal to the number of parameters), indicating free (`TRUE`) and fixed (`FALSE`) parameters.
`nrepSim`	Number of replicated simulations to run.
`multicore`	Whether to use multicore, default is `FALSE`. If `TRUE`, one of two suggested packages must be installed, either `doMC` (for UNIX systems) or `doParallel` (for Windows), which are used to activate multithreading. If neither package is installed, this function will fail if `multicore = TRUE`.
`coreLimit`	Maximum number of cores to be used.
`checkpointFile`	Optional file name for checkpointing simulations.
`checkpointFreq`	Saving frequency for checkpointing.
`checkTimeStep`	If `TRUE`, warnings will be issued if `timeStep` is too short.
`verboseNested`	Should looped runs of `simulateWithPriors` be verbose?
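
The relationship between `generation.time`, `TreeYears` and `timeStep` described in the arguments above can be checked with a few lines of arithmetic. This is a sketch in base R only; the root age of 20 million years is a made-up value for illustration, and no TreEvo functions are called.

```r
# Made-up tree depth: root-to-furthest-tip distance of 20, in millions of years.
rootAgeMy <- 20

# Calendar years from the root to the furthest tip.
TreeYears <- rootAgeMy * 1e6

# Calendar years per simulated generation (the documented default).
generation.time <- 1000

# The relationship stated for the timeStep argument.
timeStep <- generation.time / TreeYears

# Expected number of steps along a single root-to-furthest-tip path.
stepsRootToTip <- 1 / timeStep

# Number of steps along one branch that is 1 million years long.
stepsPerMy <- 1e6 / generation.time

c(timeStep = timeStep, stepsRootToTip = stepsRootToTip, stepsPerMy = stepsPerMy)
```

With these numbers a one-million-year branch receives 1000 steps, matching the description of `generation.time`, and a full root-to-tip path receives about 20,000. Raising `generation.time` to 100000 cuts both counts by a factor of 100, which is why coarse values run much faster.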

Value

Function simulateWithPriors returns a vector of trueFreeValues, the true generating parameters used in the simulation (a set of values as long as the number of freely varying parameters), concatenated with a set of summary statistics for the simulation.

Function parallelSimulateWithPriors returns a matrix of such vectors bound together, with each row representing a different simulation.

By default, both functions also assign a logical vector named freevector, indicating the total number of parameters and which parameters are freely-varying (have TRUE values), as an attribute of the output.
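
As a rough illustration of the output layout described above, the sketch below splits a result vector into its generating parameters and its summary statistics. The vector `mockOut` is a hand-made stand-in with placeholder numbers, not real TreEvo output, and reading the attribute with `attr(x, "freevector")` is an assumption based on the attribute name given above.

```r
# Mock stand-in for a simulateWithPriors() result: placeholder numbers only.
# Layout assumed from the Value section: free parameters first, then
# summary statistics.
mockOut <- c(9.8, 0.12, 1.5, 2.5, 3.5, 4.5)

# Three parameters in total; the first two are free, the third is fixed.
attr(mockOut, "freevector") <- c(TRUE, TRUE, FALSE)

# Count the freely varying parameters from the attribute.
freevector <- attr(mockOut, "freevector")
nFree <- sum(freevector)

# Logical index, so the split also behaves when there are no free parameters.
isParam <- seq_along(mockOut) <= nFree

trueFreeValues <- mockOut[isParam]   # generating parameter values
summaryStats   <- mockOut[!isParam]  # summary statistics of the simulation

nFree
trueFreeValues
summaryStats
```

For the matrix returned by `parallelSimulateWithPriors`, the same index would apply to columns instead of elements, since each row is one simulation.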

Note

The simulateWithPriors functions are effectively the engine that powers the doRun functions, while the doSimulation function acts as the pistons within the simulateWithPriors engine. In general, most users will just drive the car, calling only doRun, but some users may want to call simulateWithPriors or doSimulation directly to run their own simulations.

Author(s)

Brian O'Meara and Barb Banbury

Examples


library(ape)     # provides rcoal() and branching.times()
library(TreEvo)  # provides simulateWithPriors() and the simCharExample data

set.seed(1)
tree <- rcoal(20)
# get realistic edge lengths
tree$edge.length <- tree$edge.length*20

# example simulation

# NOTE: the example analyses involve too few simulations,
# as well as overly coarse time-units...
# ...all for the sake of examples that reasonably test the functions

simData <- simulateWithPriors(
  phy = tree, 
  intrinsicFn = brownianIntrinsic, 
  extrinsicFn = nullExtrinsic, 
  startingPriorsFns = "normal", 
  startingPriorsValues = list(
      c(mean(simCharExample[, 1]), sd(simCharExample[, 1]))), 
  intrinsicPriorsFns = c("exponential"), 
  intrinsicPriorsValues = list(10), 
  extrinsicPriorsFns = c("fixed"), 
  extrinsicPriorsValues = list(0), 
  generation.time = 100000, 
  freevector = NULL,     
  giveUpAttempts = 10, 
  verbose = TRUE)

simData

simDataParallel <- parallelSimulateWithPriors(
  nrepSim = 2, 
  multicore = FALSE, 
  coreLimit = 1, 
  phy = tree, 
  intrinsicFn = brownianIntrinsic, 
  extrinsicFn = nullExtrinsic, 
  startingPriorsFns = "normal", 
  startingPriorsValues = list(
     c(mean(simCharExample[, 1]), sd(simCharExample[, 1]))), 
  intrinsicPriorsFns = c("exponential"), 
  intrinsicPriorsValues = list(10), 
  extrinsicPriorsFns = c("fixed"), 
  extrinsicPriorsValues = list(0), 
  generation.time = 100000, 
  checkpointFile = NULL, checkpointFreq = 24, 
  verbose = TRUE, 
  freevector = NULL, 
  taxonDF = NULL)

simDataParallel

bomeara/treevo documentation built on Aug. 19, 2023, 6:52 p.m.