simulateWithPriors: Simulate data for initial TreEvo analysis

View source: R/simulateWithPriors.R

simulateWithPriors    R Documentation

Simulate data for initial TreEvo analysis

Description

The simulateWithPriors function pulls parameters from prior distributions and conducts a single simulation of continuous trait evolution (using the doSimulation function), returning useful summary statistics for ABC. parallelSimulateWithPriors is a wrapper function for simulateWithPriors that allows for multithreading and checkpointing. This family of functions is mostly used as internal components, generating simulations within ABC analyses using the doRun functions. See Note below.

Usage

simulateWithPriors(
  phy = NULL,
  intrinsicFn,
  extrinsicFn,
  startingPriorsFns,
  startingPriorsValues,
  intrinsicPriorsFns,
  intrinsicPriorsValues,
  extrinsicPriorsFns,
  extrinsicPriorsValues,
  generation.time = 1000,
  TreeYears = max(branching.times(phy)) * 1e+06,
  timeStep = NULL,
  giveUpAttempts = 10,
  verbose = FALSE,
  checks = TRUE,
  taxonDF = NULL,
  freevector = NULL
)

parallelSimulateWithPriors(
  nrepSim,
  multicore,
  coreLimit,
  phy,
  intrinsicFn,
  extrinsicFn,
  startingPriorsFns,
  startingPriorsValues,
  intrinsicPriorsFns,
  intrinsicPriorsValues,
  extrinsicPriorsFns,
  extrinsicPriorsValues,
  generation.time = 1000,
  TreeYears = max(branching.times(phy)) * 1e+06,
  timeStep = NULL,
  checkpointFile = NULL,
  checkpointFreq = 24,
  verbose = TRUE,
  checkTimeStep = TRUE,
  verboseNested = FALSE,
  freevector = NULL,
  taxonDF = NULL,
  giveUpAttempts = 10
)

Arguments

phy

A phylogenetic tree, in package ape's phylo format.

intrinsicFn

Name of (previously-defined) function that governs how traits evolve within a lineage, regardless of the trait values of other taxa.

extrinsicFn

Name of (previously-defined) function that governs how traits evolve within a lineage, based on their own ('internal') trait value and the trait values of other taxa.

startingPriorsFns

Vector containing names of prior distributions to use for root states: can be one of "fixed", "uniform", "normal", "lognormal", "gamma", "exponential".

startingPriorsValues

A list of the same length as the number of prior distributions specified in startingPriorsFns (for starting values, this should be one prior function specified for each trait - thus one for most univariate trait analyses), with each element of the list a vector the same length as the appropriate number of parameters for that prior distribution (1 for "fixed", 2 for "uniform", 2 for "normal", 2 for "lognormal", 2 for "gamma", 1 for "exponential").
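As a rough illustration of the length rule described above, the following sketch checks that each element of a values list carries the number of parameters its prior expects. The helper checkPriorLengths and the lookup table nParamsByPrior are hypothetical names introduced here for illustration; they are not part of TreEvo.

```r
# Hypothetical helper (not part of TreEvo): parameter counts per prior,
# copied from the argument description above.
nParamsByPrior <- c(fixed = 1, uniform = 2, normal = 2,
                    lognormal = 2, gamma = 2, exponential = 1)

checkPriorLengths <- function(priorFns, priorValues) {
  # one list element is expected per prior function
  if (length(priorFns) != length(priorValues)) {
    return(FALSE)
  }
  # unknown prior names cannot be checked, so treat them as invalid
  if (!all(priorFns %in% names(nParamsByPrior))) {
    return(FALSE)
  }
  expected <- unname(nParamsByPrior[priorFns])
  observed <- unname(lengths(priorValues))
  all(expected == observed)
}

# a single "normal" prior needs two values (here an arbitrary pair)
okSpec <- checkPriorLengths("normal", list(c(10, 1)))
# a "uniform" prior given only one value does not pass the check
badSpec <- checkPriorLengths(c("exponential", "uniform"), list(10, 5))
```

The same check would apply unchanged to the intrinsic and extrinsic prior arguments, since they follow the same length rule.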

intrinsicPriorsFns

Vector containing names of prior distributions to use for intrinsic function parameters: can be one of "fixed", "uniform", "normal", "lognormal", "gamma", "exponential".

intrinsicPriorsValues

A list of the same length as the number of prior distributions specified in intrinsicPriorsFns (one prior function specified for each parameter in the intrinsic model), with each element of the list a vector the same length as the appropriate number of parameters for that prior distribution (1 for "fixed", 2 for "uniform", 2 for "normal", 2 for "lognormal", 2 for "gamma", 1 for "exponential").

extrinsicPriorsFns

Vector containing names of prior distributions to use for extrinsic function parameters: can be one of "fixed", "uniform", "normal", "lognormal", "gamma", "exponential".

extrinsicPriorsValues

A list of the same length as the number of prior distributions specified in extrinsicPriorsFns (one prior function specified for each parameter in the extrinsic model), with each element of the list a vector the same length as the appropriate number of parameters for that prior distribution (1 for "fixed", 2 for "uniform", 2 for "normal", 2 for "lognormal", 2 for "gamma", 1 for "exponential").
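To make the pairing of prior names and prior values more concrete, here is a minimal sketch of drawing one value per parameter using base R random-number functions. This is not TreEvo's internal code: drawFromPrior is a hypothetical helper, and the meaning assigned to each value (for example, shape and scale for "gamma") is an assumption, since this page only states how many values each prior takes.

```r
# Hypothetical helper (not TreEvo's internal code): draw one value from a named prior.
# Assumed parameter order: uniform = (min, max); normal = (mean, sd);
# lognormal = (meanlog, sdlog); gamma = (shape, scale); exponential = (rate).
drawFromPrior <- function(priorFn, priorValues) {
  switch(priorFn,
    fixed       = priorValues[1],
    uniform     = runif(1, min = priorValues[1], max = priorValues[2]),
    normal      = rnorm(1, mean = priorValues[1], sd = priorValues[2]),
    lognormal   = rlnorm(1, meanlog = priorValues[1], sdlog = priorValues[2]),
    gamma       = rgamma(1, shape = priorValues[1], scale = priorValues[2]),
    exponential = rexp(1, rate = priorValues[1]),
    stop("unknown prior: ", priorFn)
  )
}

set.seed(1)
# one starting prior, one intrinsic prior, one (fixed) extrinsic prior
priorFns    <- c("normal", "exponential", "fixed")
priorValues <- list(c(10, 1), 10, 0)
draws <- mapply(drawFromPrior, priorFns, priorValues)
```

A "fixed" prior simply returns its single value, so the third element of draws is always 0 in this sketch.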

generation.time

The number of years per generation. This sets the coarseness of the simulation; if it's set to 1000, for example, the population's trait values change every 1000 calendar years. Note that this is in calendar years (see description for argument TreeYears), and not in millions of years (as is typical for dated trees in macroevolutionary studies). Thus, if a branch is 1 million-year time-unit long, and a user applies the default generation.time = 1000, then 1000 evolutionary changes will be simulated along that branch. See documentation for doSimulation for further details.

TreeYears

The amount of calendar time from the root to the furthest tip. Most trees in macroevolutionary studies are dated with branch lengths in units of millions of years, and thus the default for this argument is max(branching.times(phy)) * 1e6. If your tree has the most recent tip at time zero (i.e., the modern day), this would be the same as the root age of the tree. If your branch lengths are not in millions of years, you should alter this argument. Otherwise, leave this argument alone. See documentation for doSimulation for further details.

timeStep

This value corresponds to the length of intervals between discrete evolutionary events ('generations') simulated along branches, relative to a rescaled tree where the root to furthest tip distance is 1. For example, timeStep = 0.01 would mean 100 (i.e., 1 / 0.01) evolutionary changes would be expected to occur from the root to the furthest tip. (Note that the real number simulated will be much less, because simulations start over at each branching node.) Ideally, timeStep (or its effective value, via other arguments) should be as short as is computationally possible. This argument is NULL by default, in which case it is determined internally as timeStep = generation.time / TreeYears. A value can be provided as an alternative to using arguments generation.time and TreeYears, which would then be overridden. See documentation for doSimulation for further details.
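The relationship between generation.time, TreeYears and timeStep can be worked through with plain arithmetic. The sketch below assumes a hypothetical tree whose root-to-furthest-tip distance is 20 million years; rootAgeMyr and stepsRootToTip are names made up for this illustration.

```r
# rootAgeMyr is a made-up root age; the other values mirror the documented defaults.
rootAgeMyr <- 20                      # hypothetical root-to-tip distance, in Myr
TreeYears <- rootAgeMyr * 1e6         # calendar years from root to furthest tip
generation.time <- 1000               # years per generation (the default)

# the internally determined value described above
timeStep <- generation.time / TreeYears

# expected number of changes along one root-to-furthest-tip path
stepsRootToTip <- 1 / timeStep
```

Under these assumptions timeStep is 5e-05, which corresponds to 20,000 generations between the root and the furthest tip.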

giveUpAttempts

Value for when to stop the analysis if NA values are present.

verbose

If TRUE, gives messages about how the simulation is progressing via message.

checks

If TRUE, checks inputs for consistency. This activity is skipped (checks = FALSE) when run in parallel by parallelSimulateWithPriors, and instead is only checked once. This argument also controls whether simulateWithPriors assigns freevector as an attribute to the output produced.

taxonDF

A data.frame containing data on nodes (both tips and internal nodes) output by various internal functions. Can be supplied as input to speed up repeated calculations, but by default is NULL, which instead forces a calculation from input phy.

freevector

A logical vector (with length equal to the number of parameters), indicating free (TRUE) and fixed (FALSE) parameters.
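For readers who want to see what such a vector might look like, the sketch below builds one by hand from the prior names. It rests on two assumptions that this page does not confirm: that parameters are ordered starting, intrinsic, extrinsic, and that a parameter counts as free unless its prior is "fixed". The names allPriorsFns and freevectorSketch are hypothetical.

```r
startingPriorsFns  <- "normal"
intrinsicPriorsFns <- "exponential"
extrinsicPriorsFns <- "fixed"

# Assumption: parameters are ordered starting, intrinsic, extrinsic,
# and a parameter is free unless its prior is "fixed".
allPriorsFns <- c(startingPriorsFns, intrinsicPriorsFns, extrinsicPriorsFns)
freevectorSketch <- allPriorsFns != "fixed"

# number of freely varying parameters under that assumption
nFree <- sum(freevectorSketch)
```

With the priors used in the Examples section, this gives two free parameters and one fixed parameter.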

nrepSim

Number of replicated simulations to run.

multicore

Whether to use multicore, default is FALSE. If TRUE, one of two suggested packages must be installed, either doMC (for UNIX systems) or doParallel (for Windows), which are used to activate multithreading. If neither package is installed, this function will fail if multicore = TRUE.
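Because multicore = TRUE fails when neither backend package is installed, it may be worth checking availability before the call. The following is one possible sketch using base R only; backendPkg, backendAvailable and useMulticore are hypothetical variable names.

```r
# Pick a backend package name by platform, following the argument description:
# doMC for UNIX-alikes, doParallel for Windows.
backendPkg <- if (.Platform$OS.type == "windows") "doParallel" else "doMC"

# requireNamespace() returns FALSE (rather than erroring) when a package is absent
backendAvailable <- requireNamespace(backendPkg, quietly = TRUE)

# fall back to a single thread when the backend is missing
useMulticore <- isTRUE(backendAvailable)
```

The resulting useMulticore value could then be passed as the multicore argument.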

coreLimit

Maximum number of cores to be used.

checkpointFile

Optional file name for checkpointing simulations.

checkpointFreq

Saving frequency for checkpointing.

checkTimeStep

If TRUE, warnings will be issued if timeStep is too short.

verboseNested

Should looped runs of simulateWithPriors be verbose?

Value

Function simulateWithPriors returns a vector of trueFreeValues, the true generating parameters used in the simulation (a set of values as long as the number of freely varying parameters), concatenated with a set of summary statistics for the simulation.

Function parallelSimulateWithPriors returns a matrix of such vectors bound together, with each row representing a different simulation.

By default, both functions also assign a logical vector named freevector, indicating the total number of parameters and which parameters are freely-varying (have TRUE values), as an attribute of the output.
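The layout of the returned vector can be illustrated with a mock object. The sketch below does not call TreEvo; mockSim stands in for a simulateWithPriors result, and its values and its count of summary statistics are invented. It assumes the attribute can be read with attr(x, "freevector"), as the description above suggests.

```r
# Mock output standing in for a simulateWithPriors result; the numbers and
# the number of summary statistics are invented for illustration only.
mockSim <- c(9.8, 0.12, 1.5, 2.5, 3.5)
attr(mockSim, "freevector") <- c(TRUE, TRUE, FALSE)

freevector <- attr(mockSim, "freevector")
nFree <- sum(freevector)              # number of freely varying parameters

# the first nFree elements are the true generating parameters,
# and the remainder are the summary statistics
trueFreeValues <- as.numeric(mockSim[seq_len(nFree)])
summaryStats   <- as.numeric(mockSim[-seq_len(nFree)])
```

For a matrix returned by parallelSimulateWithPriors, the same split would presumably apply to columns rather than elements, since each row is one simulation.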

Note

The simulateWithPriors functions are effectively the engine that powers the doRun functions, while the doSimulation function is the pistons within the simulateWithPriors engine. In general, most users will just drive the car - they will just use doRun, but some users may want to use simulateWithPriors or doSimulation to do various simulations.

Author(s)

Brian O'Meara and Barb Banbury

Examples


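library(ape)         # provides rcoal() and branching.times()
library(TreEvo)
data(simRunExample)  # assumption: this dataset supplies simCharExample

set.seed(1)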
tree <- rcoal(20)
# get realistic edge lengths
tree$edge.length <- tree$edge.length*20

# example simulation

# NOTE: the example analyses involve too few simulations,
# as well as overly coarse time-units,
# all for the sake of examples that reasonably test the functions
    
simData <- simulateWithPriors(
  phy = tree, 
  intrinsicFn = brownianIntrinsic, 
  extrinsicFn = nullExtrinsic, 
  startingPriorsFns = "normal", 
  startingPriorsValues = list(
      c(mean(simCharExample[, 1]), sd(simCharExample[, 1]))), 
  intrinsicPriorsFns = c("exponential"), 
  intrinsicPriorsValues = list(10), 
  extrinsicPriorsFns = c("fixed"), 
  extrinsicPriorsValues = list(0), 
  generation.time = 100000, 
  freevector = NULL,     
  giveUpAttempts = 10, 
  verbose = TRUE)

simData

simDataParallel <- parallelSimulateWithPriors(
  nrepSim = 2, 
  multicore = FALSE, 
  coreLimit = 1, 
  phy = tree, 
  intrinsicFn = brownianIntrinsic, 
  extrinsicFn = nullExtrinsic, 
  startingPriorsFns = "normal", 
  startingPriorsValues = list(
     c(mean(simCharExample[, 1]), sd(simCharExample[, 1]))), 
  intrinsicPriorsFns = c("exponential"), 
  intrinsicPriorsValues = list(10), 
  extrinsicPriorsFns = c("fixed"), 
  extrinsicPriorsValues = list(0), 
  generation.time = 100000, 
  checkpointFile = NULL, checkpointFreq = 24, 
  verbose = TRUE, 
  freevector = NULL, 
  taxonDF = NULL)

simDataParallel




bomeara/treevo documentation built on Aug. 19, 2023, 6:52 p.m.