doRun | R Documentation |
The doRun
functions are the main interface for
TreEvo
users to do Approximate Bayesian Computation (ABC) analysis,
effectively wrapping the simulateWithPriors
functions
to perform simulations, which themselves are wrappers for
the doSimulation
function. The two current doRun
functions are doRun_prc
which applies a partial
rejection control (PRC) ABC analysis over multiple generations of simulations,
and doRun_rej
which performs a full rejection ('rej') ABC analysis.
doRun_prc(
phy,
traits,
intrinsicFn,
extrinsicFn,
startingPriorsValues,
startingPriorsFns,
intrinsicPriorsValues,
intrinsicPriorsFns,
extrinsicPriorsValues,
extrinsicPriorsFns,
numParticles = 300,
nStepsPRC = 5,
nRuns = 2,
nInitialSims = NULL,
nInitialSimsPerParam = 100,
generation.time = 1000,
TreeYears = max(branching.times(phy)) * 1e+06,
multicore = FALSE,
coreLimit = NA,
multicoreSuppress = FALSE,
standardDevFactor = 0.2,
epsilonProportion = 0.7,
epsilonMultiplier = 0.7,
validation = "CV",
scale = TRUE,
variance.cutoff = 95,
stopRule = FALSE,
stopValue = 0.05,
maxAttempts = Inf,
diagnosticPRCmode = FALSE,
jobName = NA,
saveData = FALSE,
verboseParticles = TRUE
)
doRun_rej(
phy,
traits,
intrinsicFn,
extrinsicFn,
startingPriorsValues,
startingPriorsFns,
intrinsicPriorsValues,
intrinsicPriorsFns,
extrinsicPriorsValues,
extrinsicPriorsFns,
generation.time = 1000,
TreeYears = max(branching.times(phy)) * 1e+06,
multicore = FALSE,
coreLimit = NA,
validation = "CV",
scale = TRUE,
nInitialSims = NULL,
nInitialSimsPerParam = 100,
variance.cutoff = 95,
standardDevFactor = 0.2,
jobName = NA,
abcTolerance = 0.1,
checkpointFile = NULL,
checkpointFreq = 24,
savesims = FALSE
)
phy |
A phylogenetic tree, in package |
traits |
Data matrix with rownames identical to |
intrinsicFn |
Name of (previously-defined) function that governs how traits evolve within a lineage, regardless of the trait values of other taxa. |
extrinsicFn |
Name of (previously-defined) function that governs how traits evolve within a lineage, based on their own ('internal') trait vlaue and the trait values of other taxa. |
startingPriorsValues |
A list of the same length as
the number of prior distributions specified in
|
startingPriorsFns |
Vector containing names of prior distributions to
use for root states: can be one of
|
intrinsicPriorsValues |
A list of the same length
as the number of prior distributions specified
in |
intrinsicPriorsFns |
Vector containing names of prior distributions to
use for intrinsic function parameters: can be one of
|
extrinsicPriorsValues |
A list of the same length as
the number of prior distributions specified in |
extrinsicPriorsFns |
Vector containing names of prior distributions to
use for extrinsic function parameters: can be one of
|
numParticles |
Number of accepted particles per PRC generation. |
nStepsPRC |
Number of PRC generations to run. |
nRuns |
Number of independent PRC runs to be performed,
each consisting of independent sets of
initial simulations and PRC generations. Note that runs are
run sequentially, and not in parallel,
as the generation of particles within each run
makes use of the multicore functionality.
If |
nInitialSims |
Number of initial simulations used to calibrate particle rejection control algorithm. If not given, this will be a function of the number of freely varying parameters. |
nInitialSimsPerParam |
If |
generation.time |
The number of years per generation.
This sets the coarseness of the simulation; if it's set to 1000,
for example, the population's trait values change every 1000
calendar years. Note that this is in calendar years (see description
for argument |
TreeYears |
The amount of calendar time from the root
to the furthest tip. Most trees in macroevolutionary studies are dated with
branch lengths in units of millions of years, and thus the
default for this argument is |
multicore |
Whether to use multicore, default is |
coreLimit |
Maximum number of cores to be used. |
multicoreSuppress |
This argument suppresses the
multicore code and will apply a plain vanilla serial |
standardDevFactor |
Standard deviation for mutating states each time a new particle is generated in a PRC generation. |
epsilonProportion |
Sets tolerance for initial simulations. |
epsilonMultiplier |
Sets tolerance on subsequent PRC generations. |
validation |
Character argument controlling what
validation procedure is used by |
scale |
This argument is passed to |
variance.cutoff |
Minimum threshold percentage of variance explained for the number of components included in the final PLS model fit. This value is a percentage and must be between 0 and 100. Default is 95 percent. |
stopRule |
If |
stopValue |
Threshold value for terminating an analysis prior to
|
maxAttempts |
Maximum number attempts made
in |
diagnosticPRCmode |
If |
jobName |
Optional job name. |
saveData |
Option to save various run information during the analysis, including summary statistics from analyses, output to external .Rdata and .txt files. |
verboseParticles |
If |
abcTolerance |
Proportion of accepted simulations. |
checkpointFile |
Optional file name for checkpointing simulations. |
checkpointFreq |
Saving frequency for checkpointing. |
savesims |
Option to save individual simulations, output to a .Rdata file. |
Both doRun
functions take an input phylogeny (phy
),
observed trait data set (traits
), models
(intrinsicFn
, extrinsicFn
), and priors
(startingPriorsValues
, startingPriorsFns
,
intrinsicPriorsValues
,
intrinsicPriorsFns
, extrinsicPriorsValues
,
extrinsicPriorsFns
). Pulling from the priors, it
simulates an initial set of simulations (nInitialSims
).
This set of simulations is boxcox transformed , and a PLS regression (see
methodsPLS
) is performed for each free parameter
to determine the most informative summary statistics (using
variance.cutoff
). The euclidean distance is calculated
between each each initial simulation's most informative summary
statistics and the input observed data.
Following that step, the approach of the two functions diverge drastically.
For function doRun_rej
, those simulations whose
most informative summary statistics fall below
abcTolerance
are kept as accepted 'particles'
(simulations runs), describing the
posterior distribution of parameters. No additional
generations of simulations are performed.
Coversely, function doRun_prc
performs an ABC-PRC analysis,
which follows a much more complicated algorithm with additional
generations. In PRC, tolerance is set based on
epsilonProportion
and epsilonMultiplier
, and the analysis
begins with generation 1 and continues through a
number of generations equal to nStepsPRC
.
Single simulation runs ('particles') are accepted if the distance of the
most informative summary stats to the original data
is less than this tolerance. A generation
is complete when enough particles (numParticles
) have been accepted. These
particles make up the distribution for the next generation. The accepted
particles from the final generation describe the posterior distributions of
parameters.
The output of these two functions are lists,
composed of multiple objects,
which differ slightly in their content among the
two functions. For doRun_prc
, the output is:
Input variables:
jobName
, nTaxa
(number of taxa),
nInitialSims
, generation.time
, TreeYears
,
timeStep
, totalGenerations
,
epsilonProportion
, epsilonMultiplier
,
nRuns
(number of runs), nStepsPRC
,
numParticles
, standardDevFactor
.
List of prior distributions. This is used
for doing post-analysis comparisons between prior and
posterior distributions, such as with function plotPosteriors
.
DataFrame with information from each
simulation, including generation, attempt, id
, parentid
,
distance, weight, and parameter states.
Final tolerance vector.
Input phylogeny.
Input traits.
Processor time for initial simulations.
Processor time for subsequent generations.
Summarizes the posterior distribution from the final generation for all free parameters, giving the mean, standard deviation and Highest Posterior Density (at a 0.8 alpha) for each parameter.
If nRuns
is greater than 1, the output from
doRun_prc
will be a list object composed of
multiple output lists, as described.
For doRun_rej
, the output is:
Input variables:
jobName
, nTaxa
(number of taxa),
nInitialSims
, generation.time
, TreeYears
,
timeStep
, totalGenerations
,
epsilonProportion
, epsilonMultiplier
,
nRuns
(number of runs), nStepsPRC
,
numParticles
, standardDevFactor
.
List of prior distributions. This is
used for doing post-analysis comparisons between prior
and posterior distributions, such as
with function plotPosteriors
.
Input phylogeny.
Input traits.
Parameter estimates and summary stats from all sims.
Processor time for simulations.
Euclidean distances for each simulation and free parameter.
DataFrame with information from each simulation, including generation, attempt, id, parentid, distance, weight, and parameter states.
Summarizes the posterior distribution from the final generation for all free parameters, giving the mean, standard deviation and Highest Posterior Density (at a 0.8 alpha) for each parameter.
A list of parameter means for both fixed and unfixed parameters, sorted into a list of three vectors (starting, intrinsic, extrinsic).
Brian O'Meara and Barb Banbury
Sisson et al. 2007, Wegmann et al. 2009
set.seed(1)
data(simRunExample)
# NOTE: the example analyses below sample too few particles,
# over too few steps, with too few starting simulations
# - all for the sake of examples that demonstrate these
# within a reasonable time-frame
## Please set these values to more realistic levels for your analyses!
resultsPRC <- doRun_prc(
phy = simPhyExample,
traits = simCharExample,
intrinsicFn = brownianIntrinsic,
extrinsicFn = nullExtrinsic,
startingPriorsFns = "normal",
startingPriorsValues = list(c(mean(simCharExample[, 1]),
sd(simCharExample[, 1]))),
intrinsicPriorsFns = c("exponential"),
intrinsicPriorsValues = list(10),
extrinsicPriorsFns = c("fixed"),
extrinsicPriorsValues = list(0),
generation.time = 10000,
nRuns = 2,
nStepsPRC = 3,
numParticles = 20,
nInitialSimsPerParam = 10,
jobName = "examplerun_prc",
stopRule = FALSE,
multicore = FALSE,
coreLimit = 1
)
resultsPRC
#one should make sure priors are uniform with doRun_rej!
resultsRej <- doRun_rej(
phy = simPhyExample,
traits = simCharExample,
intrinsicFn = brownianIntrinsic,
extrinsicFn = nullExtrinsic,
startingPriorsFns = "normal",
startingPriorsValues = list(c(mean(simCharExample[, 1]),
sd(simCharExample[, 1]))),
intrinsicPriorsFns = c("exponential"),
intrinsicPriorsValues = list(10),
extrinsicPriorsFns = c("fixed"),
extrinsicPriorsValues = list(0),
generation.time = 10000,
jobName = "examplerun_rej",
abcTolerance = 0.05,
multicore = FALSE,
coreLimit = 1
)
resultsRej
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.