R/simFossilRecord.R

Defines functions simFossilRecord

Documented in simFossilRecord

#' Full-Scale Simulations of the Fossil Record with Birth, Death and Sampling of Morphotaxa
#' 
#' A complete birth-death-sampling branching simulator
#' that captures morphological-taxon identity
#' of lineages, as is typically discussed in models of paleontological data. This function
#' allows for the use of precise point constraints to condition simulation run acceptance and
#' can interpret complex character strings given as rate values for use in modeling
#' complex processes of diversification and sampling.

#' @details
#' \code{simFossilRecord} simulates a birth-death-sampling branching process
#' (ala Foote, 1997,  2000; Stadler, 2009) in which lineages of organisms may branch,
#' go extinct or be sampled at discrete points within a continuous time-interval.
#' The occurrence of these discrete events are modeled as stochastic
#' Poisson process, described by some set of instantaneous rates.
#' This model is ultimately based on the birth-death model (Kendall, 1948; Nee, 2006),
#' which is widely implemented in many R packages. Unlike other such typical branching
#' simulators, this function enmeshes the lineage units within explicit models of how
#' lineages are morphologically differentiated (Bapst, 2013). 
#' This is key to allow comparison to datasets from the fossil record,
#' as morphotaxa are the basic units of paleontological diversity estimates
#' and phylogenetic analyses. 
#' 
#' \emph{Models of Morphological Differentiation and Branching (Cladogenesis and Anagenesis)}
#' 
#' These models of morphological differentiation do not involve the direct simulation of
#' morphological traits. Instead, morphotaxon identity is used as a proxy of the
#' distinctiveness of lineages on morphological grounds, as if there was some hypothetical
#' paleontologist attempting to taxonomically sort collections of specimens of these simulated
#' lineages. Two lineages are either identical, and thus share the same morphotaxon identity,
#' or they are distinct, and thus have separate morphotaxon identities. Morphological
#' differentiation is assumed to be an instantaneous process for the purposes of this model,
#' such that no intermediate could be uncovered.
#' 
#' Specifically, \code{simFossilRecord} allows for three types of
#' binary branching events, referred to here as under the umbrella
#' term of 'cladogenesis': 'budding cladogenesis',
#' 'bifurcating cladogenesis', and 'cryptic cladogenesis',
#' as well as for a fourth non-branching event-type, 'anagenesis'.
#' See Wagner and Erwin, 1995; Foote, 1996; and Bapst, 2013, for further details.
#' Budding, bifurcation and cryptic cladogenetic events all
#' share in common that a single geneological
#' lineage splits into two descendant lineages, but
#' differ in the morphological differentiation
#' of these child lineages relative to their parent.
#' Under budding cladogenesis, only one of the
#' child lineages becomes morphologically distinguishable
#' from the parent, and thus the ancestral
#' morphotaxon persists through the branching event
#' as the child lineage that does \emph{not}
#' differentiate. Under bifurcating cladogenesis, both child lineages
#' become immediately distinct from the ancestor,
#' and thus two new morphotaxa appear while the
#' ancestor terminates in an event known as 'pseudoextinction'.
#' Cryptic cladogenesis has no morphological differentiation:
#' both child lineages are presumed to be indistinct from
#' the ancestor and from each other, which means a hypothetical paleontologist
#' would not observe that branching had occurred at all.
#' Anagenesis is morphological differentiation independent of
#' any branching, such that a morphotaxon instantaneously transitions to a
#' new morphotaxon identity, resulting in the pseudoextinction of
#' the ancestral morphotaxon and the immediate 'pseudospeciation'
#' of the child morphotaxon.
#' In anagenesis, the ancestral morphotaxon and descendant morphotaxon
#' do not overlap in time at all, as modeled here
#' (contra to the models described by Ezard et al., 2012). 
#' For ease of following these cryptic lineages, cryptic cladogenetic
#' events are treated in terms of data structure similarly to
#' budding cladogenetic events, with one child lineage treated
#' as a persistence of the ancestral lineage, and the other
#' as a new morphologically indistinguishable lineage.
#' This model of cryptic cladogenesis is ultimately
#' based on the hierarchical birth-death model used by
#' many authors for modeling patterns across paraphyletic
#' higher taxa and the lower taxon units within them
#' (e.g. Patzkowsky, 1995; Foote, 2012).
#' 
#' The occurrence of the various models is controlled by
#' multiple arguments of \code{simFossilRecord}.
#' The overall instantaneous rate of branching (cladogenesis) is
#' controlled by argument \code{p}, and the proportion of
#' each type of cladogenesis controlled by arguments
#' \code{prop.bifurc} and \code{prop.cryptic}.
#' \code{prop.cryptic} controls the overall probability that
#' any branching event will be cryptic versus
#' involving any morphological differentiation (budding or bifurcating).
#' If \code{prop.cryptic = 1}, all branching events will be cryptic cladogenesis,
#' and if \code{prop.cryptic = 0}, all branching events will
#' involve morphological differentiation and none will be cryptic.
#' \code{prop.bifurc} controls how many branching events that
#' involve morphological differentiation (i.e. the inverse of \code{prop.cryptic})
#' are bifurcating, as opposed to budding cladogenesis. 
#' If \code{prop.bifurc = 1}, all morphologically-differentiating branching events will
#' be bifurcating cladogenesis, and if \code{prop.bifurc = 0},
#' all morphologically-differentiating branching events will be budding cladogenesis.
#' Thus, for example, the probability of a given cladogenesis event
#' being budding is given by:
#' 
#'  \code{Prob(budding cladogenesis at a branching event) = (1 - prop.cryptic) * (1 - prop.bifurc)}
#' 
#' By default, \code{prop.cryptic = 0} and \code{prop.bifurc = 0},
#' so all branching events will be instances of budding cladogenesis
#' in analyses that use default setting.
#' Anagenesis is completely independent of these, controlled as its
#' own Poisson process with an instantaneous rated
#' defined by the argument \code{anag.rate}.
#' By default, this rate is set to zero and thus there is no
#' anagenetic events without user intervention.
#' 
#' \emph{Stopping Conditions and Acceptance Criteria for Simulations}
#' 
#' How forward-time simulations are generated, halted and whether they are accepted
#' or not for output is a critical component of simulation design.
#' Most uses of \code{simFossilRecord} will involve iteratively
#' generating and analyzing multiple simulation runs. Runs are only
#' accepted for output if they meet the conditioning criteria defined
#' in the arguments, either matching point constraints or falling
#' within range constraints. However, this requires separating the processes of
#' halting simulation runs and accepting a run for output, particularly to avoid bias
#' related to statistical sampling issues.
#' 
#' Hartmann et al. (2011) recently discovered a potential statistical artifact
#' when branching simulations are conditioned on some number of taxa.
#' Previously within \code{paleotree}, this was accounted for in
#' the deprecated function \code{simFossilTaxa} by a
#' complex arrangement of minimum and maximum constraints,
#' and an (incorrect) presumption that allowing simulations to
#' continue for a short distance after constraints were reached
#' would solve this statistical artifact. This strategy is not applied here.
#' Instead, \code{simFossilRecord} applies the General Sampling Algorithm presented
#' by Hartmann et al. (or at least, a close variant). A simulation continues until
#' extinction or some maximum time-constraint is reached, evaluated for intervals
#' that match the set run conditions (e.g. \code{nExtant}, \code{nTotalTime}) and, if some
#' interval or set of intervals matches the run conditions, a date is randomly sampled
#' from within this interval/intervals. The simulation is then cut at this date using
#' the \code{timeSliceFossilRecord} function, and saved as an accepted run.
#' The simulation data is otherwise discarded and then a new simulation initiated
#' (therefore, at most, only one simulated dataset is accepted from one simulation run).
#' 
#' Thus, accepted simulations runs should reflect unbiased samples of evolutionary
#' histories that precisely match the input constraints, which can be very precise,
#' unlike how stopping and acceptance conditions were handled in the previous (deprecated)
#' \code{simFossilTaxa} function. Of course, selecting very precise constraints that
#' are very unlikely or impossible given other model parameters may take considerable
#' computation time to find acceptable simulation runs, or effectively never find any
#' acceptable simulation runs.
#' 
#' \emph{On Time-Scale Used in Output}
#' 
#' Dates given in the output are on an reversed absolute time-scale; i.e. time
#' decreases going from the past to the future, as is typical in paleontological
#' uses of time (as time before present) and as for most function in package
#' \code{paleotree}. The endpoints of the time-scale are decided by details of the
#' simulation and can be modified by several arguments.
#' By default (with \code{shiftRoot4TimeSlice  = } \code{"withExtantOnly"}), 
#' any simulation run that is accepted with extant taxa will have zero as the
#' \emph{end-time} (i.e. when those taxa are extant),
#' as zero is the typical time assigned to the modern day in empirical studies.
#' If a simulation ends with all taxa extinct, however, then instead the \emph{start-time}
#' of a run (i.e. when the run initiates with starting taxa) will be maximum value
#' assigned to the conditioning argument \code{totalTime}. 
#' If \code{shiftRoot4TimeSlice  = } \code{FALSE}, then the
#' \emph{start-time} of the run will always be this maximum value for
#' \code{totalTime}, and any extant taxa will stop at some time greater than zero.

#' @param p,q,r,anag.rate These parameters control the instantaneous
#' ('per-capita') rates of branching, extinction,
#' sampling and anagenesis, respectively. These can be given as a number 
#' equal to or greater than zero, or as a 
#' character string which will be interpreted as an algebraic equation. 
#' These equations can make use of three
#' quantities which will/may change throughout the simulation: 
#' the standing richness is \code{N},
#' the current time passed since the start of the simulation is \code{T}, 
#' the present duration of a given still-living lineage
#' since its origination time is code{D},
#' and the current branching rate is \code{P}
#' (corresponding to the argument name \code{p}).
#' Note that \code{P} cannot be used in equations for the branching rate itself;
#' it is for making other rates relative to the branching rate.
#' 
#' By default, the rates \code{r} and \code{anag.rate} are set to zero,
#' so that the default simulator is a birth-death simulator.
#' Rates set to \code{ =  Inf} are treated as if 0. When a rate is set
#' to 0, this event type will not occur in the simulation.
#' Setting certain processes to zero, like sampling, may increase 
#' simulation efficiency, if the goal is a birth-death or
#' pure-birth model.
#' See documentation for argument \code{negRatesAsZero} about the 
#' treatment of rates that decrease below zero.
#' Notation of branching, extinction and sampling rates as \code{p, q, r} 
#' follows what is typical for the paleobiology literature
#' (e.g. Foote, 1997), not the  Greek letters \code{lambda, mu, phi}
#' found more typically in the biological literature (e.g. Stadler, 2009;
#' Heath et al., 2014; Gavryushkina et al., 2014).

#' @param totalTime,nTotalTaxa,nExtant,nSamp These arguments represent
#' stopping and acceptance conditions for simulation runs. They are
#' respectively \code{totalTime}, the total length of the simulation
#' in time-units, \code{nTotalTaxa}, the total number of taxa over the
#' past evolutionary history of the clade, \code{nExtant}, the total
#' number of extant taxa at the end of the simulation and \code{nSamp}
#' the total number of sampled taxa (not counting extant taxa sampled
#' at the modern day). These are used to determine when to end simulation
#' runs, and whether to accept or reject them as output. They can be input
#' as a vector of two numbers, representing minimum and maximum values of
#' a range for accepted simulation runs (i.e. the simulation length can be
#' between 0 and 1000 time-steps, by default), or as a single number,
#' representing a point condition (i.e. if \code{nSamp = 100} then only those
#' simulation states that contain exactly 100 taxa sampled will be output).
#' Note that it is easy to set combinations of parameters and run conditions that are
#' impossible to produce satisfactory input under, in which case
#' \code{simFossilRecord} would run in a nonstop loop.
#' How cryptic taxa are counted for the sake of these
#' conditions is controlled by argument \code{count.cryptic}.
#' Note that behavior of these constraints can be modified
#' by the argument \code{returnAllRuns}.

#' @param returnAllRuns If \code{TRUE}, all simulation runs will be returned,
#' with the output given as a list composed of two sublists - the first sublist containing
#' all accepted simulations (i.e. everything that would be returned under the
#' default condition of \code{returnAllRuns} \code{= FALSE}), and the second 
#' sublist containing the full history of each failed simulation.
#' These failed simulations are only stopped when they one of four,
#' irreversible 'out-of-bounds' constraints. These four conditions are
#' (a) reaching the maximum total simulation duration (\code{totalTime}),
#' (b) exceeding the maximum number of total taxa (\code{nTotalTaxa}),
#' (c) exceeding the maximum number of sampled taxa (\code{nSamp}), or 
#' (d) total extinction of all lineages in the simulation.

#' @param negRatesAsZero A logical. Should rates calculated as a
#' negative number cause the simulation to fail
#' with an error message (\code{ = FALSE}) or should these be
#' treated as zero (\code{ =  TRUE}, the default). This
#' is equivalent to saying that
#' the \code{rate.as.used  = } \code{max(0, rate.as.given)}.

#' @param prop.cryptic,prop.bifurc These parameters control
#' (respectively) the proportion of branching events that have
#' morphological differentiation, versus those that are cryptic
#' (\code{prop.cryptic}) and the proportion of morphological
#' branching events that are bifurcating, as opposed to budding.
#' Both of these proportions must be a number between 0 and 1.
#' By default, both are set to zero, meaning all branching events
#' are events of budding cladogenesis. See description of
#' the available models of morphological differentiation in the \emph{Description} section.

#' @param tolerance A small number which defines a tiny interval for
#' the sake of placing run-sampling dates before events and
#' for use in determining whether a taxon is extant in \code{simFossilRecordMethods}.

#' @param maxStepTime When rates are time-dependent (i.e. when 
#' parameters 'D' or 'T' are used in equations input for
#' one of the four rate arguments), then protocol used by
#' \code{simFossilRecord} of drawing waiting times to the next
#' event could produce a serious mismatch of resulting process
#' to the defined model, because the possibility of new events
#' is only considered at the end of these waiting times.
#' Instead, any time a waiting time greater than \code{maxStepTime} is
#' selected, then instead \emph{no} event occurs and a
#' time-step equal to \code{maxStepTime} occurs instead,
#' thus effectively discretizing the progression of
#' time in the simulations run by \code{simFossilRecord}.
#' Decreasing this value will increase accuracy
#' (as the time-scale is effectively more discretized)
#' but increase computation time, as the computer will need
#' to stop and check rates to see if an event happened more often.
#' Users should toggle this value relative to the time-dependent
#' rate equations they input, relative to the rate of change in
#' rates expected in time-dependent rates.

#' @param nruns Number of simulation datasets to accept, save and output.
#' If \code{nruns = 1}, output will be a single
#' object of class \code{fossilRecordSimulation}, and
#' if \code{nruns} is greater than 1, a list will be output composed of
#' \code{nruns} objects of class \code{fossilRecordSimulation}. 

#' @param startTaxa Number of initial taxa to begin a simulation with.
#' All will have the simulation start date listed as their time of origination.

#' @param sortNames If \code{TRUE}, output taxonomic lists are sorted by the taxon
#' names (thus sorting cryptic taxa together) rather than by taxon ID number
#' (i.e. the order they were simulated in).

#' @param print.runs If \code{TRUE}, prints the proportion of simulations
#' accepted for output to the terminal.

#' @param maxAttempts Number of simulation attempts allowed before the simulation process
#' is halted with an error message. Default is \code{Inf}.

#' @param plot If \code{TRUE}, plots the diversity curves of accepted simulations,
#' including both the diversity curve of the true number of taxa and the
#' diversity curve for the 'observed' (sampled) number of taxa.

#' @param count.cryptic If \code{TRUE}, cryptic taxa are counted as separate taxa for
#' conditioning limits that count a number of taxon units, such as \code{nTotalTaxa},
#' \code{nExtant} and \code{nSamp}. If \code{FALSE} (the default), then each cryptic
#' complex (i.e. each distinguishable morphotaxon) is treated as a single taxon.
#' See examples.

#' @inheritParams simFossilRecordMethods

#' @return
#' \code{simFossilRecord} returns either a single object of class \code{fossilRecordSimulation}
#' or a list of multiple such objects, depending on whether \code{nruns} was 1 or more.
#' If argument \code{returnAllRuns} \code{= TRUE}, a list composed of two sublists,
#' each of which contains 0 or more \code{fossilRecordSimulation} objects. The
#' first sublist containing all the accepted simulations (i.e. all the simulations that
#' would have been returned if \code{returnAllRuns} was \code{FALSE}), and the second
#' sublist containing the final iteration of all rejected runs before they hit an
#' irreversible out-of-bounds condition (to wit, reaching the maximum \code{totalTime},
#' exceeding the maximum number of total taxa (\code{nTotalTaxa}),
#' exceeding the maximum number of sampled taxa (\code{nSamp}),
#' or total extinction of all lineages in the simulation).
#' 
#' An object of class \code{fossilRecordSimulation} consists
#' of a list object composed of multiple
#' elements, each of which is data for 'one taxon'.
#' Each data element for each taxon is itself
#' a list, composed of two elements: the first describes
#' vital information about the taxon unit, 
#' and the second describes the sampling times of each taxon. 
#' 
#' The first element of the list (named \code{$taxa.data})
#' is a distinctive six-element
#' vector composed of numbers (some are nominally integers,
#' but not all, so all are stored
#' as double-precision integers) with the following field names:
#' 
#' \describe{

#' \item{\code{taxon.id}}{The ID number of this particular taxon-unit.}

#' \item{\code{ancestor.id}}{The ID number of the ancestral taxon-unit.
#' The initial taxa in a simulation will be listed with \code{NA} as their ancestor.}

#' \item{\code{orig.time}}{True time of origination for a taxon-unit in absolute time.}

#' \item{\code{ext.time}}{True time of extinction for a taxon-unit in absolute time.
#' Extant taxa will be listed with an \code{ext.time} of the run-end time of the
#' simulation run, which for simulations with extant taxa is 0 by default (but this
#' may be modified using argument \code{shiftRoot4TimeSlice}).}

#' \item{\code{still.alive}}{Indicates whether a taxon-unit is 'still alive' or not:
#' '1' indicates the taxon-unit is extant, '0' indicates the taxon-unit is extinct}

#' \item{\code{looks.like}}{The ID number of the first morphotaxon in a dataset that
#' 'looks like' this taxon-unit; i.e. belongs to the same multi-lineage cryptic
#' complex. Taxa that are morphologically distinct from any previous lineage will
#' have their \code{taxon.id} match their \code{looks.like}. Thus, this column
#' is rather uninformative unless cryptic cladogenesis occurred in a simulation.}}
#' 
#' The second element for each taxon-unit is a vector of sampling times, creatively
#' named \code{$sampling.times}, with each value representing a data in absolute time
#' when that taxon was sampled in the simulated fossil record. If a taxon was never
#' sampled, this vector is an empty numeric vector of \code{length = 0}.
#' 
#' As is typical for paleontological uses of absolute time, absolute time in these
#' simulations is always decreasing toward the modern; i.e. an absolute date of 50
#' means a point in time which is 50 time-units before the present-day, if the
#' present-day is zero (the default, but see argument \code{shiftRoot4TimeSlice}).
#' 
#' Each individual element of a \code{fossilRecordSimulation} list object
#' is named, generally of the form \code{"t1"} and \code{"t2"}, 
#' where the number is the \code{taxon.id}. 
#' Cryptic taxa are instead named in the form of \code{"t1.2"} and \code{"t5.3"},
#' where the first number is the taxon which they are a
#' cryptic descendant of (\code{looks.like}) and the second number, after the period, is
#' the order of appearance of lineage units in that cryptic complex.
#' For example, for \code{"t5.3"},  the first number is the \code{taxon.id}
#' and the second number communicates that this is the third lineage
#' to appear in this cryptic complex.


#' @seealso
#' \code{\link{simFossilRecordMethods}}
#' 
#' This function essentially replaces and adds to all functionality of the
#' deprecated \code{paleotree} functions \code{simFossilTaxa}, \code{simFossilTaxaSRCond},
#' \code{simPaleoTrees}, as well as the combined used of \code{simFossilTaxa}
#' and \code{sampleRanges} for most models of sampling. 

#' @author 
#' David W. Bapst, inspired by code written by Peter Smits.

#' @references
#' Bapst, D. W. 2013. When Can Clades Be Potentially Resolved with
#' Morphology? \emph{PLoS ONE} 8(4):e62312.
#' 
#' Ezard, T. H. G., P. N. Pearson, T. Aze, and A. Purvis. 2012. The meaning of birth
#' and death (in macroevolutionary birth-death models). \emph{Biology Letters} 8(1):139-142.
#' 
#' Foote, M. 1996 On the Probability of Ancestors in the Fossil
#' Record. \emph{Paleobiology} \bold{22}(2):141--151.
#' 
#' Foote, M. 1997. Estimating Taxonomic Durations and Preservation
#' Probability. \emph{Paleobiology} 23(3):278-300.
#' 
#' Foote, M. 2000. Origination and extinction components of taxonomic diversity:
#' general problems. Pp. 74-102. In D. H. Erwin, and S. L. Wing, eds. \emph{Deep Time:
#' Paleobiology's Perspective.} The Paleontological Society, Lawrence, Kansas.
#' 
#' Foote, M. 2012. Evolutionary dynamics of taxonomic structure. \emph{Biology Letters} 8(1):135-138.
#' 
#' Gavryushkina, A., D. Welch, T. Stadler, and A. J. Drummond. 2014. Bayesian Inference
#' of Sampled Ancestor Trees for Epidemiology and Fossil Calibration. \emph{PLoS Computational Biology}
#' 10(12):e1003919.
#' 
#' Hartmann, K., D. Wong, and T. Stadler. 2010 Sampling Trees from Evolutionary
#' Models. \emph{Systematic Biology} \bold{59}(4):465--476.
#' 
#' Heath, T. A., J. P. Huelsenbeck, and T. Stadler. 2014. The fossilized birth-death process
#' for coherent calibration of divergence-time estimates.
#' \emph{Proceedings of the National Academy of Sciences} 111(29):E2957-E2966.
#' 
#' Kendall, D. G. 1948 On the Generalized "Birth-and-Death" Process. \emph{The
#' Annals of Mathematical Statistics} \bold{19}(1):1--15.
#' 
#' Nee, S. 2006 Birth-Death Models in Macroevolution. \emph{Annual Review of
#' Ecology, Evolution, and Systematics} \bold{37}(1):1--17.
#' 
#' Patzkowsky, M. E. 1995. A Hierarchical Branching Model of Evolutionary Radiations.
#' \emph{Paleobiology} 21(4):440-460.
#' 
#' Solow, A. R., and W. Smith. 1997 On Fossil Preservation and the
#' Stratigraphic Ranges of Taxa. \emph{Paleobiology} \bold{23}(3):271--277.
#' 
#' Stadler, T. 2009. On incomplete sampling under birth-death models and connections to the
#' sampling-based coalescent. \emph{Journal of Theoretical Biology} 261(1):58-66.
#' 
#' Wagner, P. J., and D. H. Erwin. 1995. Phylogenetic patterns as tests of speciation models.
#' Pp. 87-122. In D. H. Erwin, and R. L. Anstey, eds. \emph{New approaches to speciation in the
#' fossil record.} Columbia University Press, New York.

#' @examples
#' 
#' set.seed(2)
#' 
#' # quick birth-death-sampling run
#'     # with 1 run, 50 taxa
#' 
#' record <- simFossilRecord(
#'     p = 0.1, 
#'     q = 0.1, 
#'     r = 0.1, 
#'     nruns = 1,
#'     nTotalTaxa = 50, 
#'     plot = TRUE
#'     )
#'     
#' ################
#' \donttest{
#' 
#' # Now let's examine with multiple runs of simulations
#' 
#' # example of repeated pure birth simulations over 50 time-units
#' records <- simFossilRecord(
#'     p = 0.1, 
#'     q = 0, 
#'     nruns = 10,
#'     totalTime = 50, 
#'     plot = TRUE
#'     )
#' 
#' # plot multiple diversity curves on a log scale
#' records <- lapply(records, 
#'     fossilRecord2fossilTaxa)
#' multiDiv(records,
#'     plotMultCurves = TRUE,
#'     plotLogRich = TRUE
#'     )
#' 
#' # histogram of total number of taxa
#' hist(sapply(records, nrow))
#' 
#' ##############################################
#' # example of repeated birth-death-sampling
#'     # simulations over 50 time-units
#'     
#' records <- simFossilRecord(
#'     p = 0.1, 
#'     q = 0.1, 
#'     r = 0.1, 
#'     nruns = 10,
#'     totalTime = 50, 
#'     plot = TRUE)
#'     
#' records <- lapply(records,
#'     fossilRecord2fossilTaxa)
#'     
#' multiDiv(records,
#'     plotMultCurves = TRUE)
#' 
#' # like above...
#'     # but conditioned instead on having 10 extant taxa
#'     # between 1 and 100 time-units
#'     
#' set.seed(4)
#'     
#' records <- simFossilRecord(
#'     p = 0.1, 
#'     q = 0.1, 
#'     r = 0.1, 
#'     nruns = 10,
#'     totalTime = c(1,300), 
#'     nExtant = 10, 
#'     plot = TRUE
#'     )
#'     
#' records <- lapply(records, 
#'     fossilRecord2fossilTaxa)
#'     
#' multiDiv(records,
#'     plotMultCurves = TRUE
#'     )
#' 
#' ################################################
#' 
#' # How probable were the runs I accepted?
#'     # The effect of conditions
#' 
#' set.seed(1)
#' 
#' # Let's look at an example of a birth-death process
#'     # with high extinction relative to branching
#' 
#' # notes:     
#'     # a) use default run conditions (barely any conditioning)
#'     # b) use print.runs to look at acceptance probability
#'     
#' records <- simFossilRecord(
#'     p = 0.1, 
#'     q = 0.8, 
#'     nruns = 10,
#'     print.runs = TRUE, 
#'     plot = TRUE
#'     )
#'     
#' # 10 runs accepted from a total of 10 !
#' 
#' # now let's give much more stringent run conditions
#'     # require 3 extant taxa at minimum, 5 taxa total minimum
#'     
#' records <- simFossilRecord(
#'     p = 0.1, 
#'     q = 0.8, 
#'     nruns = 10,
#'     nExtant = c(3,100), 
#'     nTotalTaxa = c(5,100),
#'     print.runs = TRUE, 
#'     plot = TRUE
#'     )
#'     
#' # thousands of simulations to just obtail 10 accepable runs!
#'     # most ended in extinction before minimums were hit
#' 
#' # beware analysis of simulated where acceptance conditions 
#'     # are too stringent: your data will be a 'special case'
#'     # of the simulation parameters
#' # it will also take you a long time to generate reasonable
#'     # numbers of replicates for whatever analysis you are doing
#' 
#' # TLDR: You should look at print.runs = TRUE
#' 
#' ##################################################################
#' 
#' # Using the rate equation-input for complex diversification models
#' 
#' # First up... Diversity Dependent Models!
#' # Let's try Diversity-Dependent Branching over 50 time-units
#' 
#' # first, let's write the rate equation
#' # We'll use the diversity dependent rate equation model
#'     # from Ettienne et al. 2012 as an example here
#'     # Under this equation, p = q at carrying capacity K
#' # Many others are possible!
#' # Note that we don't need to use max(0,rate) as negative rates
#'     # are converted to zero by default, as controlled by
#'     # the argument negRatesAsZero
#' 
#' # From Ettiene et al.
#'     #   lambda = lambda0 - (lambda0 - mu)*(n/K)
#' # lambda and mu are branching rate and extinction rate
#'     # lambda and mu  ==  p and q in paleotree (i.e. Foote convention)
#' # lambda0 is the branching rate at richness = 0
#' # K is the carrying capacity
#' # n is the richness
#' 
#' # 'N' is the algebra symbol for standing taxonomic richness 
#'     # for simFossilRecord's simulation capabilities
#' # also branching rate cannot reference extinction rate
#' # we'll have to set lambda0, mu and K in the rate equation directly
#' 
#' lambda0 <- 0.3  # branching rate at 0 richness in Ltu
#' K <- 40         # carrying capacity 
#' mu <- 0.1       # extinction rate will 0.1 Ltu ( =  1/3 of lambda0 )
#' 
#' # technically, mu here represents the lambda at richness = K
#'     # i.e. lambdaK
#' # Ettienne et al. are just implicitly saying that the carrying capacity
#'     # is the richness at which lambda == mu
#' 
#' # construct the equation programmatically using paste0
#' branchingRateEq <- paste0(lambda0, "-(", lambda0, "-", mu, ")*(N/", K, ")")
#' # and take a look at it...
#' branchingRateEq
#' # its a thing of beauty, folks!
#' 
#' # now let's try it
#' records <- simFossilRecord(
#'     p = branchingRateEq, 
#'     q = mu, 
#'     nruns = 3,
#'     totalTime = 100, 
#'     plot = TRUE, 
#'     print.runs = TRUE
#'     )
#'     
#' records <- lapply(records,
#'     fossilRecord2fossilTaxa)
#'     
#' multiDiv(records,
#'     plotMultCurves = TRUE)
#'     
#' # those are some happy little diversity plateaus!
#' 
#' 
#' # now let's do diversity-dependent extinction
#' 
#' # let's slightly modify the model from Ettiene et al.
#'     #   mu = mu0 + (mu0 - muK)*(n/K)
#' 
#' mu0 <- 0.001     # mu at n = 0
#' muK <- 0.1       # mu at n = K (should be equal to lambda at K)
#' K <- 40          # carrying capacity (like above)
#' lambda <- muK    # equal to muK
#' 
#' # construct the equation programmatically using paste0
#' extRateEq <- paste0(mu0, "-(", mu0, "-", muK, ")*(N/" ,K, ")")
#' extRateEq
#' 
#' # now let's try it
#' records <- simFossilRecord(
#'     p = lambda, 
#'     q = extRateEq, 
#'     nruns = 3,
#'     totalTime = 100, 
#'     plot = TRUE, 
#'     print.runs = TRUE)
#'     
#' records <- lapply(records,
#'     fossilRecord2fossilTaxa)
#'     
#' multiDiv(records,
#'     plotMultCurves = TRUE)
#' 
#' # these plateaus looks a little more spiky 
#'     #( maybe there is more turnover at K? )
#' # also, it took a longer for the rapid rise to occur
#' 
#' ##########################################################
#' 
#' # Now let's try an example with time-dependent origination
#'     # and extinction constrained to equal origination
#' 
#' # Note! Use of time-dependent parameters "D" and "T" may
#' # result in slower than normal simulation run times
#' # as the time-scale has to be discretized; see
#' # info for argument maxTimeStep above
#' 
#' # First, let's define a time-dependent rate equation
#'     # "T" is the symbol for time passed
#' timeEquation <- "0.4-(0.007*T)"
#' 
#' #in this equation, 0.4 is the rate at time = 0
#'     # and it will decrease by 0.007 with every time-unit
#'     # at time = 50, the final rate will be 0.05
#' # We can easily make it so extinction
#'     # is always equal to branching rate
#' # "P" is the algebraic equivalent for
#'      # "branching rate" in simFossilRecord
#' 
#' # now let's try it
#' records <- simFossilRecord(
#'     p = timeEquation, 
#'     q = "P", 
#'     nruns = 3,
#'     totalTime = 50, 
#'     plot = TRUE, 
#'     print.runs = TRUE
#'     )
#'     
#' records <- lapply(records,
#'     fossilRecord2fossilTaxa)
#'     
#' multiDiv(records,
#'     plotMultCurves = TRUE)
#'     
#' # high variability that seems to then smooth out as turnover decreases
#' 
#' # And duration what about duration-dependent processes?
#'     # let's do a duration-dep extinction equation:
#' durDepExt <- "0.01+(0.01*D)"
#' 
#' # okay, let's take it for a spin
#' records <- simFossilRecord(
#'     p = 0.1, 
#'     q = durDepExt, 
#'     nruns = 3,
#'     totalTime = 50,
#'     plot = TRUE, 
#'     print.runs = TRUE
#'     )
#'     
#' records <- lapply(records,
#'     fossilRecord2fossilTaxa)
#'     
#' multiDiv(records,
#'     plotMultCurves = TRUE)
#'     
#' # creates runs full of short lived taxa
#' 
#' # Some more stuff to do with rate formulae!
#' 
#' # The formulae input method for rates allows
#' 	# for the rate to be a random variable
#' 
#' # For example, we could constantly redraw
#'  		# the branching rate from an exponential
#' 
#' record <- simFossilRecord(
#'   p = "rexp(n = 1,rate = 10)",
#'   q = 0.1, r = 0.1, nruns = 1,
#' 	 nTotalTaxa = 50, plot = TRUE)
#' 
#' # Setting up specific time-variable rates can be laborious though
#'     # e.g. one rate during this 10 unit interval, 
#'     # another during this interval, etc
#' 	# The problem is setting this up within a fixed function
#' 
#' #############################################################
#' # Worked Example
#' # What if we want to draw a new rate from a
#'     # lognormal distribution every 10 time units?
#' 
#' # Need to randomly draw these rates *before* running simFossilTaxa
#' # This means also that we will need to individually do each simFossilTaxa run
#'     # since the rates are drawn outside of simFossilTaxa
#' 
#' # Get some reasonable log normal rates:
#' rates <- 0.1+rlnorm(100,meanlog = 1,sdlog = 1)/100
#' 
#' # Now paste it into a formulae that describes a function that
#'     # will change the rate output every 10 time units
#' rateEquation <- paste0(
#'                        "c(",
#'                        paste0(rates,collapse = ","),
#'                        ")[1+(T%/%10)]"
#'                        )
#' 
#' # and let's run it
#' record <- simFossilRecord(
#'     p = rateEquation, 
#'     q = 0.1, 
#'     r = 0.1, 
#'     nruns = 1,
#'     totalTime = c(30,40), 
#'     plot = TRUE
#'     )
#'  
#' #####################################################################
#' 
#' # Speciation Modes
#' 
#' # Some examples of varying the 'speciation modes' in simFossilRecord
#' 
#' # The default is pure budding cladogenesis
#'     # anag.rate = prop.bifurc = prop.cryptic = 0
#' # let's just set those for the moment anyway
#' record <- simFossilRecord(p = 0.1, q = 0.1, r = 0.1,
#'     anag.rate = 0, prop.bifurc = 0, prop.cryptic = 0,
#'     nruns = 1, nTotalTaxa = c(20,30) ,nExtant = 0, plot = TRUE)
#' 
#' #convert and plot phylogeny
#'     # note this will not reflect the 'budding' pattern
#'     # branching events will just appear like bifurcation
#'     # its a typical convention for phylogeny plotting
#' converted <- fossilRecord2fossilTaxa(record)
#' tree <- taxa2phylo(converted,plot = TRUE)
#' 
#' #now, an example of pure bifurcation
#' record <- simFossilRecord(p = 0.1, q = 0.1, r = 0.1,
#'     anag.rate = 0, prop.bifurc = 1, prop.cryptic = 0,
#'     nruns = 1, nTotalTaxa = c(20,30) ,nExtant = 0)
#' tree <- taxa2phylo(fossilRecord2fossilTaxa(record),plot = TRUE)
#' 
#' # all the short branches are due to ancestors that terminate
#'     # via pseudoextinction at bifurcation events
#' 
#' # an example with anagenesis = branching
#' record <- simFossilRecord(
#'     p = 0.1, q = 0.1, r = 0.1,
#'     anag.rate = 0.1, 
#'     prop.bifurc = 0, 
#'     prop.cryptic = 0,
#'     nruns = 1, 
#'     nTotalTaxa = c(20,30),
#'     nExtant = 0
#'     )
#' tree <- taxa2phylo(fossilRecord2fossilTaxa(record),
#'     plot = TRUE)
#' # lots of pseudoextinction
#' 
#' # an example with anagenesis, pure bifurcation
#' record <- simFossilRecord(
#'     p = 0.1, q = 0.1, r = 0.1,
#'     anag.rate = 0.1, 
#'     prop.bifurc = 1, 
#'     prop.cryptic = 0,
#'     nruns = 1, 
#'     nTotalTaxa = c(20,30) ,
#'     nExtant = 0
#'     )
#' tree <- taxa2phylo(
#'     fossilRecord2fossilTaxa(record),
#'     plot = TRUE
#'     )
#' # lots and lots of pseudoextinction
#' 
#' # an example with half cryptic speciation
#' record <- simFossilRecord(
#'     p = 0.1, 
#'     q = 0.1, 
#'     r = 0.1,
#'     anag.rate = 0, 
#'     prop.bifurc = 0, 
#'     prop.cryptic = 0.5,
#'     nruns = 1, 
#'     nTotalTaxa = c(20,30), 
#'     nExtant = 0
#'     )
#' 
#' tree <- taxa2phylo(
#'     fossilRecord2fossilTaxa(record),
#'     plot = TRUE)
#' 
#' # notice that the tree has many more than the maximum of 30 tips:
#'     # that's because the cryptic taxa are not counted as
#'     # separate taxa by default, as controlled by count.cryptic
#' 
#' # an example with anagenesis, bifurcation, cryptic speciation
#' record <- simFossilRecord(
#'     p = 0.1, q = 0.1, r = 0.1,
#'     anag.rate = 0.1, 
#'     prop.bifurc = 0.5, 
#'     prop.cryptic = 0.5,
#'     nruns = 1, 
#'     nTotalTaxa = c(20,30), 
#'     nExtant = 0
#'     )
#' 
#' tree <- taxa2phylo(
#'     fossilRecord2fossilTaxa(record),
#'     plot = TRUE)
#' 
#' # note in this case, 50% of branching is cryptic
#'     # 25% is bifurcation, 25% is budding
#' 
#' # an example with anagenesis, pure cryptic speciation
#'     # morphotaxon identity will thus be entirely indep of branching!
#'     # I wonder if this is what is really going on, sometimes...
#' record <- simFossilRecord(
#'     p = 0.1, q = 0.1, r = 0.1,
#'     anag.rate = 0.1, 
#'     prop.bifurc = 0, 
#'     prop.cryptic = 1,
#'     nruns = 1, 
#'     nTotalTaxa = c(20,30), 
#'     nExtant = 0
#'     )
#' tree <- taxa2phylo(fossilRecord2fossilTaxa(record),
#'     plot = TRUE)
#' 
#' # merging cryptic taxa when all speciation is cryptic
#' set.seed(1)
#' record <- simFossilRecord(
#'     p = 0.1,
#'     q = 0.1, 
#'     r = 0.1,
#'     prop.crypt = 1,
#'     totalTime = 50, 
#'     plot = TRUE
#'     )
#' 
#' # there looks like there is only a single taxon, but...
#' length(record)	
#' 
#' #the above is the *actual* number of cryptic lineages
#' 
#' #########################################################################
#' 
#' # playing with count.cryptic with simulations of pure cryptic speciation
#'     # what if we had fossil records with NO morphological differentiation?
#' 
#' # We can choose to condition on total morphologically-distinguishable taxa
#'     # or total taxa including cryptic taxa with count.cryptic = FALSE
#' 
#' # an example with pure cryptic speciation with count.cryptic = TRUE
#' record <- simFossilRecord(
#'     p = 0.1, q = 0.1, r = 0.1,
#'     anag.rate = 0, 
#'     prop.bifurc = 0, 
#'     prop.cryptic = 1,
#'     nruns = 1, 
#'     totalTime = 50, 
#'     nTotalTaxa = c(10,100), 
#'     count.cryptic = TRUE
#'     )
#'     
#' tree <- taxa2phylo(fossilRecord2fossilTaxa(record))
#' 
#' # plot the tree
#' plot(tree)
#' axisPhylo()
#' 
#' # notice how the tip labels indicate all are the same morphotaxon?
#' 
#' #################
#' # an example with pure cryptic speciation with count.cryptic = FALSE
#'     # Need to be careful with this!
#' 
#' # We'll have to replace the # of taxa constraints with a time constraint
#'     # or else the count.cryptic = FALSE simulation will never end!
#' 
#' record <- simFossilRecord(
#'     p = 0.1, q = 0.1, r = 0.1,
#'     anag.rate = 0, 
#'     prop.bifurc = 0, 
#'     prop.cryptic = 1,
#'     nruns = 1, 
#'     totalTime = 50, 
#'     count.cryptic = FALSE
#'     )
#' tree <- taxa2phylo(fossilRecord2fossilTaxa(record))
#' 
#' # plot it
#' plot(tree)
#' axisPhylo()
#' 
#' ###########################################
#' # let's look at numbers of taxa returned when varying count.cryptic
#'     # with prop.cryptic = 0.5
#' 
#' # Count Cryptic Example Number One
#' # simple simulation going for 50 total taxa	
#' 
#' # first, count.cryptic = FALSE (default)
#' record <- simFossilRecord(
#'     p = 0.1, 
#'     q = 0.1, 
#'     r = 0.1,
#'     anag.rate = 0, 
#'     prop.bifurc = 0, 
#'     prop.cryptic = 0.5,
#'     nruns = 1, 
#'     nTotalTaxa = 50, 
#'     count.cryptic = FALSE
#'     )
#'     
#' taxa <- fossilRecord2fossilTaxa(record)
#' 
#' #### Count the taxa/lineages !
#' # number of lineages (inc. cryptic)
#' nrow(taxa)               
#'
#' # number of morph-distinguishable taxa
#' length(unique(taxa[,6]))     
#' 
#' ###################
#' 
#' # Count Cryptic Example Number Two
#' # Now let's try with count.cryptic = TRUE
#' 
#' record <- simFossilRecord(
#'     p = 0.1, 
#'     q = 0.1, 
#'     r = 0.1,
#'     anag.rate = 0, 
#'     prop.bifurc = 0, 
#'     prop.cryptic = 0.5,
#'     nruns = 1, 
#'     nTotalTaxa = 50, 
#'     count.cryptic = TRUE
#'     )
#'     
#' taxa <- fossilRecord2fossilTaxa(record)
#' 
#' ### Count the taxa/lineages !
#' # number of lineages (inc. cryptic)
#' nrow(taxa)               
#' 
#' # number of morph-distinguishable taxa
#' length(unique(taxa[,6]))     
#' # okay...
#' 
#' ###########
#' 
#' # Count Cryptic Example Number Three 
#' # now let's try cryptic speciation *with* 50 extant taxa!
#' 
#' # first, count.cryptic = FALSE (default)
#' record <- simFossilRecord(
#'     p = 0.1, 
#'     q = 0.1, 
#'     r = 0.1,
#'     anag.rate = 0, 
#'     prop.bifurc = 0, 
#'     prop.cryptic = 0.5,
#'     nruns = 1, 
#'     nExtant = 10, 
#'     totalTime = c(1,100), 
#'     count.cryptic = FALSE
#'     )
#'     
#' taxa <- fossilRecord2fossilTaxa(record)
#' 
#' ### Count the taxa/lineages !
#' # number of still-living lineages (inc. cryptic)
#' sum(taxa[,5])          
#' 
#' # number of still-living morph-dist. taxa
#' length(unique(taxa[taxa[,5] == 1,6]))	
#' 
#' ##############
#' 
#' # Count Cryptic Example Number Four
#' # like above with count.cryptic = TRUE
#' 
#' record <- simFossilRecord(
#'     p = 0.1, 
#'     q = 0.1, 
#'     r = 0.1,
#'     anag.rate = 0, 
#'     prop.bifurc = 0, 
#'     prop.cryptic = 0.5,
#'     nruns = 1, 
#'     nExtant = 10, 
#'     totalTime = c(1,100), 
#'     count.cryptic = TRUE
#'     )
#' taxa <- fossilRecord2fossilTaxa(record)
#' 
#' ### Count the taxa/lineages !
#' # number of still-living lineages (inc. cryptic)
#' sum(taxa[,5])          
#' # number of still-living morph-dist. taxa
#' length(unique(taxa[taxa[,5] == 1,6]))	
#' 
#' #################################################
#' 
#' # Specifying Number of Initial Taxa
#'     # Example using startTaxa to have more initial taxa
#' 
#' record <- simFossilRecord(
#'     p = 0.1, 
#'     q = 0.1, 
#'     r = 0.1, 
#'     nruns = 1,
#'     nTotalTaxa = 100, 
#'     startTaxa = 20, 
#'     plot = TRUE
#'     )
#' 
#' ######################################################
#' 
#' # Specifying Combinations of Simulation Conditions
#' 
#' # Users can generate datasets that meet multiple conditions:
#'     # such as time, number of total taxa, extant taxa, sampled taxa
#' # These can be set as point conditions or ranges
#' 
#' # let's set time = 10-100 units, total taxa = 30-40, extant = 10
#'     #and look at acceptance rates with print.run
#' record <- simFossilRecord(
#'     p = 0.1, 
#'     q = 0.1, 
#'     r = 0.1, 
#'     nruns = 1, 
#'     totalTime = c(10,100), 
#'     nTotalTaxa = c(30,40), 
#'     nExtant = 10,
#'     print.runs = TRUE, 
#'     plot = TRUE
#'     )
#' 
#' # let's make the constraints on totaltaxa a little tighter
#' record <- simFossilRecord(
#'     p = 0.1, 
#'     q = 0.1, 
#'     r = 0.1, 
#'     nruns = 1, 
#'     totalTime = c(50,100), 
#'     nTotalTaxa = 30, 
#'     nExtant = 10,
#'     print.runs = TRUE, 
#'     plot = TRUE
#'     )
#'     
#' # still okay acceptance rates
#' 
#' # alright, now let's add a constraint on sampled taxa
#' record <- simFossilRecord(
#'     p = 0.1, 
#'     q = 0.1, 
#'     r = 0.1, 
#'     nruns = 1, 
#'     totalTime = c(50,100), 
#'     nTotalTaxa = 30, 
#'     nExtant = 10,
#'     nSamp = 15, 
#'     print.runs = TRUE, 
#'     plot = TRUE
#'     )
#' 
#' # still okay acceptance rates
#' 
#' # we can be really odd and instead condition on having a single taxon
#' set.seed(1)
#' 
#' record <- simFossilRecord(
#'     p = 0.1,
#'     q = 0.1, 
#'     r = 0.1, 
#'     nTotalTaxa = 1,
#'     totalTime = c(10,20), 
#'     plot = TRUE
#'     )
#' 
#' ########################################################
#' 
#' # Simulations of Entirely Extinct Taxa
#' 
#' # Typically, a user may want to condition on a precise
#'     # number of sampled taxa in an all-extinct simulation
#' 
#' record <- simFossilRecord(
#'     p = 0.1, 
#'     q = 0.1, 
#'     r = 0.1, 
#'     nruns = 1, 
#'     nTotalTaxa = c(1,100), 
#'     nExtant = 0, 
#'     nSamp = 20,
#'     print.runs = TRUE, 
#'     plot = TRUE
#'     )
#' 
#' # Note that when simulations don't include
#'     # sampling or extant taxa, the plot 
#'     # functionality changes
#" 
#' record <- simFossilRecord(
#'     p = 0.1, 
#'     q = 0.1, 
#'     r = 0, 
#'     nruns = 1, 
#'     nExtant = 0, 
#'     print.runs = TRUE, 
#'     plot = TRUE
#'     )
#' 
#' # Something similar happens when there is no sampling
#'     # and there are extant taxa but they aren't sampled
#' 
#' record <- simFossilRecord(
#'     p = 0.1, 
#'     q = 0.1, 
#'     r = 0, 
#'     nruns = 1, 
#'     nExtant = 10, 
#'     nTotalTaxa = 100, 
#'     modern.samp.prob = 0,
#'     print.runs = TRUE, 
#'     plot = TRUE
#'     )
#' 
#' ########################################################
#' # Retaining Rejected Simulations
#' 
#' # sometimes we might want to look at all the simulations
#'     # that don't meet acceptability criteria
#' 
#' # In particular, look at simulated clades that go extinct
#'     # rather than surviving long enough to satisfy 
#'     # conditioning on temporal duration.
#' 
#' # Let's look for 10 simulations with following conditioning:
#'     # that are exactly 10 time-units in duration
#'     # that have between 10 and 30 total taxa
#'     # and have 1 to 30 extant taxa after 10 time-units
#' 
#' set.seed(4)
#' 
#' record <- simFossilRecord(
#'     p = 0.1, 
#'     q = 0.1, 
#'     r = 0.1, 
#'     nruns = 10, 
#'     totalTime = 10, 
#'     nTotalTaxa = c(10,30), 
#'     nExtant = c(1,30),
#'     returnAllRuns = TRUE,
#'     print.runs = TRUE, 
#'     plot = TRUE
#'     )
#' 
#' # when returnAllRuns = TRUE, the length of record is 2
#'     # named 'accepted' and 'rejected'
#' 
#' # all the accepted runs (all 10) are in 'accepted'
#' length(record$accepted) 
#' 
#' # all the rejected runs are in 'rejected'
#' length(record$rejected) 
#' 
#' # probably many more than 10! 
#'     # (I got 1770!)
#' 
#' # how many taxa are in each rejected simulation run?
#' totalTaxa_rej <- sapply(record$rejected, length)
#' 
#' # plot as a histogram
#' hist(totalTaxa_rej)
#' # a very nice exponential distribution...
#' 
#' # plot the rejected simulation with the most taxa
#' 
#' divCurveFossilRecordSim(
#'      fossilRecord = record$rejected[[
#'            which(max(totalTaxa_rej) == totalTaxa_rej)[1]
#'            ]]
#'      )
#' 
#' # we can plot all of these too...
#' result <- sapply(record$rejected, 
#'      divCurveFossilRecordSim)
#' 
#' # let's look at the temporal duration of rejected clades
#' 
#' # need to write a function
#' getDuration <- function(record){
#'      taxa <- fossilRecord2fossilTaxa(record)
#'      maxAge <- max(taxa[,"orig.time"], na.rm = TRUE)
#'      minAge <- min(taxa[,"ext.time"], na.rm = TRUE)
#'      cladeDuration <- maxAge - minAge
#'      return(cladeDuration)
#'      }
#' 
#' # all the accepted simulations should have
#'        # identical durations (10 time-units)
#' sapply(record$accepted, getDuration)
#' 
#' # now the rejected set
#' durations_rej <- sapply(record$rejected, getDuration)
#' # plot as a histogram
#' hist(durations_rej)
#' 
#' # Most simulations hit the max time without
#'       # satisfying the other specified constraints
#'       # (probably they didn't have the min of 10 taxa total)
#' 
#' } 



#' @name simFossilRecord
#' @rdname simFossilRecord
#' @export
simFossilRecord <- function(

		# model parameters
		#
		p, 
		q, 
		r = 0, 
		
		anag.rate = 0, 
		prop.bifurc = 0, 
		prop.cryptic = 0,
		
		modern.samp.prob = 1, 
		startTaxa = 1, 
		
		nruns = 1, 
		maxAttempts = Inf,

		# run conditions 
		 # can be given as vectors of length 1 or length 2 ( =  min,max)
		#
		totalTime = c(0, 1000), 
		nTotalTaxa = c(1, 1000),
		nExtant = c(0, 1000), 
		nSamp = c(0, 1000),
		#
		returnAllRuns = FALSE,
		
		#control parameters
		#
		tolerance = 10^-6, 
		maxStepTime = 0.01, 
		shiftRoot4TimeSlice = "withExtantOnly",
		count.cryptic = FALSE, 
		negRatesAsZero = TRUE, 
		print.runs = FALSE, 
		sortNames = FALSE, 
		plot = FALSE
		){

	#####################################################################################
	
	# NOT USED (but in simFossilTaxa):	min.cond = TRUE

	#################################################################################
	
	#example parameter sets
	
	# DEFAULTS (without rates set)
		# r = 0.1;anag.rate = 0;prop.bifurc = 0;prop.cryptic = 0;startTaxa = 1;nruns = 1;
		# nTotalTaxa = c(1,1000);totalTime = c(1,1000);nSamp = c(0,1000);nExtant = c(0,1000);
		# tolerance = 10^-4; maxStepTime = 0.01; shiftRoot4TimeSlice = "withExtantOnly";
		# count.cryptic = FALSE; negRatesAsZero = TRUE; print.runs = FALSE; sortNames = FALSE; plot = FALSE
	
	# BASIC RUN	with diversity-dep extinction
	# p = 0.1;q = '0.01*N'
		# r = 0.1;anag.rate = 0;prop.bifurc = 0;prop.cryptic = 0;startTaxa = 1;nruns = 1;
		# nTotalTaxa = c(10,200);totalTime = c(1,1000);nSamp = c(0,1000);nExtant = c(0,0);plot = TRUE;
		# tolerance = 10^-4; maxStepTime = 0.01; shiftRoot4TimeSlice = "withExtantOnly";
		# count.cryptic = FALSE; negRatesAsZero = TRUE; print.runs = FALSE; sortNames = FALSE; plot = FALSE
	
	##################################################################################
	#
	#simplified birth-death-sampling simulator for fossil record
	#
	#p is the rate of branching
		#may be either budding (prob = 1-prop.bifurc) or bifurcation (prob = prop.bifurc)
	#anagenesis is a separate rate (anag.rate)
	#q is rate of extinction
	#r is rate of sampling
	#
	#ARGUMENT CHECKING
	#
	# number of starting taxa and runs must be at least 1
	if(nruns<1 | maxAttempts<1){
		stop(
			"nruns and maxAttempts must be at least 1"
			)
		}
	if(startTaxa<1){
		stop(
			"startTaxa must be at least 1"
			)
		}
	#nruns, starting taxa must be integer values
	if(!all(sapply(c(nruns,startTaxa),function(x) x == round(x)))){
			stop(
				"nruns and startTaxa must coercible to whole number integers"
				)
			}
	# check that prop.bifurc, prop.cryptic, modern.samp.prob are greater than 0 and less than 1
	if(any(c(prop.bifurc, prop.cryptic, modern.samp.prob)<0) |
			any(c(prop.bifurc, prop.cryptic, modern.samp.prob)>1)){
		stop(paste0(
			"bad parameters input:\n",
			"  prop.bifurc, prop.cryptic and modern.samp.prob must be between 0 and 1"
			))
		}	
	# is prop.bifurc and prop.cryptic consistent?
	if(prop.bifurc>0 & prop.cryptic == 1){
		stop(
			"Prop.bifurc greater than 0 when probability of branching being cryptic is 1"
			)
		}
 	#check that min nSamp isn't higher that 0, if r = 0 or Inf
	if((r == 0 | is.infinite(r)) & nSamp[1]>0){
		stop(
			"Minimum number of required sampled taxa is >0 but sampling rate is zero (or infinite)"
			)}
	# check count.cryptic
	if(!count.cryptic){ #if false
		#check that min nTotalTaxa, nExtant, nSamp isn't higher that 1, if prop.cryptic = 1
		unreachableConstraints <- (
			(prop.cryptic == 1 & anag.rate == 0) & 
			( nTotalTaxa[1]>1 | nExtant[1]>1 | nSamp[1]>1 )
			)
		if(unreachableConstraints){
			stop(paste0(
				"Minimum number of required  nTotalTaxa, nExtant and/or nSamp is >1\n",
				"   but these constraints cannot be reached as \n",
				"   count.cryptic = FALSE, prop.cryptic = 1 and anag.rate = 0)"
				))
			}
		}
	#check that count.cryptic,negRatesAsZero,print.runs,sortNames,plot are all logicals
	if(!all(sapply(c(count.cryptic,negRatesAsZero,print.runs,sortNames,plot),is.logical))){
		stop(
			"count.cryptic, negRatesAsZero, print.runs, sortNames, and plot arguments must be logicals"
			)
		}
	#
	##################################
	# CHECK RUN CONDITIONS
	#
	# nTotalTaxa, nExtant, nSamp must all be integer values
	if(!all(sapply(c(nTotalTaxa,nExtant,nSamp),function(x) x == round(x)))){
		stop(
			"nTotalTaxa, nExtant, nSamp must coercible to whole number integers"
			)
		}		
	#
	runConditions <- list(
		totalTime = totalTime,
		nTotalTaxa = nTotalTaxa,
		nExtant = nExtant,
		nSamp = nSamp
		)
	#check that all are numeric
	if(any(!sapply(runConditions,is.numeric))){
		stop("Run condition arguments must be all of type numeric")
		}
	#are length of 1 or 2
	if(any(sapply(runConditions,length)>2) | any(sapply(runConditions,length)<1)){
		stop("Run condition arguments must be of length 1 or 2")
		}
	# run conditions can be given as vectors of length 1 or 2
		# i.e. a point condition or range
	# turn run conditions of length 1 into vectors of length 2
	runConditions <- lapply(runConditions,function(x)
		if(length(x) == 1){
			c(x,x)
		}else{
			x
			}
		)
	#all values are over or equal to zero
	if(any(!sapply(runConditions,function(x) all(x >= 0)))){
		stop(
			"Run Condition values must be equal to or greater than 0"
			)
		}	
	#with minimums less than maximums
	if(any(!sapply(runConditions,function(x) x[1] <= x[2]))){
		stop(paste0(
			"Run condition misordered:\n", 
			"  Values given as a range must have the minimum before the maximum"
			))
		}
	#
	###########################
	#get the basic rate functions
	getBranchRate <- makeParFunct(p, isBranchRate = TRUE)
	getExtRate <- makeParFunct(q, isBranchRate = FALSE)
	getSampRate <- makeParFunct(r, isBranchRate = FALSE)
	getAnagRate <- makeParFunct(anag.rate, isBranchRate = FALSE)
	#
	# check if time-dependent simulation
	isTimeDep <- any(
		sapply(
			list(getBranchRate,getExtRate,
				getSampRate,getAnagRate
				)
			,attr, 
			which = "timeDep")
		)
	#
	##############################################
	#now iterate for nruns
	acceptedSimulations <- list()
	#
	if(returnAllRuns){
		rejectedSimulations	<- list()
		}
	#
	ntries <- 0
	#
	#for(nAccepted in 1:nruns){
	#
	nAccepted <- 0
	while(nruns > nAccepted){
		#
		accept <- FALSE
		isRejectedRun <- FALSE
		#
		while(!accept){
			ntries <- ntries+1
			#test that haven't exceeded maximum number of attempts
			if(ntries>maxAttempts){
				stop(paste0(
					"Maximum number of attempts (",
					maxAttempts,
					") has been exceeded with only ",
					nAccepted-1,
					" run(s) successful."))
				}
			#
			#initiate the taxa dataset
			timePassed <- 0
			#currentTime is the max time from runConditions
			currentTime <- runConditions$totalTime[2]
			taxa <- initiateTaxa(startTaxa = startTaxa,
				time = currentTime)
			#
			#get vitals
			startVitals <- getRunVitals(taxa = taxa,
				count.cryptic = count.cryptic)
			#start vitals table		
			vitalsRecord <- cbind(timePassed = timePassed,
				t(as.matrix(startVitals)))
			#test to make sure run conditions aren't impossible
			continue <- testContinue(
				vitals = startVitals,
				timePassed = timePassed,
				runConditions = runConditions
				)
			if(!continue){
				stop(
					"Initial starting point already matches or exceeds given run conditions"
					)
				}
			while(continue){
				#only as long as continue = TRUE
				#
				#timePassed from the initiation of the simulation
				timePassed <- runConditions$totalTime[2] - currentTime
				#
				# get rates, sample new event, have it occur
				#
				# first get durations
				taxaDurations <- getTaxonDurations(taxa, currentTime)
				#
				#get event probability vector
				rateMatrix <- getRateMatrix(
					taxa = taxa, 
					timePassed = timePassed,
					taxaDurations = taxaDurations,
					getBranchRate = getBranchRate, 
					getExtRate = getExtRate,
					getSampRate = getSampRate, 
					getAnagRate = getAnagRate,
					prop.cryptic = prop.cryptic, 
					prop.bifurc = prop.bifurc,
					negRatesAsZero = negRatesAsZero
					)
				#

				#draw waiting time to an event (from Peter Smits)
					# exponential with rate = sum of all rates, across all taxa
				changeTime <- rexp(1, rate = sum(rateMatrix))
				#
				# if time dep rates, test 
					# if changeTime is greater than maxStepTime
				#
				if(isTimeDep & (changeTime > maxStepTime)){
					# No event has occurred, just tick time forward
					#
					# redefine changeTime as maxStepTime
					changeTime <- maxStepTime
					#
					# measure time passed
					newTime <-  currentTime - changeTime
					newTimePassed <- timePassed + changeTime
					#
					# obviously no event needs to occur...
					#
				}else{
					# an event has occurred!
					#
					# get the probabilty for each event in rateMatrix
					eventProbMatrix <- rateMatrix/sum(rateMatrix)
					#
					# taken from 
					#   stackoverflow.com/questions/30232740/
					#   randomly-sample-entries-of-a-matrix-and-return
					#   -the-row-column-indexes-in-r
					#
					# simplified: arrayInd(sample(length(m),1,prob = m),dim(m)) 
					#
					#get event type and which taxon it occurs to
					sampledCell <- sample(length(rateMatrix), 
						1, prob = eventProbMatrix)
					#will return as a 1 row matrix, of row # and col #
					sampledCell <- arrayInd(sampledCell, dim(rateMatrix))
					#					
					# what is the event type
					event <- colnames(rateMatrix)[sampledCell[1,2]]
					# who did it happen to (what lineage)
					target <- attr(rateMatrix, "whichExtant")[sampledCell[1,1]]
					#
					# measure time passed
					newTime <-  currentTime - changeTime
					newTimePassed <- timePassed + changeTime
					#
					# make the new event so!
					taxa <- eventOccurs(
						taxa = taxa,
						target = target,
						type = event,
						time = newTime
						)
					}
				#
				####################################################
				#
				#evaluate run conditions NOW for stopping
				#
				#(1) continue = TRUE until 
					# max totalTime, max nTotalTaxa, nSamp or total extinction
					###### none of these can REVERSE ########
				#
				# get vitals
				currentVitals <- getRunVitals(
					taxa = taxa,
					count.cryptic = count.cryptic
					)
				# should we continue ??
				continue <- testContinue(
					vitals = currentVitals,
					timePassed = newTimePassed,
					runConditions = runConditions
					)
				#
				# Updated vitals table
					#for (2), keep a table that records changes in 
						# nTotalTaxa, nExtant, nSamp with timePassed
				# then can quickly evaluate (2)
				currentVitals <- c(
					timePassed = newTimePassed,
					t(as.matrix(currentVitals))
					)
				#
				# combine old vitals with new vitals
				vitalsRecord <- rbind(vitalsRecord,currentVitals)
				#
				# set new current time
				currentTime <- newTime	
				#
				###############################################################
				# some archived debugging lines for posterity
				#if(newTimePassed>74.5){browser()}
				#if(newTimePassed>120){if(taxa[[4]][[1]][4]<120){browser()}}
				#
				}
			###########################################
			#
			# accepting or rejecting runs
			#
			# discussion with Smits 05/11/15
				# real run condition is max limits / total ext
					# for typical birth-death simulators
				# minimums are just for acceptability of runs when they hit run conditions
			#
			# NEED TO AVOID HARTMANN ET AL. EFFECT ---- simFossilTaxa did it wrong!!
				# sample simulation from intervals where it 'matched' run conditions
			#
			# (1) continue = TRUE until max totalTime, max nTotalTaxa, 
					# nSamp or total extinction
				# none of these can REVERSE in a FORARD TIME simulation
			#
			# (2) then go back, find all interval for which run conditions were met
				# if no acceptable intervals, reject run
			#
			# (3) randomly sample within intervals for a single date
				# apply timeSliceFossilRecord
			#
			###########################################
			#
			# use vitalRecords to identify intervals of acceptable parameter values
			#		
			#is it even worth checking? (were mins reached)
			worthyVitals <- worthCheckingVitalsRecord(
				vitalsRecord = vitalsRecord,
				runConditions = runConditions
				)
			#
			# if its worth checking, check it!
			if(worthyVitals){
				#
				#test with testVitalsRecord to get seqVitals
				seqVitals <- testVitalsRecord(
					vitalsRecord = vitalsRecord,
					runConditions = runConditions,
					tolerance = tolerance
					)
				#
				# if there aren't any NAs, then there are
					# acceptable dates to consider!
				#
				if(all(!is.na(seqVitals))){
					#hey, if its an acceptable simulation!!!!!!
					accept <- TRUE
					}
				}
			#
			# if returnAllRuns and a failed run
				# ie if accept is FALSE
			if(returnAllRuns & !accept){
				# "accept it" but label as a failed run
				accept <- TRUE
				isRejectedRun <- TRUE
				}
			#
			}
		# 
		# now two tracks
		#
		# if returnAllRuns & isRejectedRun,
			# do not slice simulation
			# save as is in rejected simulation list
		#
		# otherwise
			# randomly sample for an acceptable slice time
			# slice the simulation
			# save in the accepted simulation list
		#
		#
		if(returnAllRuns & isRejectedRun){
			# if returnAllRuns & isRejectedRun,
				# fake slice simulation at maxtime
				# save as is in rejected simulation list
			# 
			# give it a class...
			attr(taxa, "class") <- c("fossilRecordSimulation", class(taxa))			
			# slice at current time (or 0, whichever is greater)
			slicingDate <- max(c(0, currentTime))
			#
			#print(currentTime)
			#
			# slicing date needs to be in timePassed units
				# need to convert to backwards currentTime
			# which means subtracting from maxTime
			#slicingDate <- runConditions$totalTime[2] - slicingDate
			#
			# and slice
			taxa <- timeSliceFossilRecord(
				fossilRecord = taxa, 
				sliceTime = slicingDate,
				shiftRoot4TimeSlice = shiftRoot4TimeSlice, 
				modern.samp.prob = modern.samp.prob
				)			
			#	
		}else{
			# otherwise
				# randomly sample for an acceptable slice time
				# slice the simulation
				# save in the accepted simulation list
			#############
			#sample the sequences for a slicing date
				# everything that 'passes' past this date will be clipped away
			passedDate <- sampleSeqVitals(seqVitals = seqVitals)
			#this slicing date is in timePassed units
				# need to convert to backwards currentTime
			slicingDate <- runConditions$totalTime[2] - passedDate
			#
			# give it a class...
			attr(taxa, "class") <- c("fossilRecordSimulation", class(taxa))	
			#
			# now time slice
				# if stop and there are extant, evaluate if sampled at modern
				# 0< modern.samp.prob <1 need to randomly sample
			taxa <- timeSliceFossilRecord(
				fossilRecord = taxa, 
				sliceTime = slicingDate,
				shiftRoot4TimeSlice = shiftRoot4TimeSlice, 
				modern.samp.prob = modern.samp.prob
				)		
			#
			# browser()
			#
			# FINAL CONDITION CHECK
			# test that the produced taxa object actually passed the runConditions
			finalTest <- testFinal(
				taxa = taxa,
				timePassed = passedDate,
				runConditions = runConditions,
				count.cryptic = count.cryptic
				)
			}
		#
		#
		##############################################################################
		# FINAL CHECKS FOR ALL RUNS (Accepted and Rejected)
		#
		# are there any non-identical taxa in a simulation with pure cryptic speciation?
		if(anag.rate == 0 & prop.cryptic == 1 & startTaxa == 1){
			taxaIDsTest <- sapply(taxa,function(x) x[[1]][6])
			namesIdenticalTest <- !sapply(taxaIDsTest,function(x) 
				all(x == taxaIDsTest)
				)
			if(any(namesIdenticalTest)){
				stop(
					"Non-cryptic taxa created in a simulation with pure cryptic speciation?!"
					)
				}
			}
		#
		# check that all taxa have orig/ext/sampling times that are less than or
			# equal to zero
		checkNegDates <- checkRecordForNoDatePastZero(fossilRecord = taxa)
		#
		################################################################################
		#
		# name each normal taxon as t + ID 
			# cryptic taxa are cryptic id + . taxon number within that complex
		names(taxa) <- getTaxaNames(taxa = taxa)
		#sort if sortNames
		if(sortNames){
			taxa <- taxa[order(names(taxa))]
			}
		#
		if(returnAllRuns & isRejectedRun){
			# save as is in rejected simulation list
			nextRejected <- length(rejectedSimulations) + 1
			rejectedSimulations[[nextRejected]] <- taxa
			#	
		}else{
			# increment the nruns counter
			nAccepted <- (nAccepted + 1)
			#
			# save in the accepted simulation list		
			acceptedSimulations[[nAccepted]] <- taxa			
			#
			if(plot){
				divCurveFossilRecordSim(fossilRecord = taxa)
				if(nruns>1){
					title(paste0(
						"Run Number ",
						nAccepted,
						" of ",
						nruns
						))
					}
				}
			}
		}
	#
	# check to make sure that the right number of accepted simulations is returned
	if(length(acceptedSimulations) != nAccepted){
		stop(
			"The number of accepted simulations returned doesn't match the internal counter."
			)
		}
	#
	#
	if(print.runs){
		runNumMessage <- ifelse(nruns==1,
			paste0("1 run"),
			paste0(nruns, " runs")
			)
		message(paste0(
			runNumMessage,
			" accepted from ",
			ntries,
			" total runs (",
			signif(nruns/ntries,2),
			" Acceptance Probability)"
			))
		}
	#
	# Prepare output
	#
	if(nruns == 1){
		if(length(acceptedSimulations)>1){
			stop("More than one run accepted despite nruns = 1 ?!")
			}
		acceptedSimulations <- acceptedSimulations[[1]]
		}
	#
	if(returnAllRuns){
		results <- list(
			accepted = acceptedSimulations, 
			rejected = rejectedSimulations
			)
	}else{
		results <- acceptedSimulations
		}
	#
	return(results)	
	}	

Try the paleotree package in your browser

Any scripts or data that you put into this service are public.

paleotree documentation built on Aug. 22, 2022, 9:09 a.m.