sampleRanges | R Documentation |
A function for simulating the effect of incomplete sampling of the fossil record.
sampleRanges( taxaData, r, alpha = 1, beta = 1, rTimeRatio = 1, modern.samp.prob = 1, min.taxa = 2, ranges.only = TRUE, minInt = 0.01, merge.cryptic = TRUE, randLiveHat = TRUE, alt.method = FALSE, plot = FALSE )
taxaData |
A two-column matrix of per-taxon ranges. The five-column matrix
output of |
r |
Instantaneous average sampling rate per lineage time units; given
as a vector of |
alpha |
Alpha parameter of beta distribution; given as a vector of
|
beta |
Beta parameter of beta distribution; given as a vector of
|
rTimeRatio |
Ratio of most recent sampling rate over earliest sampling
rate; given as a vector of |
modern.samp.prob |
Probability of sampling living taxa at the present day (time = 0), see below. |
min.taxa |
Minimum number of taxa sampled. The default is 2. |
ranges.only |
If |
minInt |
Minimum interval size used for simulating complex models. See details. |
merge.cryptic |
If |
randLiveHat |
If |
alt.method |
If |
plot |
If |
This function implements a range of sampling models in continuous time. Be
default, sampling is simulated under the simplest model, where sampling
occurs as a Poisson process under a instantaneous sampling rate (r
) which is
homogeneous through time and across lineages (Foote, 1997). Under this model,
the waiting times to sampling events are exponentially distributed, with an
average waiting time of 1/r. This useful property allows sampling to be
rapidly simulated for many taxa under this simple model in sampleRanges
, by
repeatedly drawing waiting times between sampling events from an exponential
distribution. This is the model that is run when
alpha
, beta
and rTimeRatio
.
In addition to this simple model, sampleRanges
also can consider a range of
additional models, including the "hatP" and "incP" options of Liow et al.
(2010). To describe the behavior of these models, users alter the default
values for alpha
, beta
and rTimeRatio
.
These parameters, and r
, can either
be a single value which describes the behavior of the entire dataset or as a
vector, of same length as the number of taxa, which describes the per-taxon
value. When any rTimeRatio
, alpha or beta value is not equal to one, then
the sampling rate will vary across the duration of a taxon's temporal range.
In general, setting alpha and beta equal to a value above 2 will produce a "hat" or
bell-shaped curve, where sampling rates peak at the midpoint of taxon
ranges, while setting them unequal will produce asymmetric bell curves
according to the beta function (Liow et al., 2010; Liow et al. set
alpha = 4
and beta = 4
). rTimeRatio
is the
ratio of the sampling rate of the
latest (most recent) time divided by the earliest (oldest) time.
The input r
values will be interpreted differently based on whether one r
value or per-taxon values were used. If one value was input, then it is
assumed that r
represent the grand mean r
for the entire dataset for purposes
of time-varying r
, such that if rTimeRatio
is not equal to 1, taxa near the
end and start of the dataset will have very different per-taxon mean
sampling rate.
If per-taxon values of r
were input, then each r
is consider
the per-taxon mean sampling rate. These will not be changed, but any
within-lineage variation is distributed so that the mean is still the input
per-taxon value. This also changes the interpretation of rTimeRatio
, such
that when a single r
value and rTimeRatio
is given, it is assumed the ratio
describes the change in sampling rates from the start of the dataset to the
end, while if multiple values are given for either r
or rTimeRatio
will
instead see the value as describing the ratio at the first and last times of
each taxon. For the pure hat model, this interpretation of r
as a grand mean
sampling means that taxa will have a sampling rate of 2 * r
at the mid-peak of their
range, which will have considerable implications for taxonomic incompleteness.
The particular distinctions about these parameter values are important: all
models simulated in sampleRanges
are structured to be effectively nested
inside a most general model with parameters r
,
alpha
, beta
and rTimeRatio
.
Note that the modeling of sampling in this function is independent and
secondary of the actual simulation of the ranges, which are (generally)
produced by the models of simFossilRecord
with argument r
(sampling rate) not set. Thus, 'hat-shaped range distributions'
are only contained within single morphotaxa – they do not
cross multiple morphotaxa in the case of anagenesis. Cryptic taxa each have
their own hat and do not share a single hat; by default the ranges of
cryptic taxa are merged to produce the range of a single observed
morphotaxon.
'Hats' are constrained to start and end with a taxon's range, representing
the rise and fall of taxa in terms of abundance and geographic range (Liow
et al., 2010).
However, for still-living taxa at the modern day, it is
unknown how much longer they may be alive (for memory-less Poisson models,
there is no age-dependent extinction).
The treatment of these taxa with regards to their 'hat' (i. e. the beta distribution)
is controlled by the argument randLivehat
.
When randLiveHat = FALSE
, the beta distribution is fit so that the last appearance of
still-alive taxa at the modern day is treated as a last appearance for calculating the hat.
When TRUE
, the default option, the still-alive taxa are considered to have gotten some
distance between 0 and 1 through the beta distribution, as of the modern day.
This point of progression is stochastically selected for each taxon by
pulling a number from a uniform distribution, and used for calculating the hat.
Because sampling rate varies over morphotaxon ranges under any of these more
complex models, sampling events cannot be quickly simulated as waiting times
pulled from an exponential distribution. Instead, the taxon durations are
discretized into a large number of small time intervals of length minInt
(see above; minInt
should be small enough that only one sampling event could
feasibly happen per interval). The probability of an event occurring within
each interval is calculated and used to stochastically simulate sampling
events. For each interval, a number between 0 and 1 is randomly pulled from
a uniform distribution and to the per-interval sampling probability to test
if a sampling event occurred (if the random number is less than the
probability, a sampling event is recorded). In general, this method is
slower but otherwise comparable to the quicker waiting times method. See the
examples below for a small test of this.
As with many functions in the paleotree
library, absolute time is always
decreasing, i.e. the present day is zero.
If min.taxa
is set to zero, the simulation may produce output in which no
taxa were ever sampled.
If modern.samp.prob
is set to 1.0 (the default), then living taxa will
always be sampled at least at the present day (if there are any living
taxa). If the probability is less than 1, they will be sampled with that
probability at the modern day.
By default, this function will merge sampling events from morphologically
cryptic taxa, listing them as occurrences for the earliest member of that
group. To change this behavior, set merge.cryptic = FALSE
.
Conditioning on sampling some minimum number of taxa may create strange
simulation results for some analyses, such as simulation analyses of
birth-death processes. Set min.taxa = 0
to remove this conditioning.
If ranges.only
is TRUE
, then the output is a two-column per-taxon
matrix of first and last appearances in absolute time. NA
s mean the respective taxon
was never sampled in the simulation.
If ranges.only = FALSE
(the default), the output is a list, where each
element is a vector of sampling events the timing of sampling events, each
corresponding to a different taxon in the input. Elements that are NA
are
unsampled taxa.
David W. Bapst
Foote, M. 1997 Estimating Taxonomic Durations and Preservation Probability. Paleobiology 23(3):278–300.
Liow, L. H., T. B. Quental, and C. R. Marshall. 2010 When Can Decreasing Diversification Rates Be Detected with Molecular Phylogenies and the Fossil Record? Systematic Biology 59(6):646–659.
simFossilRecord
, binTimeData
set.seed(444) record <- simFossilRecord( p = 0.1, q = 0.1, nruns = 1, nTotalTaxa = c(30,40), nExtant = 0 ) taxa <- fossilRecord2fossilTaxa(record) # let's see what the 'true' diversity curve looks like in this case layout(1:2) taxicDivCont(taxa) # simulate a fossil record with imperfect sampling with sampleRanges rangesCont <- sampleRanges(taxa,r = 0.5) # plot the diversity curve based on the sampled ranges taxicDivCont(rangesCont) # compare the true history to what we might observe! #let's try more complicated models! # a pull-to-the-recent model with x5 increase over time # similar to Liow et al.'s incP layout(1:2) rangesCont1 <- sampleRanges(taxa, r = 0.5, rTimeRatio = 5, plot = TRUE ) taxicDivCont(rangesCont1) # a hat-shaped model layout(1:2) rangesCont1 <- sampleRanges(taxa, r = 0.5, alpha = 4, beta = 4, plot = TRUE ) taxicDivCont(rangesCont1) # a combination of these layout(1:2) rangesCont1 <- sampleRanges(taxa, r = 0.5, alpha = 4, beta = 4, rTimeRatio = 5, plot = TRUE ) taxicDivCont(rangesCont1) # testing with cryptic speciation layout(1) recordCrypt <- simFossilRecord(p = 0.1, q = 0.1, prop.cryptic = 0.5, nruns = 1, nTotalTaxa = c(20,30), nExtant = 0 ) taxaCrypt <- fossilRecord2fossilTaxa(recordCrypt) rangesCrypt <- sampleRanges(taxaCrypt,r = 0.5) taxicDivCont(rangesCrypt) #an example of hat-shaped models (beta distributions) when there are live taxa set.seed(444) recordLive <- simFossilRecord(p = 0.1, q = 0.05, nruns = 1, nTotalTaxa = c(5,100), nExtant = c(10,100) ) taxaLive <- fossilRecord2fossilTaxa(recordLive) #with end-points of live taxa at random points in the hat rangesLive <- sampleRanges(taxaLive, r = 0.1, alpha = 4, beta = 4, randLiveHat = TRUE, plot = TRUE ) #with all taxa end-points at end-point of hat rangesLive <- sampleRanges(taxaLive, r = 0.1, alpha = 4, beta = 4, randLiveHat = FALSE, plot = TRUE ) #simulate a model where sampling rate evolves under brownian motion tree <- taxa2phylo(taxa,obs = taxa[,3]) sampRateBM <- rTraitCont(tree) sampRateBM <- sampRateBM-min(sampRateBM) layout(1:2) rangesCont1 <- sampleRanges(taxa,r = sampRateBM,plot = TRUE) taxicDivCont(rangesCont1) #evolving sampling rate, hat model and pull of the recent layout(1:2) rangesCont1 <- sampleRanges(taxa, r = sampRateBM, alpha = 4, beta = 4, rTimeRatio = 5, plot = TRUE ) taxicDivCont(rangesCont1) layout(1) #the simpler model is simulated by pulling waiting times from an exponential #more complicated models are simulated by discretizing time into small intervals #are these two methods comparable? #let's look at the number of taxa sampled under both methods summary(replicate(100,sum(!is.na( sampleRanges(taxa, r = 0.5, alt.method = FALSE ) [,1])))) summary(replicate(100,sum(!is.na( sampleRanges(taxa, r = 0.5, alt.method = TRUE ) [,1])))) #they look pretty similar!
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.