A function for simulating the effect of incomplete sampling of the fossil record.
1 2 3 4
A two-column matrix of per-taxon ranges. The five-column matrix
Instantaneous average sampling rate per lineage time units; given as a vector of length one or length equal to the number of taxa.
Alpha parameter of beta distribution; given as a vector of length one or length equal to the number of taxa.
Beta parameter of beta distribution; given as a vector of length one or length equal to the number of taxa.
Ratio of most recent sampling rate over earliest sampling rate; given as a vector of length one or length equal to the number of taxa.
Probability of sampling living taxa at the present day (time=0), see below.
Minimum number of taxa sampled. The default is 2.
Minimum interval size used for simulating complex models
This function implements a range of sampling models in continuous time. Be
default, sampling is simulated under the simplest model, where sampling
occurs as a Poisson process under a instantaneous sampling rate (r) which is
homogeneous through time and across lineages (Foote, 1997). Under this model,
the waiting times to sampling events are exponentially distributed, with an
average waiting time of 1/r. This useful property allows sampling to be
rapidly simulated for many taxa under this simple model in
repeatedly drawing waiting times between sampling events from an exponential
distribution. This is the model that is run when
In addition to this simple model, sampleRanges also can consider a range of
additional models, including the hatP and incP options of Liow et al.
(2010). To describe the behavior of these models, users alter the default
These parameters, and
r, can either
be a single value which describes the behavior of the entire dataset or as a
vector, of same length as the number of taxa, which describes the per-taxon
value. When any
rTimeRatio, alpha or beta value is not equal to one, then
the sampling rate will vary across the duration of a taxon's temporal range.
In general, setting alpha and beta equal to a value above 2 produce "hat" or
bell-shaped curves where sampling rates peak at the midpoint of taxon
ranges, while setting them unequal will produce asymmetric bell curves
according to the beta function (Liow et al., 2010; Liow et al. set
rTimeRatio is the
ratio of the sampling rate of the
latest/most recent time divided by the earliest/oldest time.
The input r values will be interpreted differently based on whether one r
value or per-taxon values were used. If one value was input, then it is
assumed that r represent the grand mean r for the entire dataset for purposes
of time-varying r, such that if
rTimeRatio is not equal to 1, taxa near the
end and start of the dataset will have very different per-taxon mean
sampling rate. If per-taxon values of r were input, then each r is consider
the per-taxon mean sampling rate. These will not be changed, but any
within-lineage variation is distributed so that the mean is still the input
per-taxon value. This also changes the interpretation of
that when a single r value and
rTimeRatio is given, it is assumed the ratio
describes the change in sampling rates from the start of the dataset to the
end, while if multiple values are given for either r or
instead see the value as describing the ratio at the first and last times of
each taxon. For the pure hat model, this interpretation of 'r' as a grand mean
sampling means that taxa will have a sampling rate of 2*r at the mid-peak of their
range, which will have considerable implications for taxonomic incompleteness.
The particular distinctions about these parameter values are important: all
models simulated in sampleRanges are structured to be effectively nested
inside a most general model with parameters
Note that the modeling of sampling in this function is independent and
secondary of the actual simulation of the ranges, which are (generally)
produced by the models of simFossilRecord with argument
(sampling rate) not set. Thus, 'hat-shaped range
distributions' are only contained within single morphotaxa; they do not
cross multiple morphotaxa in the case of anagenesis. Cryptic taxa each have
their own hat and do not share a single hat; by default the ranges of
cryptic taxa are merged to produce the range of a single observed
'Hats' are constrained to start and end with a taxon's range, representing
the rise and fall of taxa in terms of abundance and geographic range (Liow
et al., 2010). However, for still-living taxa at the modern day, it is
unknown how much longer they may be alive (for memoryless Poisson models,
there is no age-dependent extinction). The treatment of these taxa with
regards to their 'hat' (i. e. the beta distribution) is controlled by randLivehat:
ranLiveHat=FALSE, the beta distribution is fit so that the last appearance of
still-alive taxa at the modern day is treated as a last appearance for
calculating the hat. When
the default option, the still-alive taxa are considered to have gotten some
distance between 0 and 1 through the beta distribution, as of the modern
day. This point of progression is stochastically selected for each taxon by
pulling a number from a uniform distribution, and used for calculating the hat.
Because sampling rate varies over morphotaxon ranges under any of these more complex models, sampling events cannot be quickly simulated as waiting times pulled from an exponential distribution. Instead, the taxon durations are discretized into a large number of small time intervals of length minInt (see above; minInt should be small enough that only one sampling event could feasibly happen per interval). The probability of an event occurring within each interval is calculated and used to stochastically simulate sampling events. For each interval, a number between 0 and 1 is randomly pulled from a uniform distribution and to the per-interval sampling probability to test if a sampling event occurred (if the random number is less than the probability, a sampling event is recorded). In general, this method is slower but otherwise comparable to the quicker waiting times method. See the examples below for a small test of this.
As with many functions in the paleotree library, absolute time is always decreasing, i.e. the present day is zero.
min.taxa is set to zero, the simulation may produce output in which no
taxa were ever sampled.
modern.samp.prob is set to 1.0 (the default), then living taxa will
always be sampled at least at the present day (if there are any living
taxa). If the probability is less than 1, they will be sampled with that
probability at the modern day.
By default, this function will merge sampling events from morphologically
cryptic taxa, listing them as occurrences for the earliest member of that
group. To change this behavior, set
merge.cryptic = FALSE.
Conditioning on sampling some minimum number of taxa may create strange simulation results for some analyses, such as simulation analyses of birth-death processes. Set min.taxa=0 to remove this conditioning.
If ranges.only is
TRUE, then the output is a two-column per-taxon
matrix of first and last appearances in absolute time. NAs mean the taxon
was never sampled in the simulation.
ranges.only = FALSE (the default), the output is a list, where each
element is a vector of sampling events the timing of sampling events, each
corresponding to a different taxon in the input. Elements that are
David W. Bapst
Foote, M. 1997 Estimating Taxonomic Durations and Preservation Probability. Paleobiology 23(3):278–300.
Liow, L. H., T. B. Quental, and C. R. Marshall. 2010 When Can Decreasing Diversification Rates Be Detected with Molecular Phylogenies and the Fossil Record? Systematic Biology 59(6):646–659.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
set.seed(444) record<-simFossilRecord(p=0.1, q=0.1, nruns=1, nTotalTaxa=c(30,40), nExtant=0) taxa<-fossilRecord2fossilTaxa(record) layout(1:2) #let's see what the 'true' diversity curve looks like in this case taxicDivCont(taxa) #simulate a fossil record with imperfect sampling with sampleRanges rangesCont <- sampleRanges(taxa,r=0.5) #plot the diversity curve based on the sampled ranges taxicDivCont(rangesCont) #compare the true history to what we might observe! #let's try more complicated models! #a pull-to-the-recent model with x5 increase over time similar to Liow et al.'s incP layout(1:2) rangesCont1 <- sampleRanges(taxa,r=0.5,rTimeRatio=5,plot=TRUE) taxicDivCont(rangesCont1) #a hat-shaped model layout(1:2) rangesCont1 <- sampleRanges(taxa,r=0.5,alpha=4,beta=4,plot=TRUE) taxicDivCont(rangesCont1) #a combination of these layout(1:2) rangesCont1 <- sampleRanges(taxa,r=0.5,alpha=4,beta=4,rTimeRatio=5,plot=TRUE) taxicDivCont(rangesCont1) #testing with cryptic speciation layout(1) recordCrypt<-simFossilRecord(p=0.1, q=0.1, prop.cryptic=0.5, nruns=1, nTotalTaxa=c(20,30), nExtant=0) taxaCrypt<-fossilRecord2fossilTaxa(recordCrypt) rangesCrypt <- sampleRanges(taxaCrypt,r=0.5) taxicDivCont(rangesCrypt) #an example of hat-shaped models (beta distributions) when there are live taxa set.seed(444) recordLive<-simFossilRecord(p=0.1, q=0.05, nruns=1, nTotalTaxa=c(5,100),nExtant=c(10,100)) taxaLive<-fossilRecord2fossilTaxa(recordLive) #with end-points of live taxa at random points in the hat rangesLive<-sampleRanges(taxaLive,r=0.1,alpha=4,beta=4,randLiveHat=TRUE,plot=TRUE) #with all taxa end-points at end-point of hat rangesLive<-sampleRanges(taxaLive,r=0.1,alpha=4,beta=4,randLiveHat=FALSE,plot=TRUE) #simulate a model where sampling rate evolves under brownian motion tree<-taxa2phylo(taxa,obs=taxa[,3]) sampRateBM <- rTraitCont(tree) sampRateBM <- sampRateBM-min(sampRateBM) layout(1:2) rangesCont1 <- sampleRanges(taxa,r=sampRateBM,plot=TRUE) taxicDivCont(rangesCont1) #evolving sampling rate, hat model and pull of the recent layout(1:2) rangesCont1 <- sampleRanges(taxa,r=sampRateBM,alpha=4,beta=4,rTimeRatio=5,plot=TRUE) taxicDivCont(rangesCont1) layout(1) #the simpler model is simulated by pulling waiting times from an exponential #more complicated models are simulated by discretizing time into small intervals #are these two methods comparable? #let's look at the number of taxa sampled under both methods summary(replicate(100,sum(!is.na(sampleRanges(taxa,r=0.5,alt.method=FALSE)[,1])))) summary(replicate(100,sum(!is.na(sampleRanges(taxa,r=0.5,alt.method=TRUE)[,1])))) #they look pretty similar!
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.