durationFreq | R Documentation |
These functions construct likelihood models of the observed frequency
of taxon durations, given either in discrete (make_durationFreqDisc
)
or continuous time (make_durationFreqCont
). These models can then be
constrained using functions available in this package and/or analyzed
with commonly used optimizing functions.
make_durationFreqCont( timeData, groups = NULL, drop.extant = TRUE, threshold = 0.01, tol = 1e-04 ) make_durationFreqDisc(timeList, groups = NULL, drop.extant = TRUE)
timeData |
Two-column matrix of per-taxon first and last occurrence given in continuous time, relative to the modern (i.e. older dates are also the 'larger' dates). Unsampled taxa (e.g. from a simulation of sampling in the fossil record, listed as NAs the supplied matrix) are automatically dropped from the matrix and from groups simultaneously. |
groups |
Either NULL (the default) or matrix with the number of rows equal
to the number of taxa and the number of columns equal to the number of 'systems'
of categories for taxa. Taxonomic membership in different groups is indicated
by numeric values.
For example, a dataset could have a 'groups' matrix composed of a column representing
thin and thick shelled taxa, coded |
drop.extant |
Drops all extant taxa from a dataset before preceding. |
threshold |
The smallest allowable duration (i.e. the measured difference in the first and last occurrence dates for a given taxon). Durations below this size will be treated as "one-hit" sampling events. |
tol |
Tolerance level for determining whether a taxon from a continuous-time
analysis is extant or not. Taxa which occur at a date less than |
timeList |
A two column matrix, with the first and last occurrences of taxa
given in relative time intervals (i.e. ordered from first to last). If a list
of |
These functions effectively replace two older functions in paleotree, now removed,
getSampRateCont
and getSampProbDisc
. The
functions here do not offer the floating time interval options of
their older siblings, but do allow for greater flexibility in defining
constrains on parameter values. Differences in time intervals, or any
other conceivable discrete differences in parameters, can be modeled
using the generic groups
argument in these functions.
These functions use likelihood functions presented by Foote (1997). These analyses are ideally applied to data from single stratigraphic section but potentially are applicable to regional or global datasets, although the behavior of those datasets is less well understood.
As with many functions in the paleotree library, absolute time is always decreasing, i.e. the present day is zero and older dates are 'larger'. On the contrary, relative time is in intervals with non-zero integers that increase sequentially beginning with 1, from earliest to oldest.
For make_durationFreqDisc
, the intervals in timeList
should be
non-overlapping sequential intervals of roughly equal length. These
should be in relative time as described above, so the earliest interval
should be listed as 1
and the numbering should increase as the intervals go up with
age. If both previous statements are TRUE
, then differences in interval
numbers will represent the same rough difference in the absolute timing
of those intervals. For example, a dataset where all taxa are listed from
a set of sequential intervals of similar length, such as North American
Mammal assemblage zones, microfossil faunal zones or graptolite biozones
can be given as long as they are correctly numbered in sequential order
in the input. As a counter example, a dataset which includes taxa resolved
only to intervals as wide as the whole Jurassic and taxa resolved to
biozones within the Jurassic should not be included in the same input.
Drop taxa from less poorly resolved intervals from such datasets if you
want to apply this function, as long as this retains a large enough sample
of taxa listed from the sequential set of intervals.
Please check that the optimizer function you select actually converges. The likelihood surface can be very flat in some cases, particularly for small datasets (<100 taxa). If the optimizer does not converge, consider increasing iterations or changing the starting values.
A function of class "paleotreeFunc"
, which takes a vector equal to the number
of parameters and returns the *negative* log-likelihood
(for use with optim
and similar optimizing functions,
which attempt to minimize support values).
See the functions listed at modelMethods
for manipulating and examining
such functions and constrainParPaleo
for constraining parameters.
Parameters in the output functions are named q
for the instantaneous per-capita
extinction rate, r
for the instantaneous per-capita sampling rate and R
for
the per-interval taxonomic sampling probability. Groupings follow the parameter
names, separated by periods; by default, the parameters will be placed in at
least group '1
' (of a grouping scheme containing only a single group), such that make_durationFreqCont
by default creates a function with parameters named q.1
and r.1
, while
make_durationFreqDisc
creates a function with parameters named q.1
and R.1
.
Note that the q
parameters estimated by make_durationFreqDisc
is scaled to
per lineage intervals and not to per lineage time-units. If intervals are the same length, this
can be easily corrected by multiplying one by the interval length. It is unclear
how to treat uneven intervals and I urge users to consider multiple strategies.
For translating these sampling probabilities and sampling rates, see
SamplingConv
.
David W. Bapst
Foote, M. 1997 Estimating Taxonomic Durations and Preservation Probability. Paleobiology 23(3):278–300.
Foote, M., and D. M. Raup. 1996 Fossil preservation and the stratigraphic ranges of taxa. Paleobiology 22(2):121–140.
See freqRat
, sRate2sProb
,
qsRate2Comp
sProb2sRate
and qsProb2Comp
.
# let's simulate some taxon ranges from # an imperfectly sampled fossil record set.seed(444) record <- simFossilRecord( p = 0.1, q = 0.1, nruns = 1, nTotalTaxa = c(30,40), nExtant = 0 ) taxa <- fossilRecord2fossilTaxa(record) rangesCont <- sampleRanges(taxa,r = 0.5) #bin the ranges into discrete time intervals rangesDisc <- binTimeData(rangesCont,int.length = 1) #note that we made interval lengths = 1: # thus q (per int) = q (per time) for make_durationFreqDisc ## Not run: #old ways of doing it (defunct as of paleotree version 2.6) getSampRateCont(rangesCont) getSampProbDisc(rangesDisc) ## End(Not run) #new ways of doing it # we can constrain our functions # we can use parInit, parLower and parUpper # to control parameter bounds #as opposed to getSampRateCont, we can do: likFun <- make_durationFreqCont(rangesCont) optim(parInit(likFun), likFun, lower = parLower(likFun), upper = parUpper(likFun), method = "L-BFGS-B", control = list(maxit = 1000000) ) #as opposed to getSampProbDisc, we can do: likFun <- make_durationFreqDisc(rangesDisc) optim(parInit(likFun), likFun, lower = parLower(likFun), upper = parUpper(likFun), method = "L-BFGS-B", control = list(maxit = 1000000) ) #these give the same answers (as we'd expect them to!) #with newer functions we can constrain our functions easily # what if we knew the extinction rate = 0.1 a priori? likFun <- make_durationFreqCont(rangesCont) likFun <- constrainParPaleo(likFun,q.1~0.1) optim(parInit(likFun), likFun, lower = parLower(likFun), upper = parUpper(likFun), method = "L-BFGS-B", control = list(maxit = 1000000) ) #actually decreases our sampling rate estimate # gets further away from true generating value, r = 0.5 (geesh!) # but this *is* a small dataset...
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.