termTaxa: Simulating Extinct Clades of Monophyletic Taxa

termTaxa R Documentation

Simulating Extinct Clades of Monophyletic Taxa

Description

These functions simulate the diversification of clades composed of monophyletic terminal taxa, which are defined in a way entirely different from the way taxa are defined in the simulation functions simFossilRecord, taxa2cladogram and taxa2phylo.

Usage

simTermTaxa(ntaxa, sumRate = 0.2)

simTermTaxaAdvanced(
  p = 0.1,
  q = 0.1,
  mintaxa = 1,
  maxtaxa = 1000,
  mintime = 1,
  maxtime = 1000,
  minExtant = 0,
  maxExtant = NULL,
  min.cond = TRUE
)

trueTermTaxaTree(TermTaxaRes, time.obs)

deadTree(ntaxa, sumRate = 0.2)

Arguments

ntaxa

Number of monophyletic 'terminal' taxa (tip terminals) to be included on the simulated tree

sumRate

The sum of the instantaneous branching and extinction rates; see below.

p

Instantaneous rate of speciation/branching.

q

Instantaneous rate of extinction.

mintaxa

Minimum number of total taxa over the entire history of a clade necessary for a dataset to be accepted.

maxtaxa

Maximum number of total taxa over the entire history of a clade allowed for a dataset to be accepted.

mintime

Minimum time units to run any given simulation before stopping.

maxtime

Maximum time units to run any given simulation before stopping.

minExtant

Minimum number of living taxa allowed at end of simulations.

maxExtant

Maximum number of living taxa allowed at end of simulations.

min.cond

If TRUE, the default, simulations are stopped as soon as they meet all minimum conditions. If FALSE, simulations continue until they hit a maximum condition, and are accepted only if they still meet all minimum conditions at that point.

TermTaxaRes

The list output produced by simTermTaxa.

time.obs

A per-taxon vector of times of observation for the taxa in TermTaxaRes.

Details

deadTree generates a time-scaled topology for an entirely extinct clade with a specific number of tip taxa. Because the clade is extinct, and assumed to have gone extinct in the distant past, many details of typical birth-death simulators can be ignored. If a generated clade is already conditioned upon (a) some number of taxa having been reached and (b) the clade then having gone extinct, the topology (i.e. the distribution of branching and extinction events among the branches) should be independent of the actual generating rate. The number of nodes is a simple function of the number of taxa (i.e. the number of nodes is the number of taxa minus one) and their placement should be completely random, given that we generally treat birth-death processes as independent Poisson processes.

Thus, in terms of generating the topology, this function is nothing but a simple wrapper for the ape function rtree, which randomly places splits among a set of taxa using a simple algorithm (see Paradis, 2012). To match the expectation of a birth-death process, new branch lengths are drawn from an exponential distribution with mean 1/sumRate, where sumRate represents the sum of the branching and extinction rates.

Any non-ultrametric tree is possible as long as both the branching rate and the extinction rate are greater than zero, but only when the two rates are non-zero and equal to each other is there a high chance of obtaining an extinct clade with many tips. Any analysis one could do on a tree such as this will almost certainly give estimates of equal branching and extinction rates, simply because all taxa are extinct.
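The construction described above can be sketched directly with ape. This is a minimal illustration under stated assumptions, not necessarily the package's actual source: the helper name sketchDeadTree is hypothetical, and the way $root.time is set here (equal to the tree height, so that the last extinction falls at time zero) is an assumption made for illustration; deadTree itself may place the clade further in the past.

```r
library(ape)

# Hypothetical helper, for illustration only (not part of paleotree)
sketchDeadTree <- function(ntaxa, sumRate = 0.2) {
  # random topology: rtree() splits the set of tips at random
  tree <- rtree(ntaxa)
  # replace rtree's default branch lengths with exponential draws
  # of mean 1/sumRate, one draw per edge
  tree$edge.length <- rexp(nrow(tree$edge), rate = sumRate)
  # assumed convention: root age equals the tree height, so the
  # latest-surviving tip goes extinct at time zero
  tree$root.time <- max(node.depth.edgelength(tree))
  tree
}

set.seed(444)
sketchTree <- sketchDeadTree(20)
Ntip(sketchTree)   # number of tips, equal to ntaxa
Nnode(sketchTree)  # number of internal nodes, equal to ntaxa - 1
```

Because every tip ends at a different depth, the resulting tree is non-ultrametric, as expected for a clade in which all taxa are extinct.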

simTermTaxa produces 'terminal-taxon' datasets: datasets of clades where the set of distinguishable taxa is defined as intrinsically monophyletic. (In version 1.6, I referred to this as the 'candle' mode, so named from the 'candling' horticultural practice and the visual conceptualization of the model.) In theoretical terms, terminal-taxon datasets are what would occur if (a) only descendant lineages can be sampled and (b) all taxa are immediately differentiated as of the last speciation event and continue to be so differentiated until they go extinct. In practice, this means the taxa on such a tree represent a sample of all the terminal branches, each of which starts with some speciation event and ends in an extinction event. These are taken to be the true original ranges of these taxa. No taxa beyond this set can be sampled, whatsoever. Note that the differentiation here is a result of a posteriori consideration of the phylogeny: one can't even know which lineages could be sampled, or the actual start points of such taxa, until after the entire phylogeny of a group of organisms is generated.
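As a rough illustration of that definition, the true range of each terminal taxon can be read off a dated tree as the interval spanned by its terminal branch. The sketch below assumes a phylo object with a $root.time element and time counted backwards from the present; the helper name terminalRanges is hypothetical and is not part of paleotree, and the example tree is built with ape's rtree rather than with simTermTaxa.

```r
library(ape)

# Hypothetical helper, for illustration only (not part of paleotree)
terminalRanges <- function(tree) {
  # distance of every node (tips first, then internal nodes) from the root
  depth <- node.depth.edgelength(tree)
  # convert to time before present using the root age
  nodeTime <- tree$root.time - depth
  # rows of the edge matrix that end in a tip
  tipEdges <- which(tree$edge[, 2] <= Ntip(tree))
  ranges <- cbind(
    start = nodeTime[tree$edge[tipEdges, 1]],  # speciation event
    end   = nodeTime[tree$edge[tipEdges, 2]]   # extinction event
  )
  rownames(ranges) <- tree$tip.label[tree$edge[tipEdges, 2]]
  ranges
}

set.seed(1)
exampleTree <- rtree(5)
# assumed convention: the latest tip falls at time zero
exampleTree$root.time <- max(node.depth.edgelength(exampleTree))
exampleRanges <- terminalRanges(exampleTree)
exampleRanges
```

Each row covers only a terminal branch, so the internal branches of the tree contribute nothing to the ranges; this is why a diversity curve built from such ranges can drop to zero within the history of the clade.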

Because all evolutionary history prior to any branching events is unsampled, this model is somewhat agnostic about the general model of differentiation among lineages. The only thing that can be said is that synapomorphies are assumed to be potentially present along every single branch, such that in an ideal scenario every clade could be defined. This would suggest very high anagenesis or bifurcation.

Because the set of observable taxa is a limited subset of the true evolutionary history, the true taxon ranges are not a faithful reproduction of the true diversity curve. See an example below.

simTermTaxa uses deadTree to make a phylogeny, so the only datasets produced are of extinct clades. simTermTaxaAdvanced is an alternative to simTermTaxa which uses simFossilRecord to generate the underlying pattern of evolutionary relationships and not deadTree. The arguments are thus similar to simFossilRecord, with some differences (as simTermTaxaAdvanced originally called the deprecated function simFossilTaxa). In particular, simTermTaxaAdvanced can be used to produce simulated datasets which have extant taxa.

trueTermTaxaTree is analogous to the function taxa2phylo, in that it outputs the time-scaled phylogeny for a terminal-taxon dataset for some times of observation. Unlike with the use of taxa2phylo on the output of simFossilRecord (via fossilRecord2fossilTaxa), there is no need to use trueTermTaxaTree to obtain the true phylogeny when the times of extinction are the times of observation; just get the $tree element from the result output by simTermTaxa.

Also unlike with taxa2phylo, the cladistic topology of relationships among morphotaxa never changes as a function of time of observation. To obtain the 'ideal cladogram' of relationships among the terminal taxa, merely take the $tree element of the output from simTermTaxa and remove the branch lengths (see below for an example).

As with many functions in the paleotree library, absolute time decreases toward the present, i.e. the present day is zero.

Value

deadTree gives a dated phylo object, with a $root.time element. As discussed above, the result is always an extinct phylogeny with exactly ntaxa tips.

simTermTaxa and simTermTaxaAdvanced both produce a list with two components: $taxonRanges, which is a two-column matrix where each row gives the true first and last appearance of an observable taxon, and $tree, which is a dated phylogeny with end-points at the true last appearance times of the taxa.

trueTermTaxaTree produces a dated tree as a phylo object, which describes the relationships of populations at the times of observation given in the time.obs argument.

Author(s)

David W. Bapst

References

Paradis, E. (2012) Analysis of Phylogenetics and Evolution with R (Second Edition). New York: Springer.

See Also

deadTree is simply a wrapper of the function rtree in ape.

For a very different way of simulating diversification in the fossil record, see simFossilRecord, fossilRecord2fossilTaxa, taxa2phylo and taxa2cladogram.

Examples


set.seed(444)
# example for 20 taxa
termTaxaRes <- simTermTaxa(20)

# let's look at the taxa...
taxa <- termTaxaRes$taxonRanges
taxicDivCont(taxa)
# because ancestors don't even exist as taxa
    # the true diversity curve can go to zero
    # kinda bizarre!

# the tree should give a better idea
tree <- termTaxaRes$tree
phyloDiv(tree)
# well, okay, it's a tree.

# get the 'ideal cladogram' ala taxa2cladogram
    # much easier with terminal-taxa simulations
    # as no paraphyletic taxa
cladogram <- tree
cladogram$edge.length <- NULL
plot(cladogram)

# trying out trueTermTaxaTree
# random times of observation: uniform distribution
time.obs <- apply(taxa,1,
    function(x) runif(1,x[2],x[1])
    )
tree1 <- trueTermTaxaTree(
    termTaxaRes,
    time.obs
    )
layout(1:2)
plot(tree)
plot(tree1)
layout(1)

########################################### 
# let's look at the change in the terminal branches
plot(tree$edge.length,
    tree1$edge.length)
# can see some edges are shorter on the new tree, cool

# let's now simulate sampling and use FADs
layout(1:2)
plot(tree)
axisPhylo()

FADs <- sampleRanges(
    termTaxaRes$taxonRanges,
    r = 0.1)[,1]
tree1 <- trueTermTaxaTree(termTaxaRes, FADs)

plot(tree1)
axisPhylo()

################################################
# can condition on sampling some average number of taxa
# analogous to deprecated function simFossilTaxa_SRcond
r <- 0.1
avgtaxa <- 50
sumRate <- 0.2

# avg number of taxa necessary for a given avg number sampled
# r / (r + sumRate) is the expected proportion of taxa sampled
ntaxa_orig <- avgtaxa / (r / (r + sumRate))
termTaxaRes <- simTermTaxa(
    ntaxa = ntaxa_orig,
    sumRate = sumRate)

# note that conditioning must be conducted using full sumRate
# this is because durations are functions of both rates
# just like in bifurcation

# now, use advanced version of simTermTaxa: simTermTaxaAdvanced
    # allows for extant taxa in a term-taxa simulation

# with min.cond
termTaxaRes <- simTermTaxaAdvanced(
    p = 0.1,
    q = 0.1,
    mintaxa = 50,
    maxtaxa = 100,
    maxtime = 100,
    minExtant = 10,
    maxExtant = 20,
    min.cond = TRUE
    )
    
# notice that arguments are similar to simFossilRecord
    # and even more similar to deprecated function simFossilTaxa

plot(termTaxaRes$tree)
Ntip(termTaxaRes$tree)

# without min.cond
termTaxaRes <- simTermTaxaAdvanced(
    p = 0.1,
    q = 0.1,
    mintaxa = 50,
    maxtaxa = 100,
    maxtime = 100,
    minExtant = 10,
    maxExtant = 20,
    min.cond = FALSE
    )
    
plot(termTaxaRes$tree)
Ntip(termTaxaRes$tree)

layout(1)

paleotree documentation built on Aug. 22, 2022, 9:09 a.m.