# Simulating Extinct Clades of Monophyletic Taxa

### Description

This function simulates the diversification of clades composed of monophyletic terminal taxa, which are distinguished in a fashion completely alternative to the way taxa are defined in the simulation functions `simFossilRecord`, `taxa2cladogram` and `taxa2phylo`.

### Usage

```
simTermTaxa(ntaxa, sumRate = 0.2)
simTermTaxaAdvanced(p = 0.1, q = 0.1, mintaxa = 1, maxtaxa = 1000,
    mintime = 1, maxtime = 1000, minExtant = 0, maxExtant = NULL,
    min.cond = TRUE)
trueTermTaxaTree(TermTaxaRes, time.obs)
deadTree(ntaxa, sumRate = 0.2)
```

### Arguments

| Argument | Description |
|---|---|
| `ntaxa` | Number of monophyletic 'terminal' taxa (tip terminals) to be included on the simulated tree. |
| `sumRate` | The sum of the instantaneous branching and extinction rates; see below. |
| `p` | Instantaneous rate of speciation/branching. |
| `q` | Instantaneous rate of extinction. |
| `mintaxa` | Minimum number of total taxa over the entire history of a clade necessary for a dataset to be accepted. |
| `maxtaxa` | Maximum number of total taxa over the entire history of a clade allowed for a dataset to be accepted. |
| `mintime` | Minimum time units to run any given simulation before stopping. |
| `maxtime` | Maximum time units to run any given simulation before stopping. |
| `minExtant` | Minimum number of living taxa allowed at end of simulations. |
| `maxExtant` | Maximum number of living taxa allowed at end of simulations. |
| `min.cond` | If `TRUE`, the default, simulations are stopped when they meet all minimum conditions. If `FALSE`, simulations will continue until they hit maximum conditions, but are only accepted as long as they also still meet all minimum conditions. |
| `TermTaxaRes` | The list output produced by `simTermTaxa`. |
| `time.obs` | A per-taxon vector of times of observation for the taxa in `TermTaxaRes`. |

### Details

`deadTree` generates a time-scaled topology for an entirely extinct clade with a specific number of tip taxa. Because the clade is extinct and assumed to have gone extinct in the distant past, many details of typical birth-death simulators can be ignored. If a generated clade is already conditioned upon (a) some number of taxa having been reached and (b) the clade then having gone extinct, the topology (i.e. the distribution of branching and extinction events among the branches) should be independent of the actual generating rate. The number of nodes is a simple function of the number of taxa (i.e. the number of nodes is the number of taxa minus one) and their placement should be completely random, given that we generally treat birth-death processes as independent Poisson processes. Thus, in terms of generating the topology, this function is nothing but a simple wrapper for the ape function `rtree`, which randomly places splits among a set of taxa using a simple algorithm (see Paradis, 2012). To match the expectation of a birth-death process, new branch lengths are drawn from an exponential distribution with mean 1/`sumRate`, where `sumRate` represents the sum of the branching and extinction rates. Although any non-ultrametric tree is possible as long as both the branching and extinction rates are greater than zero, only when the two rates are non-zero and equal to each other will there be a high chance of getting an extinct clade with many tips. Any analysis one could do on a tree such as this will almost certainly give estimates of equal branching and extinction rates, simply because all taxa are extinct.
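The procedure described above can be sketched in a few lines using only ape. This is a minimal illustration, not the actual source of `deadTree`: the way the root age is set here (the largest root-to-tip distance, so that the last extinction falls at time zero) is an assumption made for the example, and the variable names are hypothetical.

```r
library(ape)

set.seed(1)
ntaxa <- 10
sumRate <- 0.2

# random topology: rtree places splits at random among ntaxa tips
sketchTree <- rtree(ntaxa)

# replace rtree's default branch lengths with exponential draws
# of mean 1/sumRate (i.e. rate = sumRate)
sketchTree$edge.length <- rexp(nrow(sketchTree$edge), rate = sumRate)

# ASSUMPTION for illustration: date the root so the latest tip is at time 0
sketchTree$root.time <- max(node.depth.edgelength(sketchTree))

# a rooted binary tree has (number of tips - 1) internal nodes
Ntip(sketchTree)
Nnode(sketchTree)
```

Because the branch lengths are independent draws, the resulting tree is almost never ultrametric, which is consistent with every tip representing an extinction event.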

`simTermTaxa` produces 'terminal-taxon' datasets: datasets of clades where the set of distinguishable taxa are defined as intrinsically monophyletic. (In version 1.6, I referred to this as the 'candle' mode, so named from the 'candling' horticultural practice and the visual conceptualization of the model.) In theoretical terms, terminal-taxon datasets are what would occur if (a) only descendant lineages can be sampled and (b) all taxa are immediately differentiated as of the last speciation event and continue to be so differentiated until they go extinct. In practice, this means the taxa on such a tree represent a sample of all the terminal branches, each of which starts with some speciation event and ends in an extinction event. These are taken to be the true original ranges of these taxa. No taxa beyond this set can be sampled. Note that the differentiation here is the result of an a posteriori consideration of the phylogeny: one cannot know which lineages could be sampled, or the actual start points of such taxa, until after the entire phylogeny of a group of organisms has been generated.
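To make the idea of 'terminal branches as taxon ranges' concrete, the sketch below derives a first and last appearance for each tip from a dated tree using only ape. It is an illustration under stated assumptions, not the code used by `simTermTaxa`: the tree is a random stand-in, the latest tip is placed at time zero, and names such as `termRanges` are hypothetical.

```r
library(ape)

set.seed(2)
# stand-in dated tree: random topology, exponential branch lengths
exTree <- rtree(8)
exTree$edge.length <- rexp(nrow(exTree$edge), rate = 0.2)

# distance of every node (tips and internal) from the root
depth <- node.depth.edgelength(exTree)
# convert to time before the latest tip (time decreases toward zero)
nodeAge <- max(depth) - depth

# terminal edges are those whose descendant node is a tip
isTerm <- exTree$edge[, 2] <= Ntip(exTree)

# each taxon begins at its parent node and ends at its tip
termRanges <- cbind(
  FAD = nodeAge[exTree$edge[isTerm, 1]],
  LAD = nodeAge[exTree$edge[isTerm, 2]]
)
rownames(termRanges) <- exTree$tip.label[exTree$edge[isTerm, 2]]
termRanges
```

Each taxon's duration equals the length of its terminal branch, so all internal branches are absent from the range data.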

Because all evolutionary history prior to any branching events is unsampled, this model is somewhat agnostic about the general model of differentiation among lineages. The only thing that can be said is that synapomorphies are assumed to be potentially present along every single branch, such that in an ideal scenario every clade could be defined. This would suggest very high anagenesis or bifurcation.

Because the set of observable taxa is a limited subset of the true evolutionary history, the true taxon ranges are not a faithful reproduction of the true diversity curve. See an example below.

`simTermTaxa` uses `deadTree` to make a phylogeny, so the only datasets produced are of extinct clades. `simTermTaxaAdvanced` is an alternative to `simTermTaxa` which uses `simFossilRecord`, rather than `deadTree`, to generate the underlying pattern of evolutionary relationships. The arguments are thus similar to those of `simFossilRecord`, with some differences (as `simTermTaxaAdvanced` originally called the deprecated function `simFossilTaxa`). In particular, `simTermTaxaAdvanced` can be used to produce simulated datasets which have extant taxa.

`trueTermTaxaTree` is analogous to `taxa2phylo`, in that it outputs the time-scaled phylogeny for a terminal-taxon dataset for some times of observation. Unlike with the use of `taxa2phylo` on the output of `simFossilRecord` (via `fossilRecord2fossilTaxa`), there is no need to use `trueTermTaxaTree` to obtain the true phylogeny when times of extinction are the times of observation; just get the `$tree` element from the result output by `simTermTaxa`.

Also unlike with `taxa2phylo`, the cladistic topology of relationships among morphotaxa never changes as a function of time of observation. To obtain the 'ideal cladogram' of relationships among the terminal taxa, merely take the `$tree` element of the output from `simTermTaxa` and remove the branch lengths (see below for an example).

As with many functions in the paleotree library, absolute time is always decreasing, i.e. the present day is zero.

### Value

`deadTree` gives a time-scaled phylo object, with a `$root.time` element. As discussed above, the result is always an extinct phylogeny of exactly `ntaxa` tips.

`simTermTaxa` and `simTermTaxaAdvanced` both produce a list with two components: `$taxonRanges`, which is a two-column matrix where each row gives the true first and last appearance of an observable taxon, and `$tree`, which is a time-scaled phylogeny with end-points at the true last appearance times of the taxa.

`trueTermTaxaTree` produces a time-scaled tree as a phylo object, which describes the relationships of populations at the times of observation given in the `time.obs` argument.

### Author(s)

David W. Bapst

### References

Paradis, E. (2012) *Analysis of Phylogenetics and Evolution
with R (Second Edition).* New York: Springer.

### See Also

`deadTree` is simply a wrapper of the function `rtree` in ape.

For a very different way of simulating diversification in the fossil record, see `simFossilRecord`, `fossilRecord2fossilTaxa`, `taxa2phylo` and `taxa2cladogram`.

### Examples

```
library(paleotree)

set.seed(444)
# example for 20 taxa
termTaxaRes <- simTermTaxa(20)

# let's look at the taxa...
taxa <- termTaxaRes$taxonRanges
taxicDivCont(taxa)
# because ancestors don't even exist as taxa
# the true diversity curve can go to zero
# kinda bizarre!

# the tree should give a better idea
tree <- termTaxaRes$tree
phyloDiv(tree)
# well, okay, it's a tree.

# get the 'ideal cladogram' a la taxa2cladogram
# much easier with terminal-taxa simulations as no paraphyletic taxa
cladogram <- tree
cladogram$edge.length <- NULL
plot(cladogram)

# trying out trueTermTaxaTree
# random times of observation: uniform distribution
# between each taxon's last (x[2]) and first (x[1]) appearance
time.obs <- apply(taxa, 1, function(x) runif(1, x[2], x[1]))
tree1 <- trueTermTaxaTree(termTaxaRes, time.obs)
layout(1:2)
plot(tree)
plot(tree1)
layout(1)

# let's look at the change in the terminal branches
plot(tree$edge.length, tree1$edge.length)
# can see some edges are shorter on the new tree, cool

# let's now simulate sampling and use FADs
layout(1:2)
plot(tree); axisPhylo()
FADs <- sampleRanges(termTaxaRes$taxonRanges, r = 0.1)[, 1]
tree1 <- trueTermTaxaTree(termTaxaRes, FADs)
plot(tree1); axisPhylo()

# can condition on sampling some average number of taxa
# analogous to deprecated function simFossilTaxa_SRcond
r <- 0.1
avgtaxa <- 50
sumRate <- 0.2
# average number of taxa necessary for a given average number sampled
ntaxa_orig <- avgtaxa / (r / (r + sumRate))
termTaxaRes <- simTermTaxa(ntaxa = ntaxa_orig, sumRate = sumRate)
# note that conditioning must be conducted using full sumRate
# this is because durations are functions of both rates
# just like in bifurcation

# use advanced version of simTermTaxa: simTermTaxaAdvanced
# allows for extant taxa in a term-taxa simulation

# with min.cond
termTaxaRes <- simTermTaxaAdvanced(p = 0.1, q = 0.1, mintaxa = 50,
    maxtaxa = 100, maxtime = 100, minExtant = 10, maxExtant = 20,
    min.cond = TRUE)
# notice that arguments are similar to simFossilRecord
# and somewhat more similar to deprecated function simFossilTaxa ;P
plot(termTaxaRes$tree)
Ntip(termTaxaRes$tree)

# without min.cond
termTaxaRes <- simTermTaxaAdvanced(p = 0.1, q = 0.1, mintaxa = 50,
    maxtaxa = 100, maxtime = 100, minExtant = 10, maxExtant = 20,
    min.cond = FALSE)
plot(termTaxaRes$tree)
Ntip(termTaxaRes$tree)

layout(1)
```
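The conditioning step in the examples divides the desired number of sampled taxa by `r/(r + sumRate)`. That ratio is the probability that a taxon whose duration is exponentially distributed with rate `sumRate` is sampled at least once under a constant sampling rate `r`. The base-R sketch below checks this numerically; it is only an illustration of the arithmetic, assumes exponential durations and Poisson sampling, and uses hypothetical variable names.

```r
set.seed(3)
r <- 0.1        # instantaneous sampling rate
sumRate <- 0.2  # sum of branching and extinction rates
nSim <- 1e5

# terminal-branch durations: exponential with rate sumRate
durations <- rexp(nSim, rate = sumRate)

# under Poisson sampling, a taxon of duration t is sampled at least once
# with probability 1 - exp(-r * t)
wasSampled <- runif(nSim) < (1 - exp(-r * durations))

simulatedProb <- mean(wasSampled)
expectedProb <- r / (r + sumRate)

# the two values should agree closely
c(simulated = simulatedProb, expected = expectedProb)

# number of true taxa needed to sample about 50 taxa on average
50 / expectedProb
```

Since both rates set the length of a terminal branch, the sum of the two is what enters the calculation, not either rate alone.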