addTaxa: Bind species to hypothesized relatives in a phylogeny
In eliotmiller/addTaxa: An R package for adding missing taxa to phylogenies

addTaxa

Description

Given a data frame of species and their taxonomic groups, and an accepted phylogeny that contains some of those species, adds each missing species next to a taxonomic relative. Optionally, keeps track of which named clades the added species are bound into.
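The base-R sketch below illustrates the inputs this relies on: a groupings data frame and the tip labels already in the tree. The species names, groups, and the helper pickRelative are hypothetical and not part of the addTaxa API. The sketch only shows how missing species and a same-group relative can be identified; unlike addTaxa, it does not update the tip labels as species are added.

```r
# Hypothetical toy data; a real call would pass an ape "phylo" tree.
tipLabels <- c("A_one", "A_two", "B_one")
groupings <- data.frame(
  species = c("A_one", "A_two", "A_three", "B_one", "B_two"),
  group   = c("A", "A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Missing species: entries in groupings$species with no matching tip label.
missing <- setdiff(groupings$species, tipLabels)

# Hypothetical helper: choose a relative from the same group that is already
# in the tree, at random if there are several candidates.
pickRelative <- function(sp, groupings, tipLabels) {
  grp <- groupings$group[groupings$species == sp]
  candidates <- intersect(groupings$species[groupings$group == grp], tipLabels)
  if (length(candidates) == 0) stop("no relative of ", sp, " in the tree")
  candidates[sample.int(length(candidates), 1)]
}

set.seed(1)
addOrder <- sample(missing)   # missing species are processed in random order
relatives <- vapply(addOrder, pickRelative, character(1),
                    groupings = groupings, tipLabels = tipLabels)
```

Here B_two can only be paired with B_one, while A_three is paired with either A_one or A_two.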

Usage

addTaxa(
  tree,
  groupings,
  branch.position = "midpoint",
  ini.lambda = 1,
  ini.mu = 0,
  rate.estimate,
  bd.type,
  local.cutoff,
  time.out,
  no.trees,
  clade.membership,
  crown.can.move = TRUE
)

Arguments

`tree`	An ape-style phylogenetic tree.
`groupings`	A data frame with two columns, "species" and "group". Species to be added are those that do not match any tip label in tree.
`branch.position`	Once the algorithm has selected the node to which a species will be bound, this determines where on the branch the binding happens. Note that this refers to the position below (stemwards of) the node to which the new tip will be bound. There are currently four options. The first, "polytomy", creates a polytomy at the node to which the new tip is added (this still needs checking; it may instead create a dichotomy with a branch of length 0). The second, "midpoint", splits the difference between the node and its parent and binds the new tip there. The third, "uniform", samples from a uniform distribution with a minimum of zero and a maximum of the full distance between the node and its parent. This means a species can conceivably be added exactly at the node or exactly at its parent, which would create a polytomy (or a branch of length 0). I have removed the checks that dealt with this; if you need them back, let me know. The fourth option is "bd". This uses the corsim function from the TreeSim package to simulate the missing speciation events according to speciation (lambda) and extinction (mu) rates estimated internally by addTaxa (see rate.estimate).
`ini.lambda`	Initial speciation rate (lambda) for the "bd" optimization, used only if branch.position = "bd". Defaults to 1.
`ini.mu`	Initial extinction rate (mu) for the "bd" optimization, used only if branch.position = "bd". Defaults to 0.
`rate.estimate`	Whether to use 'laser', 'ape', or 'diversitree' to estimate diversification rates. Only 'diversitree' can account for missing taxa, but in my experience it returns systematically biased values.
`bd.type`	Whether to use a 'local' or a 'global' birth-death estimation. 'local' is slower but tends to perform better, particularly if diversification rates shift across the tree; 'global' runs faster.
`local.cutoff`	If bd.type is 'local', this argument specifies when the local neighborhood is deemed large enough to estimate a local diversification rate. For example, if it is set to 10, a local estimate is not derived until the species being added is being bound into a monophyletic clade of at least 10 tips belonging to that species' group. The method seems to perform better with a low cutoff, but smaller values increase run time. The cutoff must be 3 or higher, because the birth-death estimation always fails for two taxa. The estimation can also fail or time out (see time.out) for larger cutoffs; when it does, addTaxa uses the midpoint branch.position method for that taxon and moves on to the next one.
`time.out`	The amount of time, in seconds, spent sampling in search of a branch position that accords with the local birth-death estimate and falls within the age of the branch to which the new species is being bound. Larger values may increase accuracy, but will increase run time. Defaults to 2 seconds.
`no.trees`	The number of desired final trees with all missing species from groupings added.
`clade.membership`	An optional data frame whose first column is "species" and second column is "clade". These are named, monophyletic clades to which species in the input phylogeny can belong. Not every species in the input phylogeny needs to appear in this data frame, but missing species that are bound to species absent from it will not be included in the output clade membership data frames. Such species will still be included in the output phylogenies. The clades must be mutually exclusive; for example, clade 2 cannot be nested within clade 1.
`crown.can.move`	Logical. If TRUE, and if missing taxa are to be added stemwards of a clade's crown node, the crown age of that clade may shift back in time. If FALSE, taxa are prevented from being added below the crown node: if the crown node of the clade is selected for binding, the new taxon is bound crownwards instead. Note that the argument name is somewhat misleading, because crown ages can still shift when missing taxa are added. Specifically, if missing taxa are bound to what were initially single-species clades in the input tree, the crown age will "shift" forward in time (a single-taxon clade has no true crown age). If crown.can.move is FALSE, then once one taxon has been added to such a single-species clade, that clade's crown age becomes fixed and will not move.
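A minimal base-R sketch of the three non-model branch.position options and of the time.out fallback described above. bindingDepth and timedDepth are hypothetical helpers, not addTaxa functions. The propose argument is a placeholder for whatever generates a candidate depth under the "bd" option (in addTaxa, a simulation with TreeSim's corsim); no rate estimation is attempted here.

```r
# Hypothetical helper: distance below (stemwards of) the selected node at
# which the new tip is bound, given the length of the branch between that
# node and its parent. "midpoint" is the default, as in the usage line.
bindingDepth <- function(branchLength,
                         branch.position = c("midpoint", "uniform", "polytomy")) {
  branch.position <- match.arg(branch.position)
  switch(branch.position,
    polytomy = 0,                                      # bind at the node itself
    midpoint = branchLength / 2,                       # halfway to the parent
    uniform  = runif(1, min = 0, max = branchLength)   # anywhere on the branch
  )
}

# Hypothetical helper: keep drawing candidate depths until one fits on the
# branch or time.out seconds pass; otherwise fall back to the midpoint.
timedDepth <- function(branchLength, propose, time.out = 2) {
  deadline <- Sys.time() + time.out
  while (Sys.time() < deadline) {
    d <- propose()
    if (d >= 0 && d <= branchLength) return(d)
  }
  branchLength / 2
}
```

For a branch of length 4, bindingDepth(4) gives 2, while a proposal that never fits on a branch of length 1 makes timedDepth return 0.5 once the time budget runs out.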

Details

Given a data frame of two columns, which *must* be named "species" and "group", addTaxa takes each species that is absent from the phylogeny and binds it to a randomly selected taxonomic relative. The algorithm works as follows.

First, species that are in the groupings data frame but not in the tree are identified, and their order is randomized. One of these missing species (A) is selected, and a species (B) in the tree that belongs to A's group is identified. If B is the only species of that group in the tree, A is bound to B at a distance below the tip determined by the branch.position argument. If the group of A+B contains additional species in the tree, the function checks whether those species are monophyletic. If so, it identifies all valid positions within the group to which A could be added; the root of the phylogeny and, if crown.can.move is FALSE, the crown node of the group are excluded from consideration. If the group is not monophyletic, the function moves one node down (stemwards) towards the root and checks whether all species descending from that node belong to B's group. It repeats this until it reaches a node whose descendants span multiple groups, or hits the root. All possible positions upstream (crownwards) of the deepest node encountered are then tabulated, one is selected at random, and A is bound there. This process is repeated until the tree contains every species in the groupings data frame.

There are four options for how far below the selected node A is bound: polytomy, midpoint, uniform, and birth-death model, as described under branch.position above. Additionally, the function can take a clade membership data frame, which must contain the columns "species" and "clade". When missing species are added into these clades, the data frame is updated accordingly, which facilitates later calculations of the sensitivity of diversification-rate estimates.
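The stemward search described above can be sketched in base R on a toy tree stored like an ape edge matrix (parent node in column 1, child in column 2; tips numbered first, then the root). This reflects one reading of the text: the "deepest node encountered" is taken to be the deepest node whose descendant tips all belong to B's group. All names are hypothetical, the crown-node exclusion for crown.can.move = FALSE is omitted, and this is not the package's own implementation.

```r
# Toy tree: tips 1-4, root 5, internal nodes 6 and 7.
# Topology: ((A_one, A_two)6, (B_one, A_four)7)5
edge <- rbind(c(5, 6), c(5, 7), c(6, 1), c(6, 2), c(7, 3), c(7, 4))
tipLabel <- c("A_one", "A_two", "B_one", "A_four")
tipGroup <- c("A", "A", "B", "A")
nTips <- length(tipLabel)
root <- nTips + 1

parentOf <- function(node) edge[edge[, 2] == node, 1]

# All nodes (tips and internal) at or crownwards of `node`.
cladeNodes <- function(node) {
  kids <- edge[edge[, 1] == node, 2]
  c(node, unlist(lapply(kids, cladeNodes)))
}

tipsOf <- function(node) {
  nodes <- cladeNodes(node)
  nodes[nodes <= nTips]
}

# Climb stemwards from tip b while every descendant tip shares b's group;
# never step onto the root, which is excluded from binding.
deepestSameGroupNode <- function(b) {
  grp <- tipGroup[b]
  current <- b
  while (current != root) {
    up <- parentOf(current)
    if (up == root || any(tipGroup[tipsOf(up)] != grp)) break
    current <- up
  }
  current
}

# Candidate binding positions: the deepest node and everything crownwards.
candidates <- cladeNodes(deepestSameGroupNode(1))
set.seed(1)
chosen <- candidates[sample.int(length(candidates), 1)]
```

Group A is not monophyletic here. Starting from A_one, the climb stops at node 6, so a new species can be bound below node 6, A_one, or A_two; starting from A_four, only the branch below A_four qualifies because its sister is in group B. When a group is monophyletic, the same climb stops at the group's crown node, which yields all positions within the group. Indexing with sample.int avoids the base R quirk where sample(x, 1) on a single number samples from 1:x.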

Value

A list with up to two elements: (1) a multiPhylo object containing the number of trees set by no.trees and, if clade.membership was provided, (2) a list of data frames summarizing which named clade (if any) each species in the completed phylogeny belongs to.
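A base-R sketch of how a clade membership table could be extended as species are added, assuming that a new species simply inherits the named clade of the relative it is bound to. updateClades and the clade names are hypothetical. Consistent with the clade.membership argument, a species bound to a relative that is not in the table is left out of it.

```r
# Hypothetical clade membership table for species already in the tree.
cladeDF <- data.frame(species = c("A_one", "A_two", "B_one"),
                      clade   = c("clade1", "clade1", "clade2"),
                      stringsAsFactors = FALSE)

# Assumption: the new species joins the named clade of its relative, if any.
updateClades <- function(cladeDF, newSpecies, relative) {
  hit <- cladeDF$clade[cladeDF$species == relative]
  if (length(hit) == 0) return(cladeDF)  # relative not in a named clade
  rbind(cladeDF, data.frame(species = newSpecies, clade = hit[1],
                            stringsAsFactors = FALSE))
}

cladeDF <- updateClades(cladeDF, "A_three", relative = "A_one")
cladeDF <- updateClades(cladeDF, "C_two", relative = "C_one")  # not recorded
```

Because each output tree can place species differently, one such table would be kept per tree.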

References

Mast et al. 2015. Paraphyly changes understanding of timing and tempo of diversification in subtribe Hakeinae (Proteaceae), a giant Australian plant radiation. American Journal of Botany.

Examples

library(addTaxa)
library(ape)     # drop.tip
library(geiger)  # assumed source of the chelonia data set
data(chelonia)
tree <- chelonia$phy

#some species in this tree are identified to subspecies; drop those
nameParts <- sapply(strsplit(tree$tip.label, "_"), length)
tree <- drop.tip(tree, tree$tip.label[nameParts != 2])

#create an example groupings data frame.
groupsDF <- data.frame(species=tree$tip.label)
groupsDF$group <- unlist(lapply(strsplit(tree$tip.label, "_"), "[", 1))

#use tipDropper to drop 100 species (there were 194 in the tree)
example <- tipDropper(tree, groupsDF, 100)

#add those missing species back in
newTrees <- addTaxa(tree=example, groupings=groupsDF, branch.position="bd",
  rate.estimate="ape", bd.type="global", no.trees=1)

eliotmiller/addTaxa documentation built on Sept. 29, 2024, 8:33 p.m.