R/addTaxa.R

Defines functions addTaxa

Documented in addTaxa

#' Bind species to hypothesized relatives in a phylogeny
#'
#' Given a data frame of species and taxonomic assignments, and an accepted phylogeny with
#' some of those species in it, will add the missing species in next to a taxonomic 
#' relative. Optionally, can keep track of which named clades species are bound to.
#'
#' @param tree An ape-style phylogenetic tree.
#' @param groupings A data frame with two columns, "species" and "group". Missing species,
#' to be added, are taken as those that do not match a value in the tip labels of tree.
#' @param branch.position
#' Once the algorithm has selected to which node a species will
#' be bound, this determines where on the branch that happens. Note that this refers to
#' the position below (stemwards) from the node the tip will be bound.Currently there are
#' four options. The first, "polytomy", creates a polytomy at the node to which the new
#' tip is added (need to check but it might actually make a dichotomous branch of length 0).
#' The second, "midpoint", simply splits the difference between the node and its parent
#' and binds the new tip in there.
#' The third, "uniform", will sample from a uniform distribution with a minimum of
#' zero and a maximum of the full distance between the node and its parent. Note that
#' this means that species can conceivably be added at the maximum or minimum
#' distance possible from the tip, which would create a polytomy (or branch of length 0?).
#' I have removed checks to account and deal with this--if you need them back let me know. 
#' The fourth option is
#' "bd." This uses the corsim function from the TreeSim package to simulate the missing
#' speciation events according to speciation (lambda) and extinction (mu) values calculated
#' internally by addTaxa using diversitree.
#' @param ini.lambda Initial speciation value for the "bd" optimization, if that option
#' of branch position is chosen. Defaults to 1.
#' @param ini.mu Initial extinction value for the "bd" optimization, if that option
#' of branch position is chosen. Defaults to 0.1.
#' @param rate.estimate Whether to use 'laser', 'ape', or 'diversitree' to
#' calculate diversification rates. The latter is the only of those that can account for
#' missing taxa, but in my experience returns systematically biased values.
#' @param bd.type Whether to use a 'local' or a 'global' birth-death estimation. The former
#' is slower but tends to perform better, particularly if there are shifts in diversification
#' rates in the tree. The latter runs faster.
#' @param local.cutoff If using the local form of the bd method, this argument specifies
#' at what point the local neighborhood is deemed sufficiently large to estimate a local
#' diversification rate. For example, if this argument is set to 10, a local estimate is
#' not derived until the species being added is being bound into a monophyletic clade of
#' at least 10 tips belonging to that species' group. The method seems to perform better
#' when this cutoff is set low, but smaller numbers increase run time. The cutoff must be
#' 3 or higher, as the birth-death estimation will always fail for two taxa. It can also
#' fail or timeout (see below) for larger cutoffs--when it does, it uses the midpoint
#' branch.position method and moves onto the next taxon.
#' @param time.out The amount of time to sample in search of a new branch position that
#' accords with the local birth death estimation and that falls within the age of the
#' branch to which the new species is being bound. Larger values may increase accuracy,
#' but definitely increase run time. Default is 2 seconds. 
#' @param no.trees The number of desired final trees with all missing species from 
#' groupings added.
#' @param clade.membership An optional data frame with first column = "species", second
#' column = "clade". These are named, monophyletic clades to which the species in the
#' input phylogeny can belong. Not every species in the input phylogeny needs to be in the
#' clade membership data frame, but missing species added to species not in this frame
#' will not be included in the output clade membership data frame. Such species will
#' still be included in the output phylogenies. Note that these clades need to be
#' mutually exclusive. For example, clade 2 cannot be contained within clade 1.
#' @param crown.can.move Logical. If TRUE, and if missing taxa are to be added stemwards,
#' this will allow the age of the crown group to potentially shift back in time. If FALSE,
#' and if missing taxa were to be added stemwards, this will prevent taxa from being added
#' below the crown group. It will force these to be bound crownwards if the crown node
#' in the clade is selected to be bound to. Note that the argument
#' name is somewhat misleading in that the crown ages can still shift when missing taxa
#' are added. Specifically, if missing taxa are bound to what were initially
#' single-species clades in the input tree, then the
#' crown age will "shift" forward in time (the single-taxon clade did not actually have a 
#' crown age). If crown.can.move is set to FALSE, then after one taxon is added to such a
#' single-species clade, the crown age of that clade then becomes fixed and will not move.
#' 
#' @details Given a data frame of two columns, which *must* be named "species" and
#' "group", will take a species
#' that is absent from the phylogeny and bind it to a randomly selected taxonomic 
#' relative. The algorithm works as follows. First, species are identified that are in
#' the groupings data frame but are not present in the tree. The order of these missing
#' species is then randomized. One of these missing species (A) is selected, and a
#' species (B) from the tree that is in that species' group is identified. If B is the
#' only species in the tree in that group, A is bound to B at a distance below the tip
#' determined by the branch.position argument. If the group of A+B contains additional
#' species in the tree, the function then checks whether those species are monophyletic.
#' If so, the function identifies all possible valid positions within the group to which
#' A could be added. The root of the phylogeny and, if crown.can.move is set to FALSE,
#' the crown node of the group, are excluded from consideration. If the species group is
#' not monophyletic, the function bumps one node down (stemwards) in the tree towards the
#' root and checks whether the species that descend from it all belong to the same group
#' as B. The function continues this process until it finds a node that leads to species
#' in multiple species groups, or hits the root. Then, all possible positions upstream
#' (crownwards) from the deepest node encountered are tabulated, one is randomly selected,
#' and A is bound accordingly. This process is repeated iteratively until the tree contains
#' all species in the groupings data frame. There are a four options for how far below A
#' will be added to whichever node is ultimately selected for binding. These are:
#' polytomy, midpoint, uniform, and birth-death model, described in the branch.position
#' argument above. Additionally, the function can take a clade membership data frame,
#' which must contain the columns "species" and "clade". When missing species are added
#' into these clades, the data frame is updated accordingly, which facilitates
#' calculations of the sensitivity of diversification rate later.
#'
#' @return A list with two elements: (1) multiPhylo object with number of trees as 
#' determined by no.trees, and, if clade.membership was provided, (2) a list of data 
#' frames summarizing which named clade each species in the complete phylogeny belongs to
#' (if any).
#'
#' @export
#'
#' @importFrom stats runif
#' @importFrom phytools bind.tip
#' @importFrom ape is.binary.tree is.monophyletic getMRCA branching.times
#' @importFrom paleotree timeSliceTree
#' @importFrom R.utils withTimeout
#'
#' @references Mast et al. 2015. Paraphyly changes understanding of timing and tempo of 
#' diversification in subtribe Hakeinae (Proteaceae), a giant Australian plant radiation.
#' American Journal of Botany.
#'
#' @examples
#' data(chelonia)
#' tree <- chelonia$phy
#'
#' #some species in this tree are identified to subspecies. drop those
#' temp <- lapply(strsplit(tree$tip.label, "_"), length)
#' names(temp) <- tree$tip.label
#' temp <- temp[temp==2]
#' tree <- drop.tip(tree, setdiff(tree$tip.label, names(temp)))
#'
#' #create an example groupings data frame.
#' groupsDF <- data.frame(species=tree$tip.label)
#' groupsDF$group <- unlist(lapply(strsplit(tree$tip.label, "_"), "[", 1))
#'
#' #use the function to drop 100 species (there were 194 in tree)
#' example <- tipDropper(tree, groupsDF, 100)
#'
#' #add those missing species back in
#' newTrees <- addTaxa(tree=example, groupings=groupsDF, branch.position="bd",
#'   rate.estimate="ape", bd.type="global", no.trees=1)

addTaxa <- function(tree, groupings, branch.position="midpoint", 
	ini.lambda=1, ini.mu=0, rate.estimate, bd.type, local.cutoff, time.out,
	no.trees, clade.membership, crown.can.move=TRUE)
{
	#set up a blank list and set aside the orig tree and DF to reload below
	trees <- list()
	clade.tables <- list()
	midpoint.additions <- list()
	origTree <- tree
	if(!missing(clade.membership))
	{
		origMembership <- clade.membership
	}

	#not sure what would happen if provided with a polytomous tree, so throw an 
	#error here
	if(is.binary.tree(tree) == FALSE)
	{
		stop("Input tree is not binary. Cannot currently account for this.")
	}

	#the function definitely fails with a weird error if the tree is not
	#ultrametric. check and stop if so
	if(is.ultrametric(tree) == FALSE)
	{
		stop("Input tree is not ultrametric.")
	}

	#force groupings DF to character if it isn't. do same for clade.membership
	#if it is provided
	groupings[,1] <- as.character(groupings[,1])
	groupings[,2] <- as.character(groupings[,2])

	if(!missing(clade.membership))
	{
		clade.membership[,1] <- as.character(clade.membership[,1])
		clade.membership[,2] <- as.character(clade.membership[,2])

		#insert a check here to ensure all the named clades are actually monophyletic
		#split the clade.membership table on clade. if any are not monophyletic, stop
		splitUp <- split(clade.membership, clade.membership$clade)
		mono <- unlist(lapply(splitUp, function(x) is.monophyletic(tree, x$species)))
		if(!all(mono))
		{
			stop("All named clades in clade.membership table are not monophyletic")
		}

		#insert another check to make sure the clades are not nested
		if(length(clade.membership$species) != length(unique(clade.membership$species)))
		{
			stop("The clades are not mutually exclusive")
		}

		#insert another check to make sure all spp in clade.membership are in the
		#tree. you get really confusing error messages if they are not
		if(length(setdiff(clade.membership$species, tree$tip.label)) > 0)
		{
			stop("clade.membership contains species that are not in tree")
		}
	}

	#use identify missing to figure out which taxa are not in the input tree
	toAdd <- identifyMissing(tree, groupings)

	#shuffle the toAdd table so that each new tree more efficiently explores
	#possible tree space. 
	toAdd <- toAdd[order(sample(toAdd$species)),]

	#add a check to ensure that all missing species are included in a taxonomic
	#group that
	#is included in the input tree
	inTree <- groupings[!(groupings$species %in% toAdd$species),]
	inTree <- unique(inTree$group)

	if(length(setdiff(unique(toAdd$group), inTree)) > 0)
	{
		stop("All missing species must be part of a taxonomic group in the input tree")
	}

	#use the findRates function to estimate speciation and extinction rates for the
	#original tree.
	if(branch.position=="bd")
	{
		rates <- findRates(origTree,
			prop.complete=length(origTree$tip.label)/dim(groupings)[1],
			ini.lambda=ini.lambda, ini.mu=ini.mu, rate.estimate=rate.estimate)
	}

	#begin outer loop where you aggregate complete trees
	for(i in 1:no.trees)
	{
		#set up a placeholder to note if you were forced to use midpoint for any taxa
		#that you intended to use bd for
		placeholder <- 0

		#begin inner loop where you build up individual trees
		for(j in 1:dim(toAdd)[1])
		{
			#if the clade.membership table is provided, and if crown.can.move is FALSE,
			#find the MRCA of all the named clades. continually update this so that as
			#species are added to named clades, these nodes change also. add the root
			#node to it, to simplify some things below
			if(!missing(clade.membership) & crown.can.move==FALSE)
			{
				problemNodes <- unlist(getMRCAs(tree, clade.membership))
				rootNode <- length(tree$tip.label) + 1
				names(rootNode) <- "root"
				problemNodes <- c(problemNodes, rootNode)
			}

			#if those conditions are not true, just set problemNodes to the root
			else
			{
				problemNodes <- length(tree$tip.label) + 1
				names(problemNodes) <- "root"
			}

			#find all tips in the species group that tip j belongs to
			relatedSpp <- groupings$species[groupings$group == toAdd$group[j]]

			#this included species that potentially are not yet in the tree. drop any
			#species that are not yet in the tree
			relatedSpp <- relatedSpp[relatedSpp %in% tree$tip.label]

			#if relatedSpp is of length 1, just bind directly to it and bounce down to
			#bottom to determine binding distance. a few important things to notice
			#here. first, getMRCAs will have returned the MRCA of any single-taxon
			#clades as the tip node of the taxon. this would mean that that node gets
			#excluded as a problem node and cannot be bound to. however, we do not
			#assess problem nodes in this step. the steps below where we do should not
			#be relevant because they shouldn't be single taxon clades. if on one
			#iteration we bind a species to a single species taxon, and if 
			#crown.can.move is set to FALSE, the new MRCA of the clade is going to
			#become the crown and THEN it can no longer move.
			if(length(relatedSpp) == 1)
			{
				#identify the node that represents that species
				bindTo <- which(tree$tip.label==relatedSpp)
			}

			#if there are > 1 relatedSpp, figure out whether it is a monophyletic group
			else if(is.monophyletic(tree, relatedSpp))
			{
				#if true, find the MRCA of the clade of related spp
				crownNode <- getMRCA(tree, relatedSpp)

				#now find all nodes that descend from the crownNode. previously, you were
				#only returning tips if branch.position was set to polytomy, but actually
				#as long as the species' group is monophyletic it's ok to add new species
				#into that group in polytomies with internal nodes. the tips=FALSE arg
				#returns tips and internal nodes; tips=TRUE just returns tips
				dNodes <- geiger:::.get.descendants.of.node(node=crownNode,
					phy=tree, tips=FALSE)
				
				#confirm that crownNode is not a part of the set of problem nodes
				#defined above. if it is, only consider nodes that descend from it.
				if(length(problemNodes[problemNodes == crownNode])==1)
				{
					allNodes <- dNodes
				}
				
				#otherwise add the crown node to consideration as a place to bind the tip
				else
				{
					#importantly, note that you do not have to figure out who the
					#parent node of crownNode is, because bind.tip's position arg
					#will, if given a positive value, bind stemward from crownNode
					allNodes <- c(crownNode, dNodes)
				}

				#REALLY REALLY IMPORTANT. sample produces really odd results if sampling
				#from a numeric vector of length 1. need to account for that with this
				#overcomplicated if else statement. then randomly sample allNodes for 1
				if(length(allNodes)==1)
				{
					bindTo <- allNodes
				}
				else
				{
					bindTo <- sample(allNodes, 1)
				}
			}

			#if relatedSpp are not monophyletic, go into a more costly procedure to
			#figure out where you can bind the tip in.
			else
			{
				#first randomly sample a species from relatedSpp
				bindingToSp <- sample(relatedSpp, 1)

				#identify the node that represents that species
				bindingTo <- which(tree$tip.label==bindingToSp)

				#identify the node that subtends the node in question. at this point
				#you shouldn't have to worry about going "below" the root, since there
				#will always be one node stemwards from a tip. after that we check to
				#make sure we don't go past the root
				parent <- tree$edge[,1][tree$edge[,2]==bindingTo]

				#go into a while loop where keep bumping down the tree until you hit
				#a node that doesn't lead to a set of monophyletic ancestors
				#according to the species groups.
				keepGoing <- TRUE

				while(keepGoing)
				{
					#check whether the parent node is part of problem nodes. if it is,
					#set keepGoing to FALSE so it will just bind to the bindingTo sp
					if(length(problemNodes[problemNodes == bindingTo])==1)
					{
						keepGoing <- FALSE
					}

					#figure out which tip nodes descend from this parent node
					toExamineNodes <- geiger:::.get.descendants.of.node(node=parent,
						phy=tree, tips=TRUE)

					#figure out which species these nodes represent
					toExamine <- tree$tip.label[toExamineNodes]

					#figure out whether all these species belong to the same group.
					#this will pull a vector of the group to which each species belongs
					groupToExamine <- groupings$group[groupings$species %in% toExamine]

					#if there is one unique element in this vector, then the node is
					#monophyletic. you do not need to check if it is the root, did so above
					if(length(unique(groupToExamine)) == 1)
					{
						#identify the set of nodes that descend from parent
						dNodes <- geiger:::.get.descendants.of.node(node=parent,
							phy=tree, tips=FALSE)
						allNodes <- c(parent, dNodes)

						#set bindingTo to be parent, then set parent to be the node one
						#below what it was
						bindingTo <- parent
						parent <- tree$edge[,1][tree$edge[,2]==parent]
					}

					#if the vector is not monophyletic, no need to keepGoing
					else
					{
						keepGoing <- FALSE
					}
				}

				#hopefully the previous while loop bumped the bindingTo node up to whatever
				#the most stemwards monophyletic node was encompassing the randomly sampled
				#species. if bindingTo is a species, then there are no descendants, and need
				#to bind directly to that tip. otherwise, identify the nodes that descend
				#from bindingTo and randomly sample one to actually bind to.		
				if(bindingTo <= length(tree$tip.label))
				{
					allNodes <- bindingTo
				}

				else
				{
					#identify descendants
					dNodes <- geiger:::.get.descendants.of.node(node=bindingTo,
						phy=tree, tips=FALSE)

					#at this point there should be no way that bindingTo is
					#the root, so add it to all descendant nodes for sampling from below
					allNodes <- c(bindingTo, dNodes)
				}		

				#REALLY REALLY IMPORTANT. sample produces really odd results if sampling
				#from a numeric vector of length 1. need to account for that with this
				#overcomplicated if else statement. then randomly sample allNodes for 1
				if(length(allNodes)==1)
				{
					bindTo <- allNodes
				}
				else
				{
					bindTo <- sample(allNodes, 1)
				}
			}

			#calculate position based on the chosen method
			if(branch.position=="polytomy")
			{
				bindDist <- 0
			}

			else if(branch.position=="midpoint")
			{
				#identify the node that subtends the selected node to bind to
				parent <- tree$edge[,1][tree$edge[,2]==bindTo]

				#find the distance between these two nodes. first
				#set up a temporary matrix and give it row names. this allows you to pull out
				#the index of the edge in question, and subset the edge.lengths based on that
				#index, to get needed branch lengths later
				tempMatrix <- tree$edge
				rownames(tempMatrix) <- 1:dim(tempMatrix)[1]

				#use the matrix to get the index needed below
				nodeIndex <- rownames(tempMatrix)[tempMatrix[,1]==parent 
					& tempMatrix[,2]==bindTo]
				nodeIndex <- as.numeric(nodeIndex)

				#define the distance to bind as half distance to the parent node
				bindDist <- tree$edge.length[nodeIndex]/2
			}

			else if(branch.position=="uniform")
			{
				#identify the node that subtends the selected node to bind to
				parent <- tree$edge[,1][tree$edge[,2]==bindTo]

				#find the distance between these two nodes. first
				#set up a temporary matrix and give it row names. this allows you to pull out
				#the index of the edge in question, and subset the edge.lengths based on that
				#index, to get needed branch lengths later
				tempMatrix <- tree$edge
				rownames(tempMatrix) <- 1:dim(tempMatrix)[1]

				#use the matrix to get the index needed below
				nodeIndex <- rownames(tempMatrix)[tempMatrix[,1]==parent 
					& tempMatrix[,2]==bindTo]
				nodeIndex <- as.numeric(nodeIndex)

				#define bindDist as a uniform value between 0 and max. could create polytomies
				#with code like this, in theory. used to add an offset here to account for that,
				#but seems like over thinking it. add later if necessary
				bindDist <- stats::runif(n=1, min=0, max=tree$edge.length[nodeIndex])
			}

			else if(branch.position=="bd")
			{
				#identify the node that subtends the selected node to bind to
				parent <- tree$edge[,1][tree$edge[,2]==bindTo]

				#find the ages of these nodes. if bindTo is a tip, then its age is 0. create
				#vector of branching times because you need it either way for age of parent
				ages <- branching.times(tree)

				if(bindTo <= length(tree$tip.label))
				{
					bindToAge <- 0
				}

				else
				{
					bindToAge <- ages[names(ages)==bindTo]
				}

				parentAge <- ages[names(ages)==parent]

				#prune the tree at the parent node
				pruned <- extract.clade(tree, parent)

				#slice the tree at the bindToAge. if this is 0 (i.e. bindTo was a tip), the
				#tree is returned unmodified. if it's not a tip, then a polytomy with some
				#weird numerical precision issues can be left at the tip. so, add a slight
				#constant here to prune back just past the node. this is tree-scale-invariant,
				#so not ideal, but it's a very small value. 
				#paleotree can occasionally throw an error that has to do with slice time being
				#later than tips. not sure what this is about. if it does this, use the global
				#birth death approach
				#adding the root.time piece is to get paleotree to not throw a warning
				pruned$root.time = max(TreeSim::getx(pruned))
				sliced <- try(paleotree::timeSliceTree(ttree=pruned,
					sliceTime=bindToAge+10^-7, plot=FALSE), silent=TRUE)

				#calculate the age of the missing speciation event. this can pose problems when
				#it returns as 'numeric(0)', so check that below, and use the midpoint approach
				#if it does. it's nice to also wrap the bdScaler in a withTimeout function so
				#that if the while loop within the function is taking too long, it'll bump to the
				#midpoint method. if bd.type is local and sliced contains local.cutoff or more
				#species, recalculate a local birth-death rate. otherwise use the global rate. 
				if(bd.type=="local" & class(sliced)!="try-error")
				{
					if(length(sliced$tip.label) >= local.cutoff)
					{
						#wrap this up into a try statement, as sometimes, somehow, non-ultrametric
						#trees are being created
						newRates <- try(findRates(sliced,
							prop.complete=(length(sliced$tip.label)-1)/length(sliced$tip.label),
							ini.lambda=ini.lambda, ini.mu=ini.mu, rate.estimate=rate.estimate),
							silent=TRUE)
						
						#if it was not able to calculate a new rate, use the global rate on the
						#sliced tree
						if(class(newRates)=="try-error")
						{
							missingAge <- try(R.utils::withTimeout(bdScaler(tree=sliced,
								lambda=rates["lambda"], mu=rates["mu"],
								min.age=0, max.age=0), timeout=time.out, onTimeout="error"),
								silent=TRUE)
						}
						
						#otherwise use the new rate on the sliced tree
						else
						{
							missingAge <- try(R.utils::withTimeout(bdScaler(tree=sliced,
								lambda=newRates["lambda"], mu=newRates["mu"],
								min.age=0, max.age=0), timeout=time.out, onTimeout="error"),
								silent=TRUE)
						}
					}

					else
					{
						missingAge <- try(R.utils::withTimeout(bdScaler(tree=sliced,
							lambda=rates["lambda"], mu=rates["mu"],
							min.age=0, max.age=0), timeout=time.out, onTimeout="error"),
							silent=TRUE)
					}
				}

				#if bd.type is global or if paleotree encountered an error, use global rate
				#on whole tree and force branch to be within the correct time period
				else if(bd.type=="global" | class(sliced)=="try-error")
				{
					missingAge <- try(R.utils::withTimeout(bdScaler(tree, lambda=rates["lambda"],
						mu=rates["mu"], min.age=bindToAge, max.age=parentAge), timeout=time.out,
						onTimeout="error"), silent=TRUE)
				}

				else
				{
					stop("if branch.position is set to 'bd', bd.type must be set to either 'local' or 'global'")
				}

				#if any of that failed use the midpoint method
				if(class(missingAge)=="try-error" | length(missingAge) == 0)
				{
					#identify the node that subtends the selected node to bind to
					parent <- tree$edge[,1][tree$edge[,2]==bindTo]

					#find the distance between these two nodes. first
					#set up a temporary matrix and give it row names. this allows you to pull out
					#the index of the edge in question, and subset the edge.lengths based on that
					#index, to get needed branch lengths later
					tempMatrix <- tree$edge
					rownames(tempMatrix) <- 1:dim(tempMatrix)[1]

					#use the matrix to get the index needed below
					nodeIndex <- rownames(tempMatrix)[tempMatrix[,1]==parent 
						& tempMatrix[,2]==bindTo]
					nodeIndex <- as.numeric(nodeIndex)

					#define the distance to bind as half distance to the parent node
					bindDist <- tree$edge.length[nodeIndex]/2

					#bump placeholder up if you had to do an addition w midpoint
					placeholder <- placeholder + 1

					warning("used 'midpoint' branch.position argument for addition of a taxon; corsim was not able to generate an appropriate split before function timeout")
				}

				#if missingAge is not a try-error, it should be class numeric. find
				#the distance below bindTo to bind tip in.
				else
				{
					#when pruning and slicing trees, do not subtract bindToAge or you'll get negative
					#branches! if you do use the global method, then subtract bindToAge. is it possible
					#you want to add the small constant you added to slice tree back here...?
					if(bd.type=="local")
					{
						bindDist <- missingAge
					}
					
					else
					{
						bindDist <- missingAge - bindToAge
					}

					#add a check here that if bindDist is below the parent node
					#it sets the age to the parent node. you would think this shouldn't be
					#possible, but TreeSim seems willing to return some weird things, so also
					#include another check that if bindDist is a negative number, it goes to 0.
					if(bindDist > parentAge-bindToAge)
					{
						bindDist <- parentAge-bindToAge
					}

					if(bindDist < 0)
					{
						bindDist <- 0
					}
				}
			}
			else
			{
				stop("branch.position must be set to one of 'polytomy', 'midpoint', 'uniform', or 'bd'")
			}

			#if clade membership table was provided, figure out which clade the bindTo
			#node belongs to, if any, and update the table accordingly
			if(!missing(clade.membership))
			{
				#this seems like it could be costly, but I can't think of a better
				#way to do it. for every
				#named clade, find the MRCA, then find all nodes that descend from
				#each of those. then figure out if bindTo is represented in any of
				#those sets. find the MRCAs of all named clades
				mrcas <- unlist(getMRCAs(tree, clade.membership))

				#identify all descendants. notice that you have getMRCAs
				#programmed so that it returns the tip of single taxon clades.
				#however, there is no such thing as the descendants of a tip.
				#deal with that below. one good thing is that if the clades
				#are monophyletic and non-overlapping, which you checked for at start
				#there is no way two clades can share a descendant.
				allDesc <- lapply(mrcas, function(x)
					geiger:::.get.descendants.of.node(node=x, phy=tree, tips=FALSE))

				#if any of these named clades return null for descendants, it
				#should mean that they are a single-taxon clade. replace those
				#descendants with the single tips
				allDesc[unlist(lapply(allDesc, is.null))] <- mrcas[unlist(lapply(allDesc, is.null))]

				#subset those descendants to any (should only be one) that match
				anyMatches <- lapply(allDesc, function(x) x[x==bindTo])

				#subset the names of the clades to the hopefully single node that 
				#represents the crown node of a clade and is considered an ancestor
				#of bindTo
				clade <- names(unlist(anyMatches))

				#if the length of clade is 0, then the node we are binding the tip to
				#was not in a named clade. in that case, do not bind it into the DF,
				#as do not need to keep track of it
				if(length(clade) == 1)
				{
					toBind <- data.frame(species=toAdd[j,1], clade=clade)
					clade.membership <- rbind(clade.membership, toBind)
				}
			}

			#finally, bind the new tip in
			tree <- phytools::bind.tip(tree=tree, tip.label=toAdd[j,1], where=bindTo,
				position=bindDist)
		}

		#if branch.position was bd, and the tree has polytomies, throw a warning
		if(branch.position=="bd" & !ape::is.binary.tree(tree))
		{
			warning("used multi2di to calculate diversification rates; polytomies present in tree")
		}

		#save the complete tree as an element in the list object. do same for
		#clade membership if it exists
		trees[[i]] <- tree
	
		if(!missing(clade.membership))
		{
			clade.tables[[i]] <- clade.membership
		}
		
		else
		{
			clade.tables[[i]] <- "clade.membership table not provided"
		}

		if(!missing(midpoint.additions))
		{
			midpoint.additions[[i]] <- placeholder
		}
		
		else
		{
			midpoint.additions[[i]] <- "did not use bd method"
		}

		
		#set everything back to original
		tree <- origTree

		if(!missing(clade.membership))
		{
			clade.membership <- origMembership
		}
	}
	
	#set the class of trees to multiPhylo, bind with clade.tables and return
	class(trees) <- "multiPhylo"
	results <- list(trees=trees, clade.tables=clade.tables, midpoint.additions=midpoint.additions)
	results
}
eliotmiller/addTaxa documentation built on Feb. 3, 2024, 12:19 a.m.