R/traitDependentBAMM.R
In BAMMtools: Analysis and Visualization of Macroevolutionary Dynamics on Phylogenetic Trees

Documented in traitDependentBAMM

# Arguments:
#	ephy: a bammdata object
# rate: either "speciation", "extinction" or "net diversification". default to "speciation".
#	traits: a vector of trait data, with names corresponding to tips in the ephy object
#	reps: number of permutations to do
#	return.full: include the permutated and observed correlations in the retunred object?
#	method: either spearman,  pearson, mann-whitney, kruskal
#	logrates:  log-transform the rates before analysis? This can only matter for the pearson correlation
#	two.tailed: perform a two tailed statistical test? (in which case it will double the p-value)
# traitorder: for one tail test, specify the direction of correlation ("positive" or "negative") for countinues trait or a string 
#             indicating states of binary trait with increasing speciation rate, separted by comma (e.g., 'A, B'). Currently, this function only
#             perform two-tailed test for categorical data with more than two states.
# Returns:
#	estimate: the mean observed correlation coefficient
#	p.value: the probability that the observed correlation is less than or equal to
#				a sample from the null distribution. If you are doing a one-tailed
#				test on a continuous trait, a small pvalue means that your observed correlation is larger
#				than the null distribution. P-values approaching 1 mean that the observed
#				correlation is more negative than expected.
#	gen: if return.full=TRUE, the vector of generations sampled from the bammdata objected for permutation
#	obs.corr: if return.full=TRUE, a vector of observed correlation coefficients between traits and tip speciation rate for every sampled generation
#	null: if return.full=TRUE, a vector of permuted correlation coefficients for everty sample generation

# ephy<-ed
# traits<-bitrait
# reps<-100
# return.full=T
# method="m"
# logrates=T
# two.tailed=T
# nthreads=6
# traitorder='0,1';

##' @title STRAPP: STructured Rate Permutations on Phylogenies
##'
##' @description Given a \code{bammdata} object and a vector of (continuous)
##'     trait data, assess whether the correlation between the trait and bamm
##'     estimated speciation, extinction or net diversification rate is
##'     significant using permutation. A set of posterior samples is randomly
##'     drawn from the \code{bammdata} object. If the trait is continuous,
##'     this function calculates the correlation coefficients between the
##'     trait and tip rates (observed correlation), as well as that with
##'     permuted rates for each posterior sample. In a one-tailed test for
##'     positive correlations, the reported p-value is the proportion of the
##'     posterior samples in which the observed correlation is larger than the
##'     correlations calculated with permuted rates. In a two-tailed test, the
##'     reported p-value is the proportion of the posterior samples in which
##'     the null correlation is as extreme as the correlation observed. If the
##'     trait is binary, the U statistic of the Mann-Whitney test is
##'     calculated instead of correlation coefficients to assess whether there
##'     is a significant difference in rate between the two trait states. For
##'     categorical traits with more than two states, the Kruskal-Wallis rank
##'     sum statistic is used.
##'
##' @param ephy An object of class \code{bammdata}.
##' @param traits A vector of trait data, with names corresponding to tips in
##'     the \code{bammdata} object. It can be numeric or categorical. 
##' @param reps An integer specifying the number of permutations (i.e., the
##'     number of posterior samples to randomly draw with replacement from the
##'     \code{bammdata} object).
##' @param rate A character string specifying which estimated rate from the
##'     \code{bammdata} object to use for testing correlation, must be one of
##'     'speciation', 'extinction', 'net diversification' or 'trait'. Defaults to
##'     'speciation'. You can specify just the initial letter. Ignored for 
##' 	trait event data.
##' @param return.full A logical. If \code{TRUE}, the list of posterior
##'     samples, the observed correlation for each posterior sample, and the
##'     null distribution will be included in the returned object. Defaults to
##'     \code{FALSE}.
##' @param method A character string, must be one of 'spearman', 'pearson',
##'     'mann-whitney', or 'kruskal'. Defaults to 'spearman'. You can specify
##'     just the initial letter.
##' @param logrates A logical. If \code{TRUE} log-transform the rates before
##'     analysis. Defaults to \code{TRUE}. This can only matter for the
##'     pearson correlation. 
##' @param two.tailed A logical, used for continuous trait data. If
##'     \code{TRUE}, perform a two-tailed statistical test (i.e., if the null
##'     distribution is symmetric, it is equivalent to doubling the p-value).
##'     Defaults to \code{TRUE}.  
##' @param traitorder A character string specifying the direction of
##'     correlation for the alternative hypothesis. For continuous traits, it
##'     must be either "positive" or "negative"; only the initial letter is
##'     needed. For binary traits, it must be a string indicating states with
##'     increasing rate under the alternative hypothesis, separated by comma
##'     (e.g., 'A, B'). One-tailed test for categorical data with more than
##'     two states is not supported.
##' @param nthreads Number of threads to use for parallelization of the
##'     function. The R package \code{parallel} must be loaded for
##'     \code{nthreads > 1}.
##'
##' @details Tip rates --trait, speciation, extinction, or net diversification
##'     rates-- are permuted in a way such that pairwise covariances in rates
##'     between species are maintained. That is, tips with the same
##'     \code{tipStates} still have the same rate after permutation. Posterior
##'     samples are randomly selected with replacement from the
##'     \code{bammdata} object, so \code{reps} could be smaller or larger than
##'     the total number of samples in the object.
##'
##'     This function expects that the bamm-data object and the trait data
##'     have the same taxon set. It may be necessary to subset the trait data
##'     and/or run \code{\link{subtreeBAMM}} on the \code{bamm-data} object in
##'     order to meet this requirement.
##'
##' @return A list with the following components:
##'     \itemize{
##'         \item estimate: A numeric value for continous trait data: the
##'             average observed correlation between tip rates and the trait
##'             across the posterior samples. For categorical traits, it is
##'             a list showing the median species-specific rates for each
##'             trait state.
##'         \item p.value: A numeric value. The probability that the observed
##'             correlation is less than or equal to a sample from the null
##'             distribution.
##'         \item method: A character string, as input.
##'         \item rate: A character string, as input.
##'         \item two.tailed: A logical, as input.
##'         \item gen: An integer vector, recording which posterior samples
##'             were selected. Only present when \code{return.full} is
##'             \code{TRUE}.
##'         \item obs.corr: A numeric vector, the observed correlation
##'             coefficents for each posterior sample. Only present when
##'             \code{return.full} is \code{TRUE}. For binary traits, centered
##'             U statistics (U - n1* n2/2; where n1 and n2 are the number of
##'             species in each state of the binary trait) is reported.
##'         \item null: A numeric vector. The null distribution of
##'             correlation coefficients (or centered U statistics for binary
##'             traits) from permutation. Only present when \code{return.full}
##'             is \code{TRUE}.
##'     }
##'
##' @author Dan Rabosky, Huateng Huang
##'
##' @seealso \code{\link{subtreeBAMM}}
##'
##' @references \url{http://bamm-project.org/}
##'
##'     Rabosky, D. L. and Huang, H., 2015. A Robust Semi-Parametric Test for
##'     Detecting Trait-Dependent Diversification. Systematic Biology 65:
##'     181-193.
##'
##'     Rabosky, D. L. 2014. Automatic detection of key innovations, rate
##'     shifts, and diversity-dependence on phylogenetic trees. PLoS ONE
##'     9:e89543.
##'
##'     Rabosky, D. L., F. Santini, J. T. Eastman, S. A. Smith, B. L.
##'     Sidlauskas, J. Chang, and M. E. Alfaro. 2013. Rates of speciation and
##'     morphological evolution are correlated across the largest vertebrate
##'     radiation. Nature Communications DOI: 10.1038/ncomms2958.
##'
##' @examples
##' # using a small subset of the fish data set (300 taxa) in Rabosky et al.
##' # 2013. Nat. Com. paper
##' data(fishes, events.fishes)
##' xx <- getEventData(phy = fishes, eventdata = events.fishes, 
##'       nsamples = 500, type = "diversification")
##' # traits.fishes is the trait -- body size
##' data(traits.fishes)
##' x <- traitDependentBAMM(ephy = xx, traits = traits.fishes, reps = 1000,
##'                         return.full = TRUE, method = 's', logrates = TRUE,
##'                         two.tailed = TRUE)
##' @aliases strapp
##' @keywords nonparametric
##' @export
traitDependentBAMM <- function(ephy, traits, reps, rate = 'speciation', return.full = FALSE, method = 'spearman', logrates = TRUE, two.tailed = TRUE, traitorder = NA, nthreads = 1) {
   
	if (nthreads > 1) {
		if (!"package:parallel" %in% search()) {
			stop("Please load package 'parallel' for using the multi-thread option\n");
		}
	}
	
	if (ephy$type == 'trait') {
		rate <- 'trait'
	}
	
	ratetype.option <- c("speciation", "extinction", "net diversification", "trait");
	ratetype <- grep(paste0("^", rate), ratetype.option, value = TRUE, ignore.case = TRUE);
	  
  if (length(ratetype) == 0) {
    stop("Rate must be one of 'speciation', 'extinction', or 'net diversification', only the initial letter is needed.\n")
  }
  
	if (ratetype == "net diversification" & logrates == TRUE) {
		cat("WARNING: Net diversification might be negative and logged rates would then produce NaNs.\n");
	}

	if (ratetype == 'trait' & ephy$type == 'diversification') {
		stop('Rate must be either speciation or net diversification if ephy is a diversification analysis.\n');
	}

	if (ratetype %in% c('speciation', 'net diversification') & ephy$type == 'trait') {
		stop('Rate must be trait if ephy is a trait analysis.\n');
	}
	
	# check if species in ephy and traits match
	if (!identical(sort(names(traits)), sort(ephy$tip.label))) {
		stop('Species names in the bamm-data object and trait data are not identical. You may want to run subtreeBAMM and subset the trait data in order to reduce the two datasets to a common taxon set.')
	}

	method.option <- c("spearman",  "pearson", "mann-whitney", "kruskal");
	method <- grep(paste0("^", method), method.option, ignore.case = TRUE, value = TRUE);
	if (length(method) == 0) {
		stop("method must be one of 'spearman', 'pearson', 'mann-whitney', or 'kruskal', only the initial letter is needed");
	}

	# check if the trait is right class

	if (method == 'spearman' | method == "pearson") {
		if (! is.numeric(traits)){
			cat(paste0("selected ", method, ", but the trait is not numeric, converted the trait into a numeric vector\n"));
			traits <- as.numeric(traits);
		}
	} else if (method == "mann-whitney"| method == "kruskal") {
      
		if (length(unique(traits[! is.na(traits)])) == 1) {
			stop(paste("selected ", method, ", but the trait only has one level\n", sep = ''));
		}
		if (method == "mann-whitney") {
			if (length(unique(traits[! is.na(traits)])) > 2) {
				stop(paste("selected ", method, ", but the trait has more than two levels\n", sep = ''));
			}
		}
	}

	#check if the traitorder is specified
	trait.state <- NA;
	if (!two.tailed) {
		if (anyNA(traitorder)) {
			stop("selected one-tail test, but traitorder is not specified\n");
		}
		if ( method == "kruskal") {
			stop(" currently one-tail test is only available for continous or binary trait");
		}
		if (method == 'spearman' | method == "pearson") {
			direction.option <- c("positive", "negative");
			direction <- grep(paste0("^", traitorder), direction.option, ignore.case = TRUE, value = TRUE);
			if (length(direction) == 0) {
				stop(" for one-tail test with continous trait, traitorder must be either 'positive' or 'negative', only the initial letter is needed");
			} else {
				cat(paste0("select one-tailed ", method, " test\nAlternative hypothesis: the trait is ", direction, "ly correlated with speciation rate\n"));
			}
		} else {
			traitorder <- gsub(" ", "", traitorder);
			trait.state <- as.character(unlist(strsplit(x = traitorder, split = ",")));
			if (length(trait.state) != 2) {
				stop("please specify the traitorder for binary trait:\nTwo states separated by comma, and the state that is expected to have lower speciation rate first\n");
			} else {
				cat(paste0("selected one-tail ", method, " test\nAlternative hypothesis: species with trait ", trait.state[2], " has higher speciation rate than those with trait ", trait.state[1], "\n"));
			}
			for (i in trait.state) {
				if (sum(traits == i) == 0) {
					stop(paste("no species with state ", i," \n", sep = ''));
				}
			}
		}
	}
	if (ratetype == 'speciation') {
		tiprates <- ephy$tipLambda;
	} else if (ratetype == "extinction") {
		tiprates <- ephy$tipMu;
	} else if (ratetype == 'net diversification') {
		tiprates <- lapply(1:length(ephy$tipLambda), function(i) ephy$tipLambda[[i]] - ephy$tipMu[[i]]);
	} else if (ratetype == 'trait') {
		tiprates <- ephy$tipLambda;
	}
	tipstates <- ephy$tipStates;
	#tiprates <- tiprates[ephy$tip.label];
	traits <- traits[ephy$tip.label];
	stat.mu <- 0;
	if (method == "mann-whitney") {
		trait.stat.count <- table(traits);
		trait.stat.count <- trait.stat.count[! is.na(names(trait.stat.count))];
		stat.mu <- prod(trait.stat.count) / 2;
	}

	if (logrates) {
		tiprates <- lapply(1:length(tiprates), function(x) log(tiprates[[x]]));
	}

	#randomly sample generations from BAMM posterior
	gen <- sample(1:length(tiprates), size = reps, replace = TRUE);

	gen.tiprates <- list();
	for (l in 1:length(gen)) {
		gen.tiprates[[l]] <- data.frame(rates = tiprates[[gen[l]]], states = tipstates[[gen[l]]], stringsAsFactors = FALSE);
	}
  
	rm("tiprates", "tipstates");
	permute_tiprates <- function(m) {
		tt <- m$states;
		tlam <- m$rates;
		index <- unique(tt);
		lvec <- numeric(length(index));
		for (k in 1:length(index)) {
			lvec[k] <- tlam[tt == index[k]][1];
		}
		new_index <- sample(index, size = length(index), replace = FALSE);
		x <- rep(0,length(tt));
		for (xx in 1:length(index)) {
			x[which(tt == index[xx])] <- lvec[which(index == new_index[xx])];
		}
		x;   
	}
	if (nthreads > 1) {
		cl <- parallel::makePSOCKcluster(nthreads);
		p.gen.tiprates <- parallel::parLapply(cl, gen.tiprates, permute_tiprates);
		parallel::stopCluster(cl);
	} else {
		p.gen.tiprates <- lapply(gen.tiprates, permute_tiprates);
	}
	xgen.tiprates <- list();
	for (l in 1:length(gen)) {
		xgen.tiprates[[l]] <- gen.tiprates[[l]]$rates;
	}
	gen.tiprates <- xgen.tiprates; rm("xgen.tiprates");

	cortest <- function(rates, traits, method) {
	    if (sd(rates, na.rm = TRUE) == 0) {
	    	return(0);
	    } else {
			return(cor.test(rates, traits, method = method, exact = FALSE)$estimate);  
	    }
	}
	manntest <- function(rates, traits, two.tailed, trait.state) {
		if (two.tailed) {
			return(wilcox.test(rates ~ traits, exact = FALSE)$statistic);
		} else {
			return(wilcox.test(rates[which(traits == trait.state[2])], rates[which(traits == trait.state[1])], exact = FALSE)$statistic);
		}
	}

	kruskaltest <- function(rates, traits) {
		testres <- kruskal.test(rates ~ traits);
		# If there is no variation in rates (chi-squared value is NaN), then return a chi-squared value that has a P-value of 0.999, using the appropriate degrees of freedom
		if (is.na(testres$statistic)) {
			return(qchisq(p = 0.999, df = testres$parameter));
		} else {
			return(testres$statistic);
		}
	}
 

	if (nthreads > 1) {
		cl <- parallel::makePSOCKcluster(nthreads);
		if (method == 'spearman' | method == "pearson") {
			obs <- parallel::parLapply(cl, gen.tiprates,cortest, traits, method);
			permu <- parallel::parLapply(cl, p.gen.tiprates, cortest, traits, method);
		} else if (method == "mann-whitney") {
			obs <- parallel::parLapply(cl, gen.tiprates, manntest, traits, two.tailed, trait.state);
			permu <- parallel::parLapply(cl, p.gen.tiprates, manntest, traits, two.tailed, trait.state);
		} else {
			obs <- parallel::parLapply(cl, gen.tiprates, kruskaltest, traits);
			permu <- parallel::parLapply(cl, p.gen.tiprates, kruskaltest, traits);
		}
		parallel::stopCluster(cl);
	} else {
		if (method == 'spearman' | method == "pearson") {
			obs <- lapply(gen.tiprates, cortest, traits, method);
			permu <- lapply(p.gen.tiprates, cortest, traits, method);
		} else if (method == "mann-whitney") {      
			obs <- lapply(gen.tiprates, manntest, traits, two.tailed, trait.state);
			permu <- lapply(p.gen.tiprates, manntest, traits, two.tailed, trait.state);
		} else {    
			obs <- lapply(gen.tiprates, kruskaltest, traits);
			permu <- lapply(p.gen.tiprates, kruskaltest, traits);    
		}
	}
	obs <- unlist(obs);
	permu <- unlist(permu);
	obs <- obs - stat.mu;
	permu <- permu - stat.mu;

  

	if (two.tailed) {
		pval <- sum(abs(obs) <= abs(permu)) / length(permu);
	} else {
		if (method == "spearman" | method == "pearson") {
			if (direction == 'positive') {
				pval <- sum(obs <= permu) / length(permu);
			} else {
				pval <- sum(obs >= permu) / length(permu);
			}
		} else {
			pval <- sum(obs <= permu) / length(permu);
		}
	}
	if (method == "spearman" | method == "pearson") {
		obj <- list(estimate = mean(as.numeric(obs)), p.value = pval, method = method, two.tailed = two.tailed);
	} else {
		if (ratetype == 'speciation') {
			ave.tiprate <- getTipRates(ephy)$lambda.avg;
		} else if (ratetype == 'extinction') {
			ave.tiprate <- getTipRates(ephy)$mu.avg;
		} else if (ratetype == 'net diversification') {
			ave.tiprate <- getTipRates(ephy)$lambda.avg - getTipRates(ephy)$mu.avg;
		} else if (ratetype == 'trait') {
			ave.tiprate <- getTipRates(ephy)$beta.avg;
		}
		l <- lapply(unique(traits[!is.na(traits)]), function(x) {
			median(ave.tiprate[which(traits == x)], na.rm = TRUE);
		});
		names(l) <- as.character(unique(traits[! is.na(traits)]));
		obj <- list(estimate = l, p.value = pval, method = method, two.tailed = two.tailed);
	}
	obj$rate <- ratetype;
	if (return.full) {
		obj$obs.corr <- as.numeric(obs);
		obj$gen <- gen;
		obj$null <- as.numeric(permu);
	}

	return(obj);
}
Any scripts or data that you put into this service are public.
BAMMtools documentation built on Sept. 13, 2024, 1:10 a.m.
rdrr.io home R language documentation Run R code online
CRAN packages Bioconductor packages R-Forge packages GitHub packages
Note that we can't provide technical support on individual packages. You should contact the package authors for that.
BAMMtools
Analysis and Visualization of Macroevolutionary Dynamics on Phylogenetic Trees

R/traitDependentBAMM.R
In BAMMtools: Analysis and Visualization of Macroevolutionary Dynamics on Phylogenetic Trees

Defines functions traitDependentBAMM

Documented in traitDependentBAMM

Try the BAMMtools package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

BAMMtools Analysis and Visualization of Macroevolutionary Dynamics on Phylogenetic Trees

R/traitDependentBAMM.R In BAMMtools: Analysis and Visualization of Macroevolutionary Dynamics on Phylogenetic Trees

Defines functions traitDependentBAMM

Documented in traitDependentBAMM

Try the BAMMtools package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

BAMMtools
Analysis and Visualization of Macroevolutionary Dynamics on Phylogenetic Trees

R/traitDependentBAMM.R
In BAMMtools: Analysis and Visualization of Macroevolutionary Dynamics on Phylogenetic Trees