R/MultiTreeContradiction.R

Defines functions MultiTreeContradiction

Documented in MultiTreeContradiction

#' Multiple tree contradiction distances
#'
#' @description
#'
#' Given a set of input trees calculates all pairwise tree contradictions and returns a matrix.
#'
#' @param trees An object of class 'multi.phylo'.
#' @param rescale Logical indicating whether (TRUE; default) or not (FALSE) to rescale contradictions (i.e., place on a zero to one scale).
#'
#' @details
#'
#' The contradiction metric was introduced by Bapst et al. (2018) as effectively an extension of the Robinson-Fould distance (Robinson and Foulds 1981) that accounts for non-bifurcating trees. More specifically, it assumes that any polytomies in those trees are effectively "soft", representing uncertainty, rather than "hard", representing a real hypothesis of relationships (effectively what Robinson-Foulds does).
#'
#' An explanation of how the metric works is best understood in the context of Robinson-Foulds (RF), which works by comparing two trees, summing the clades (bipartitions) unique to the first tree with those unique to the second tree. (Trees must contain the same labelled tips.) Thus identical trees would score a value of zero, and maximally dissimilar trees a value of 2(N - 2), where N is the number of tips and both trees are fully bifurcating. Note that notionally RF is for unrooted trees, but I find it makes more sense to think of it as not counting the root as this will always be identical across two trees (representing the clade containing all tips). This is why the maximum is N - 2 and not N - 1 (the latter being the number of all internal nodes, i.e., clades).
#'
#' Extending this concept to multifurcating trees imagine we wish to compare a star phylogeny with any fully bifurcating tree, we would get an RF of N - 2 (the number of clades in the bifurcating tree, excluding the root). This might make sense for many purposes as it considers the trees to be distinct, e.g., some distacne apart in tree-space. However, for some purposes we might really care whether or not the two trees \emph{contradict} each other. For example, knowing nothing else about the data, might a bifurcating tree potentially belong to the the sample of trees used to produce a multifurcating consensus tree? If this is what we want to know then we are searching for trees that have a contradiction of zero. In the case of a star phylogeny this would trivially be any other tree (as any set of relationships cannot logically conflict with no relationships). It is this measure that the contradiction metric attempts to capture.
#'
#' In practice this works by asking how many clades (bipartitions) explicitly contradict across two trees. This can also be rescaled (zero to one) by dividing by the total number of possible contradictions (N - 2).
#'
#' Note that a function to calculate contradiction has already been introduced by Dave Bapst in his \code{paleotree} package and users should also consult that function for a description.
#'
#' The intended purpose of this function is simply a faster implementation for users who wish to compute many tree contradiction differences simultaneously.
#'
#' @return
#'
#' A pairwise contradiction matrix, where rows and columns represent the input trees in the same order.
#'
#' @author
#'
#' Graeme T. Lloyd \email{graemetlloyd@@gmail.com}
#'
#' @references
#'
#' Bapst, D. W., H. A. Schreiber, and S. J. Carlson. 2018. Combined Analysis of Extant Rhynchonellida (Brachiopoda) using Morphological and Molecular Data. \emph{Systematic Biology}, \bold{67}, 32-48.
#'
#' Robinson, D. F. and Foulds, L. R., 1981. Comparison of phylogenetic trees. \emph{Mathematical Biosciences}, \bold{53}, 131-147.
#'
#' @seealso
#'
#' \code{paleotree::treeContradiction}
#'
#' @examples
#'
#' # Generate three example trees (with tips labelled A-D):
#' ExampleTrees <- read.tree(text = c("(A,B,C,D);", "(A,(B,(C,D)));",
#'   "((A,B),C,D);"))
#'
#' # View trees, to show multifurcations:
#' par(mfrow = c(1, 3))
#' x <- lapply(ExampleTrees, plot, cex = 2)
#'
#' # Calculate rescaled tree contradiction differences matrix:
#' MultiTreeContradiction(ExampleTrees)
#'
#' # Calculate raw tree contradiction differences matrix:
#' MultiTreeContradiction(ExampleTrees, rescale = FALSE)
#'
#' @export MultiTreeContradiction
MultiTreeContradiction <- function(trees, rescale = TRUE) {
  
  # Test contradiction subfunction (from Dave Bapst):
  testContradiction <- function(namesA, namesB) {
    
    #
    matchA <- namesA %in% namesB
    
    #
    matchB <- namesB %in% namesA
    
    #
    if(any(matchB)) {
      
      #
      Output <- !(all(matchA) | all(matchB))
      
    #
    } else {
      
      #
      Output <- FALSE
      
    }
    
    #
    return(Output)
    
  }
  
  # Number of contradictions subfunction (from Dave Bapst):
  nContradiction <- function(partA, partB) {
    
    #
    partContra <- sapply(partA, function(x) any(sapply(partB, function(y) testContradiction(x, y))))
    
    #
    Output <- sum(partContra)
    
    #
    return(Output)
    
  }
  
  # Get partitions of tree as list (Liam Revell's function):
  PropPartAsList <- function(pp) lapply(pp, function(x, pp) sort(attr(pp, "labels")[x]), pp = pp)
  
  # Get tree contradiction (Dave Bapst's function):
  TreeContradiction <- function(part1, part2) {
    
    # Get number of contradictions comparing parts 1 and 2:
    nContra1 <- nContradiction(part1, part2)
    
    # Get number of contradictions comparing parts 2 and 1:
    nContra2 <- nContradiction(part2, part1)
    
    # Sum values and store as output:
    Output <- nContra1 + nContra2
    
    # Return output:
    return(Output)
    
  }
  
  # Check input format is multiPhylo:
  if(!inherits(trees, "multiPhylo")) stop("trees are not of class multiPhylo")
  
  # Check all trees have same number of tips:
  if(length(unique(unlist(lapply(trees, Ntip)))) != 1) stop("Trees have different numbers of tips")
  
  # Check all trees have same tip labels:
  if(length(unique(unlist(lapply(lapply(lapply(trees, '[[', "tip.label"), sort), paste, collapse = "%%")))) != 1) stop("Trees do not all have same complement of tip labels")
  
  # Get total number of trees:
  NTrees <- length(trees)
  
  # Create all zero distance matrix (will fill later):
  OutputMatrix <- matrix(data = 0, nrow = NTrees, ncol = NTrees)
  
  # Create all partitions list:
  AllPartitionsList <- lapply(unclass(trees), function(x) PropPartAsList(prop.part(x)))
  
  # For the ith tree:
  for(i in 1:(NTrees - 1)) {
    
    # For the jth tree:
    for(j in (i + 1):NTrees) {
      
      # Get tree contradiction (sensu Bapst) and store:
      OutputMatrix[i, j] <- OutputMatrix[j, i] <- TreeContradiction(part1 = AllPartitionsList[[i]], part2 = AllPartitionsList[[j]])
      
    }
    
  }
  
  # If rescaling output:
  if(rescale) {
    
    # Number of possible nodes that could contradict on one unrooted tree:
    nPossNodes <- Ntip(trees[[1]]) - 2
    
    # Rescale matrix (0-1 range):
    OutputMatrix <- OutputMatrix / (2 * nPossNodes)
    
  }
  
  # Return distance matrix:
  return(OutputMatrix)
  
}
graemetlloyd/metatree documentation built on April 29, 2021, 2:32 a.m.