In Dmirandae/blepd: Branch Length Evaluation in Phylogenetic Diversity Analysis

The Algorithms

Constructing the functions

First, we load the required libraries:

## cleaning
rm(list = ls())

## libraries

## installing and loading the package

##install.packages("../../blepd_0.1.1.tar.gz", repos = NULL, type="source")

library(blepd)

packageVersion("blepd")

## to plot trees 

library(ggtree)

library(gridExtra)

library(RColorBrewer)

Now, we load the data: tree and distributions

## A binary tree with four terminals, newick format. 


data(tree)

data(distribution)

initialTree <- tree

str(initialTree)

## distributions for four taxa and two area, table format. 

dist4taxa <- distribution

Plotting:

## A. the tree

plotTree <-  ggtree(initialTree, ladderize=TRUE,
                    color="black", size=1 , linetype="dotted") +  
             geom_tiplab(size=6, color="black") +
             theme_tree2() +  
             labs(title = "A. Four terminals, equal branch length")

print(plotTree)

## B. the distribution

## distribution to XY

distXY <- matrix2XY(dist4taxa)


plotDistrib <- ggplot(data=distXY, 
                     aes(x= Terminal, y= Area), 
                     size =11) +
               geom_point() +
               labs(title = "B. Terminals and Distributions",
                    y = "Area",
                    x = "Terminal")

plotDistrib

Now we check whether names in both objects, trees and distributions are the same:

all(colnames(dist4taxa) == initialTree$tip.label)

We report all branch lengths and calculate the PD values.

initialTree$edge.length

initialPD <- PDindex(tree=initialTree, distribution = dist4taxa)

initialPD

Single taxon evaluation function

To test the effect of changing a single terminal branch length, we will:

create a copy of the initial branch length.
calculate the initial PD and get the area(s) with the max value.
change the length of a given terminal: t1.
recalculate PD and get the area(s) with the max value.
compare both results (from steps 2 to 4) to evaluate the effect of the perturbation.

bestInitialArea <- row.names(dist4taxa)[which(initialPD == max(initialPD))]

bestInitialArea

tipToEval <- "t1"

value <- 1.1

numberTipToEval <- which(initialTree$tip.label %in% tipToEval) 

newTree <- initialTree

## a table, binding tree$edge and tree$edge.length

createTable <- function(tree = tree){allDataTable <- tree$edge
                                      allDataTable <- cbind (allDataTable, tree$edge.length)
                                     return(allDataTable)
                  }


## new tree, with t1 branch changed to "value" 

newTree$edge.length[which(createTable(initialTree)[,2] %in% numberTipToEval)] <- value

newTree$edge.length

## plotting

plotNewTree <- ggtree(newTree) + 
               theme_tree2() +  
               geom_tiplab(size=6, color="black") +
               labs(title = "C. Four terminals, non equal branch length")

print(plotNewTree)

### PD for the modified tree

modifiedPD <- PDindex(tree = newTree, distribution = dist4taxa)

bestModifiedArea <- row.names(dist4taxa)[which(modifiedPD == max(modifiedPD))]

## comparing initial and modified areas selected

bestInitialArea

bestModifiedArea

bestInitialArea == bestModifiedArea

As shown, a modification in the branch length will impact the area selected, we can consider that the results are very sensible to changes in branch length.

THE function

To evaluate the behaviour of a single terminal branch length we must:

modify a branch length.
recalculate the PD value.
compare the effect of this perturbation.

We can turn these steps into a function called evalTerminal, with four parameters:

a tree with branch lengths
the distribution object
the terminal to evaluate
the approach to evaluate the terminal, that could be "lower", or "upper" limits, to evaluate the minimal or the maximal value of the branch length where the PD value changes, therefore another area is selected.

evalTerminal(tree = initialTree,
             distribution = dist4taxa,
             tipToEval = "t1",
             approach = "lower" )

The lower limit when we change the branch length for terminal t1 is 0.99, as any change in branch length will modify the area selected from A1A2 to A2, as the tie between the paths terminals t1/t3 (area A1) vs t2/t4 (area A2) will be solved in favor of t2/t4 when A1 is shorter.

evalTerminal(tree = initialTree,
             distribution = dist4taxa,
             tipToEval = "t2",
             approach = "lower" )

A similar result will arrive from changing the terminal branch t2, but in this case the tie is solved to favour A1.

evalTerminal(tree = initialTree,
             distribution = dist4taxa,
             tipToEval = "t1",
             approach = "upper" )

And, we will get the same result by changing the branch length of the terminal t1 from 1 to 1.x, to find the upper limit. The tie is solved in favour of area A1, opposite to the solution when we found the lower limit. In this case, even the smaller change in any terminal branch value, will modify the results.

We test the effect of the branch length for all terminals.

newTree 

modifiedPD 

bestModifiedArea

evalTerminal(tree = newTree, distribution = dist4taxa, tipToEval = "t3", approach = "upper" )

evalTerminal(tree = newTree, distribution = dist4taxa, tipToEval = "t3", approach = "lower" )

The THING is the PD difference between areas, and whether this value could be accumulated in a single terminal branch, or if the contribution to the PD value is evenly distributed among all terminal branches, and therefore to change the PD value more than one terminal branch length must have to change, to select another area.

The function to test all terminals at the same time is evalTree, with two parameters: the tree and the distribution. The function returns a data.frame object with 14 fields: labelTerminal, lowerBranchLength, InitialArea, lowerFinalArea, initialLength, upperBranchLength, upperFinalArea, changeLower, changeUpper, deltaUpper, deltaLower, deltaPD, areaDelta, and abDelta.

evalTree(tree = initialTree,
         distribution = dist4taxa)

The extreme sensitivity of the PD results to the terminal branch length is seen in the column absolute length difference (=abDelta), as any length change -larger than 0-, will change the area selected.

Null model tests

We could also swap branch lengths and evaluate the behaviour before and after swapping. This swapping could be done by swapping two terminal branch lengths simpleswap, all terminal branch lengths allswap or replacing the terminal branch length with randomly distributed values, using an uniform distribution -uniform-, where the min and max values are obtained from the actual branch lengths.

Dmirandae/blepd documentation built on Jan. 17, 2025, 3:08 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com