In TGuillerme/dispRity: Measuring Disparity

tree hackathon!

Implementation idea

The idea would be that users do chrono.subsets with all their desired options to generate the time bins (this is already implemented and solid), and then they could use the function get.tree to extract the trees from the resulting dispRity objects. In the case of the time bins, it would return a list of length number of time bins with each list element containing a multiPhylo object that itself contains at least one tree (the one in the time bin).

For example, if we have this tree:

## Testing detect edges and get new tree
set.seed(1)
simple_tree <- rtree(5)
simple_tree$edge.length <- rep(1, Nedge(simple_tree))
simple_tree$root.time <- 4
simple_tree$node.label <- letters[1:4]
plot(simple_tree, show.tip.label = FALSE); axisPhylo()
nodelabels()
nodelabels(simple_tree$node.label, adj = -1, col = "blue")
edgelabels()
tiplabels()
tiplabels(simple_tree$tip.label, adj = -1, col = "blue")
abline(v = c(0, 1, 2, 3, 3.8), col = "grey", lty = 2)

We can have the pipeline looking something like:

## Some data (to be ignored)
ignore_data <- matrix(1, ncol = 1, nrow = 9, dimnames = list(c(simple_tree$tip.label, simple_tree$node.label)))

## Using chrono.subsets to do the time bins
my_bins <- chrono.subsets(data = ignore_data, tree = simple_tree, method = "discrete", time = c(0, 1, 2, 3, 3.8))

## Get the trees
my_subtrees <- get.tree(my_bins)

You can then measure branch length for each tree using something like:

my_branch_lengths_per_bin <- lapply(my_subtrees, lapply, function(x) sum(x$edge.length))

Using chrono.subsets and get.tree allows to generalise the method for any type of time bins (e.g. with tips or nodes spanning across time - FADLAD), and any number of trees (e.g. using a tree distribution rather than a single tree).

Output format needed

For all that to work smoothly we want my_subtrees in this example to be something like:

my_subtrees
   |
   \---<first_bin_name>
   |    |
   |    \--<"multiPhylo"> = list of trees in the first bin for the first tree
   |                        (if only one subtree is in that bin, it's a multiPhylo of length 1)
   |
   \---<second_bin_name>
   |    |
   |    \--<"multiPhylo"> = list of trees in the first bin for the first tree
   |
   \--- etc.

Easy!

What does `chrono.subsets` outputs

(this generalises to custom.subsets by the way - but ignore for this part of the implementation).

object
    |
    \---$matrix* = class:"list" (a list containing the orginal matrix/matrices)
    |   |
    |   \---[[1]]* = class:"matrix" (the matrix (trait-space))
    |   |
    |   \---[[...]] = class:"matrix" (any additional matrices)
    |
    |
    \---$tree* = class:"multiPhylo" (a list containing the attached tree(s) or NULL)
    |   |
    |   \---[[1]] = class:"phylo" (the first tree)
    |   |
    |   \---[[...]] = class:"phylo" (any additional trees)  
    |
    |
    \---$call* = class:"list" (details of the methods used)
    |   |
    |   \---$subsets = class:"character": some info about the subset type (discrete, continuous, custom)
    |   |
    |   \---more stuff. IGNORE
    |
    \---$subsets* = class:"list" (subsets as a list)
    |   |
    |   \---[[1]]* = class:"list" (first item in subsets list)
    |   |   |
    |   |   \---$elements* = class:"matrix" (one column matrix containing the elements within the first subset)
    |   |   |
    |   |   \---[[2]] = IGNORE (for bootstrap and rarefaction)
    |   |
    |   \---[[2]] = class:"list" (second item in subsets list)
    |   |   |
    |   |   \---$elements* = class:"matrix" (one column matrix containing the elements within the second subset)
    |   |   |
    |   |   \---[[2]] = IGNORE (for bootstrap and rarefaction)
    |   |
    |   \---[[...]] = class:"list" (the following subsets)
    |       |
    |       \---$elements* = class:"matrix" (a one column matrix containing the elements within this subset)
    |       |
    |       \---[[...]] = IGNORE (for bootstrap and rarefaction)
    |
    \--- more stuff. IGNORE

Basically most info about the subsets generated by chrono.subsets is either in data$call$subsets for the meta data and in data$subsets for the actually data. The names of the subsets can be extracted using name.subsets(). What is in each subset is in the $elements part, it's a matrix of usually one column with the ID of the elements.

For example:

data$subsets[[1]]
    $elements
         [,1]
    [1,]    1
    [2,]    2
    [3,]    3

Here the integers 1:3 in the column refer to the row numbers in the data. So you can access the sub-matrix using data$matrix[[1]][data$subsets[[1]]$elements[, 1]] or it's rownames using rownames(data$matrix[[1]][data$subsets[[1]]$elements[, 1]]). But no need to worry here. This is handled by some other bits.

Using these elements to get the sub trees

The idea is to use these elements though to access the parts of the subtree and generate it as an output. So for example, if elements 1 to 3 are the tips t4 and t3 and the node d in the simple tree above, it should output the sub tree (d(t4,t3)).

I've implementing this in the following files:

testing: dispRity/test/testthat/test-dispRity.utilities.R l.725 ("get.tree with subsets")
code: dispRity/R/dispRity.utilities.R: l.539 get.tree.R
code: dispRity/R/dispRity.utilities_fun.R: l.137 detect.edges, get.new.tree, get.one.tree.subset.

The best way to load is to go to the root of the repo and use devtools::load_all() to get all the functions linked:

setwd("dispRity/")
devtools::load_all()

So far I have:

find the edges connecting the elements (detect.edges) which is used in 2
using get.new.tree to get the tree using the elements as inputs which is used in 3
using get.one.tree.subset as a function to lapply across all subsets in get.tree

The problem though is that:

it doesn't cut branch lengths
it doesn't take age into account
it doesn't really works

Working idea

Feed the list of elements (in our case element 9, 2)
Feed the boundary ages (1 - 0.2)
- Edit the age boundaries to that the old age becomes the root of the tree and the young age becomes the cutting point
Recreate the subtree spanning from the selected elements (descendant from them)
- Make sure that the elements names are kept
Using slice.tree to cut the upper boundary (young age)
Return the new subtree