Getting Started With The treedata.table Package

The treedata.table R package lets researchers access and manipulate phylogenetic data using tools from the data.table package, which provides many functions for manipulating data rapidly and in a memory-efficient way.

Using the treedata.table package begins with creating a treedata.table object. A treedata.table matches the tip labels of the phylogeny to a column of names in your data.frame, which allows you to manipulate the data and the corresponding tree together.

Importantly, the character matrix must include a column with the taxon names and should be of class data.frame. The tree must be of class phylo or multiPhylo.
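As a minimal sketch of these input requirements, the following builds a small random tree and a matching data.frame, then checks both before any conversion. The toy objects (`toy_tree`, `toy_dat`, and the `taxon` and `trait` columns) are hypothetical and not part of the package; only `ape::rtree()` and base R are used.

```r
library(ape)

set.seed(42)

# Hypothetical toy inputs: a random 5-tip tree and a data.frame whose
# 'taxon' column holds the same names as the tip labels.
toy_tree <- rtree(5)
toy_dat <- data.frame(
  taxon = sample(toy_tree$tip.label),  # deliberately in a different order
  trait = rnorm(5)
)

# The tree should be of class phylo (or multiPhylo) and the data a data.frame.
inherits(toy_tree, "phylo")
is.data.frame(toy_dat)

# Every tip label should appear in the names column, and vice versa.
setequal(toy_tree$tip.label, toy_dat$taxon)
```

The row order of the data does not need to match the tree at this stage; only the set of names has to agree.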

A treedata.table is created using the as.treedata.table function. Here we use the Anolis dataset from treeplyr. Traits in this dataset were randomly generated for a set of 100 species.

library(ape)
library(treedata.table)

# Load example data
data(anolis)
# Create a treedata.table object with as.treedata.table
td <- as.treedata.table(tree = anolis$phy, data = anolis$dat)

We can inspect our object by calling it by name. You will notice that your data.frame is now a data.table. A data.table is simply an advanced version of a data.frame that, among other things, speeds up data manipulation steps while simplifying syntax.

td

Furthermore, the new data.table has been reordered to match the order of the tip labels in your tree.

td$phy$tip.label == td$dat$tip.label
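Because the comparison above prints one logical value per tip, a single summary value can be easier to read. A minimal sketch, assuming the anolis example data shipped with the package:

```r
library(treedata.table)

data(anolis)
td <- as.treedata.table(tree = anolis$phy, data = anolis$dat)

# TRUE only if every row of the data sits at the same position as its tip.
all(td$phy$tip.label == td$dat$tip.label)
```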

Manipulating Data

Co-indexing

Your data.table can be indexed in the same way as any other data.table object. For example, to look at the snout-vent length (SVL) column, we can do the following.

td$dat[,'SVL']

You can also use double bracket syntax to directly return column data as a named vector.

td[["SVL"]]

The same functionality can also be accomplished through the extractVector function. Both the double bracket syntax and the extractVector function will return a named vector.

extractVector(td, 'SVL')

Multiple traits can also be extracted using extractVector.

extractVector(td, 'SVL','ecomorph')

However, a couple of aspects are unique to [[.treedata.table() and extractVector(). First, [[.treedata.table() has an additional exact argument that controls partial matching (i.e. whether a target string may match only the beginning of a column name in the treedata.table object). Second, extractVector() can extract multiple columns and accepts non-standard evaluation (i.e. unquoted column names are treated as string literals).
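The two differences can be sketched as follows, using the anolis example data. This assumes that `exact = FALSE` is how partial matching is switched on and that `"SV"` matches only the SVL column; neither assumption is confirmed by the text above, so treat the second half as illustrative.

```r
library(treedata.table)

data(anolis)
td <- as.treedata.table(tree = anolis$phy, data = anolis$dat)

# Non-standard evaluation: the unquoted name is treated as the string "SVL".
svl_nse <- extractVector(td, SVL)
svl_chr <- extractVector(td, "SVL")
identical(svl_nse, svl_chr)

# Partial matching (assumption): with exact = FALSE, "SV" is expected to
# resolve to the only column whose name starts with those letters.
svl_partial <- td[["SV", exact = FALSE]]
identical(svl_partial, td[["SVL"]])
```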

The real power of treedata.table is in co-indexing the tree and table. For example, in the command below, we use data.table syntax to take the first representative from each ecomorph. We retain all data columns. If you examine the tree object, you will see that all the tips absent from the resulting data.table have been dropped.

td[, head(.SD, 1), by = "ecomorph"]

We could also do the same operation with multiple columns:

td[, head(.SD, 1), by = .(ecomorph, island)]

tail() is also implemented:

td[, tail(.SD, 1), by = "ecomorph"]
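To confirm that subsetting prunes the tree along with the table, tip counts can be compared before and after. A minimal sketch, assuming the anolis example data; `first_per_ecomorph` is a name chosen here for illustration.

```r
library(ape)
library(treedata.table)

data(anolis)
td <- as.treedata.table(tree = anolis$phy, data = anolis$dat)

first_per_ecomorph <- td[, head(.SD, 1), by = "ecomorph"]

Ntip(td$phy)                   # tips in the original tree
Ntip(first_per_ecomorph$phy)   # tips remaining after subsetting
nrow(first_per_ecomorph$dat)   # rows remaining in the data.table
```

The last two values should agree, since each remaining row corresponds to one remaining tip.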

Columns in the treedata.table object can also be operated on using general data.table syntax. In the example below, the tree is pruned to those tips that occur in Cuba; this is the data.table equivalent of dplyr's filter. Then a new column named "Index" is created in the data.table and assigned the sum of SVL and the hostility index. This allows the phylogeny to be manipulated and a new index to be calculated concurrently, for only those tips we actually want to use.

td[island == "Cuba", .(Index = SVL + hostility)]
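To see what the row filter and the column expression in this call do in isolation, here is a minimal sketch on a plain data.table. The `toy` table and its values are made up for illustration and do not come from the anolis data.

```r
library(data.table)

# Hypothetical toy table with the same column names used in the call above.
toy <- data.table(
  tip.label = c("sp1", "sp2", "sp3"),
  island    = c("Cuba", "Jamaica", "Cuba"),
  SVL       = c(4.0, 3.5, 4.5),
  hostility = c(10, 20, 30)
)

# First argument: keep only the Cuban rows.
# Second argument: return a single new column, Index.
cuba_index <- toy[island == "Cuba", .(Index = SVL + hostility)]
cuba_index
```

Because the second argument is wrapped in `.()`, plain data.table returns only the columns listed there, here the two Cuban values of `Index`.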

Running functions on treedata.table objects

In the command below, we extract one vector from our data.table and use geiger's continuous model fitting, run through the tdt function, to estimate a Brownian motion model for the data.

tdt(td, geiger::fitContinuous(phy, extractVector(td, 'SVL'), model="BM", ncores=1))
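tdt() is not limited to model fitting. As a smaller sketch that does not require geiger, and assuming (as in the call above) that the tree is available inside the expression under the name `phy`, a function that takes a tree can be evaluated the same way. The variable `n_tips` is a name chosen here for illustration.

```r
library(ape)
library(treedata.table)

data(anolis)
td <- as.treedata.table(tree = anolis$phy, data = anolis$dat)

# Count the tips of the tree stored in the treedata.table object.
n_tips <- tdt(td, Ntip(phy))
n_tips
```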

Dropping and extracting taxa from treedata.table objects

We can also drop tips directly from the tree, and have those tips drop concurrently from the data.table. In the example below, we remove two taxa by name.

dt <- droptreedata.table(tdObject = td, taxa = c("chamaeleonides", "eugenegrahami"))

We can check whether A. chamaeleonides and A. eugenegrahami are still in the tree:

c("chamaeleonides", "eugenegrahami") %in% dt$phy$tip.label

And we can do the same with the data in the treedata.table object:

c("chamaeleonides", "eugenegrahami") %in% dt$dat$tip.label
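A complementary check is to compare sizes before and after dropping. This is a minimal sketch that assumes both taxa are present in the anolis example data; `dropped` is a name chosen here for illustration.

```r
library(ape)
library(treedata.table)

data(anolis)
td <- as.treedata.table(tree = anolis$phy, data = anolis$dat)

dropped <- c("chamaeleonides", "eugenegrahami")
dt <- droptreedata.table(tdObject = td, taxa = dropped)

# Both components should shrink by the number of dropped taxa.
Ntip(td$phy) - Ntip(dt$phy)
nrow(td$dat) - nrow(dt$dat)
```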

When you're done, the data.table and tree can both be extracted from the object:

df <- pulltreedata.table(td, "dat")
tree <- pulltreedata.table(td, "phy")

The table

df

and the corresponding tree

tree


ropensci/treedata.table documentation built on Sept. 12, 2021, 6:23 p.m.