knitr::opts_chunk$set(echo = TRUE)
treedata.table Package
The aim of the treedata.table R package is to allow researchers to access and manipulate phylogenetic data using tools from the data.table package. data.table has many functions for rapidly manipulating data in a memory-efficient way.
Using the treedata.table package begins with creating a treedata.table object. The treedata.table object matches the tip.labels of the phylogeny to a column of names in your data.frame. This allows you to manipulate the data and the corresponding tree together.
Importantly, the character matrix must include a column with the taxa names and should be of class data.frame. The tree must be of class phylo or multiPhylo.
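These input requirements can be checked before building the object. The following is a minimal sketch on toy inputs: the random tree from ape::rtree, the column name species, and the trait values are illustrative assumptions and are not part of the Anolis example used below.

```r
library(ape)

set.seed(1)
# Toy inputs (hypothetical): a random 5-tip tree and a matching data.frame
tree <- rtree(5)
dat <- data.frame(
  species = tree$tip.label,            # column holding the taxa names
  trait   = c(1.2, 3.4, 2.2, 0.7, 5.1),
  stringsAsFactors = FALSE
)

# The tree must be of class phylo (or multiPhylo) and the data a data.frame
stopifnot(inherits(tree, c("phylo", "multiPhylo")), is.data.frame(dat))

# Tips without a data row, and data rows without a tip
missing_from_data <- setdiff(tree$tip.label, dat$species)
missing_from_tree <- setdiff(dat$species, tree$tip.label)
```

If either vector is non-empty, the tree and the table do not describe the same set of taxa.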
A treedata.table is created using the as.treedata.table function. Here we use the Anolis dataset from treeplyr. Traits in this dataset were randomly generated for a set of 100 species.
library(ape)
library(treedata.table)

# Load example data
data(anolis)

# Create treedata.table object with as.treedata.table
td <- as.treedata.table(tree = anolis$phy, data = anolis$dat)
We may inspect our object by calling it by name. You will notice that your data.frame is now a data.table. A data.table is simply an advanced version of a data.frame that, among other things, speeds up data manipulation while simplifying syntax.
td
Furthermore, the new data.table has been reordered into the same order as the tip.labels of your tree.
td$phy$tip.label == td$dat$tip.label
Your data table can be indexed in the same way any other data.table object would be. For example, if we wanted to look at our snout-vent length column, we can do that like so.
td$dat[,'SVL']
You can also use double bracket syntax to directly return column data as a named vector.
td[["SVL"]]
The same functionality can also be accomplished through the extractVector function. Both the double bracket syntax and the extractVector function will return a named vector.
extractVector(td, 'SVL')
Multiple traits can also be extracted using extractVector.
extractVector(td, 'SVL','ecomorph')
However, a couple of aspects are unique to [[.treedata.table() and extractVector(). First, [[.treedata.table() has an extra exact argument to enable partial matching (i.e. when target strings and those in the treedata.table object match only partially). Second, extractVector() can extract multiple columns and accepts non-standard evaluation (i.e. unquoted names are treated as string literals).
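The exact argument has a counterpart in base R's [[ operator. As a rough illustration on a plain named list (not a treedata.table object, whose method is only assumed here to behave analogously; the trait values are made up):

```r
# A plain named list standing in for a table of traits (hypothetical values)
traits <- list(SVL = c(4.0, 4.2), ecomorph = c("TG", "TC"))

# Exact matching is the default: there is no element named "SV"
exact_result <- traits[["SV"]]

# Partial matching: "SV" resolves to the SVL element
partial_result <- traits[["SV", exact = FALSE]]
```

Partial matching is convenient interactively, but exact names are safer in scripts because a partial string can silently start matching a different column when the table changes.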
The real power in treedata.table is in co-indexing the tree and table. For example, in the below command, we use data.table syntax to take the first representative from each ecomorph. We retain all data columns. If you examine the tree object, you will see that all the tips absent from the resultant data.table have been dropped.
td[, head(.SD, 1), by = "ecomorph"]
We could also do the same operation with multiple columns:
td[, head(.SD, 1), by = .(ecomorph, island)]
tail is also implemented:
td[, tail(.SD, 1), by = "ecomorph"]
Columns in the treedata.table object can also be operated on using general data.table syntax. In the below example, the tree is pruned to those tips that occur in Cuba. This is the data.table equivalent of dplyr's filter. Then, a new column is created in the data.table, assigned the name "Index", and assigned the value of SVL plus the hostility index. This enables concurrent manipulation of the phylogeny and the calculation of a new index for only those tips we would actually like to use.
td[island == "Cuba",.(Index=SVL+hostility)]
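The row filter and the computed column follow ordinary data.table semantics. A minimal sketch of the same expression on a plain data.table, using a made-up toy table (the object name toy and all values are assumptions, not the Anolis data):

```r
library(data.table)

# Toy trait table (hypothetical values)
toy <- data.table(
  tip.label = c("sp1", "sp2", "sp3", "sp4"),
  island    = c("Cuba", "Jamaica", "Cuba", "Hispaniola"),
  SVL       = c(4.1, 3.8, 5.0, 4.4),
  hostility = c(10, 20, 30, 40)
)

# i filters rows (like dplyr's filter); j builds the new Index column
cuba <- toy[island == "Cuba", .(Index = SVL + hostility)]
```

On a plain data.table only the table is subset; the treedata.table version additionally prunes the tree to the retained tips.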
Running functions on treedata.table objects
In the below command, we extract one vector from our data.table and use geiger's continuous model fitting to estimate a Brownian motion model for the data using the tdt function.
tdt(td, geiger::fitContinuous(phy, extractVector(td, 'SVL'), model="BM", ncores=1))
Dropping taxa from treedata.table objects
We can also drop tips directly from the tree, and have those tips drop concurrently from the data.table. In the example below, we remove two taxa by name.
dt <- droptreedata.table(tdObject=td, taxa=c("chamaeleonides" ,"eugenegrahami" ))
We can check whether A. chamaeleonides and A. eugenegrahami are still in the tree:
c("chamaeleonides" ,"eugenegrahami" ) %in% dt$phy$tip.label
And we can do the same with the data in the treedata.table object:
c("chamaeleonides", "eugenegrahami") %in% dt$dat$tip.label
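For comparison, a similar result can be obtained by hand with ape::drop.tip and base R subsetting, which is the two-step bookkeeping that a single call on a treedata.table object saves. This is a sketch on toy inputs; the random tree, the trait column, and the choice of tips to drop are all illustrative assumptions.

```r
library(ape)

set.seed(1)
# Toy inputs (hypothetical): a random 6-tip tree and a matching data.frame
tree <- rtree(6)
dat <- data.frame(tip.label = tree$tip.label, trait = seq_len(6))

to_drop <- tree$tip.label[1:2]

# Prune the tree, then subset the table to the remaining tips by hand
pruned_tree <- drop.tip(tree, to_drop)
pruned_dat  <- dat[!dat$tip.label %in% to_drop, ]
```

Keeping the two objects in sync manually is easy to get wrong once several filtering steps are chained, which is the motivation for handling both through one object.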
When you're done, the data.table and tree can both be extracted from the object:
df <- pulltreedata.table(td, "dat")
tree <- pulltreedata.table(td, "phy")
The table
df
and the corresponding tree
tree