knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(kewr)
library(dplyr)
library(tidyr)
The Tree of Life is a database of specimens sequenced as part of Kew's efforts to build a comprehensive evolutionary tree of life for flowering plants.
This package accesses data from the Tree of Life Explorer, an output of the Plant and Fungal Trees of Life Project (PAFTOL). The data in the Tree of Life is generated by target sequence capture using the universal Angiosperms353 probe set.
The Tree of Life contains information about specimens that have been sequenced and genes recovered in the process. It lets you download sequence data for the specimens, as well as alignments and trees for the genes.
The Tree of Life Explorer lets users view the tree of life constructed from the current dataset of samples.
You can load it into R using kewr:
tree <- load_tol()
tree
This reads the tree as a single string, so you need another package, such as ape, to parse and view it.
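As a minimal sketch of that parsing step, the example below uses ape's read.tree on a small hand-written Newick string. The string is a hypothetical stand-in: it assumes the tree file is in Newick format. How to pull the raw string out of the object returned by load_tol is not shown here and should be checked against the package documentation.

```r
library(ape)

# Hypothetical Newick string standing in for the Tree of Life tree file.
newick <- "((A:1,B:1):1,C:2);"

# read.tree() parses Newick text directly when given the `text` argument.
phy <- read.tree(text = newick)

# The result is a "phylo" object; tip labels are stored in $tip.label.
print(phy$tip.label)
print(Ntip(phy))
```

Once the tree is a phylo object, ape's plot method can draw it with plot(phy).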
The Tree of Life contains information about the specimens that have been sequenced to construct the tree. The long-term aim is to sample at least one species from every flowering plant genus. This means that, typically, there will be one specimen per species.
You can search this information using the search_tol function. There is no filtering or keyword-search functionality, so a query is just the name of an order, family, genus, or species. For example, to get all specimens for the genus Myrcia:
specimens <- search_tol("Myrcia")
specimens
Searching works by exact matching, and the taxonomy follows the World Checklist of Vascular Plants (WCVP), so only accepted names will work. For example, if we misspell Myrcia we get nothing:
search_tol("Mercya")
And if we search for an outdated synonym we get nothing:
search_tol("Gomidesia")
But searching with a higher taxon, such as a family, will work:
specimens <- search_tol("Myrtaceae")
specimens
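A search returns only one page of results at a time, up to a set limit, so a search for a large family like Myrtaceae will not return every record. To get all the results, we can either increase the limit in the search function: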
myrts_all <- search_tol("Myrtaceae", limit=500)
myrts_all
Or do paged searching:
myrts1 <- search_tol("Myrtaceae")
myrts2 <- request_next(myrts1)
myrts2
And we can tidy our results into a dataframe:
tidied <- tidy(myrts_all)
tidied
Some information is nested inside the tidied dataframe, but we can get to it by unnesting:
tidied %>%
  select(id, raw_reads, taxonomy) %>%
  unnest(cols=c(taxonomy, raw_reads), names_sep="_")
The Tree of Life also contains information about the genes captured during sequencing. These can be accessed using the search_tol function:
genes_all <- search_tol(genes=TRUE, limit=500)
tidy(genes_all)
But they cannot currently be queried, so the best bet is just to grab all of them.
Information about a single specimen or gene can be looked up using its ID:
specimen <- lookup_tol("2660")
specimen
gene <- lookup_tol("51", type="gene")
gene
Records returned by search_tol and lookup_tol contain links to data files on an SFTP server. You can load these into R using the load_tol function. As you saw at the top of this vignette, if you don't provide a URL to load_tol, it will load the whole Tree of Life tree file.
To load a sequence file for a particular specimen:
load_tol(specimen$fasta_file_url)
To load a sequence file for a gene:
load_tol(gene$fasta_file_url)
Or the alignment file:
load_tol(gene$alignment_file_url)
Or the gene tree:
load_tol(gene$tree_file_url)
All files are returned as strings, so you will need to parse them to use them downstream.
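As one illustration of such parsing, here is a minimal base-R sketch that splits FASTA text into a named character vector of sequences. The parse_fasta helper and the fasta_text example are hypothetical and not part of kewr; a dedicated package may suit real sequence work better.

```r
# Hypothetical FASTA text standing in for a string loaded from a sequence file.
fasta_text <- ">seq1 sample\nACGT\nTTGA\n>seq2\nGGCC\n"

# Hypothetical helper (not part of kewr): parse FASTA text into a named
# character vector, one element per record, named by its header line.
parse_fasta <- function(text) {
  lines <- strsplit(text, "\r?\n")[[1]]
  lines <- lines[nzchar(lines)]            # drop blank lines
  is_header <- startsWith(lines, ">")
  headers <- sub("^>", "", lines[is_header])
  # cumsum() assigns each line to the record opened by the latest header
  record <- cumsum(is_header)
  groups <- split(
    lines[!is_header],
    factor(record[!is_header], levels = seq_along(headers))
  )
  # join the wrapped sequence lines of each record into one string
  seqs <- vapply(groups, paste, character(1), collapse = "")
  setNames(unname(seqs), headers)
}

seqs <- parse_fasta(fasta_text)
print(nchar(seqs))
```

Keeping the headers as names makes it easy to look up a sequence by its identifier, for example seqs[["seq2"]].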
If you want to download these files directly, you can use the download_tol function.