treesiftr: Visualizing the Relationship Between Phylogeny and Data

Introduction

Estimating phylogenetic trees is crucial in many areas of evolutionary biology. However, visualizing the relationship between data and trees is not intuitive. To assist with visualizing this relationship, I have created treesiftr, a Shiny application [@shiny] that can be run locally or used via the web. It takes subsets of data from a phylogenetic matrix, generates a tree under parsimony, and scores that tree under both the likelihood and parsimony criteria. The output of the package is a visualization or set of visualizations of a tree and characters. treesiftr can also be used as an R [@R] package to provide a visual demonstration of the relationship between trees and data, while reinforcing concepts in R programming.

If you are interested in adopting this module, take a look at my instructor's guide for more information (PDF also available here).

Target Audience

treesiftr has been used in the Analytical Paleobiology Workshop, in which the audience was graduate students and postdocs, many of whom had no prior knowledge of phylogeny. It is also used in the Genetics course at Southeastern Louisiana University, where the audience is undergraduates who have no prior knowledge of phylogeny. It is meant to be accompanied by lecture material on phylogenetics. A glossary is provided with each worksheet, and a sample slide deck is included in the inst/slides directory.

Installation

Currently, treesiftr can be installed via the devtools install_github function [@devtools].

devtools::install_github("wrightaprilm/treesiftr")

Required Packages

knitr::opts_chunk$set(
    message = FALSE,
    warning = FALSE,
    include = FALSE
)
library(ape)
library(treesiftr)
library(phangorn)
library(alignfigR)
library(ggtree)
library(ggplot2)
data(bears)

Note: phangorn, alignfigR, and ggplot2 are all available via CRAN. ggtree is available via Bioconductor. For information on installing Bioconductor packages, see here.
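
As a rough sketch of how the dependencies named above might be installed from their respective repositories: the helpers `missing_pkgs` and `install_missing` below are hypothetical and not part of treesiftr, and installing ggtree through BiocManager is an assumption based on current Bioconductor practice. ape is included because it is loaded above and is also on CRAN.

```r
# Packages loaded above, grouped by the repository that hosts them.
cran_pkgs <- c("ape", "phangorn", "alignfigR", "ggplot2")
bioc_pkgs <- c("ggtree")

# Hypothetical helper: return the packages that are not yet installed.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install whatever is missing. It is only defined here,
# not called, because it needs network access and modifies your library.
install_missing <- function() {
  cran_needed <- missing_pkgs(cran_pkgs)
  if (length(cran_needed) > 0) install.packages(cran_needed)

  bioc_needed <- missing_pkgs(bioc_pkgs)
  if (length(bioc_needed) > 0) {
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
      install.packages("BiocManager")
    }
    BiocManager::install(bioc_needed)
  }
}
```

Calling `install_missing()` once in an interactive session should then leave only treesiftr itself to be installed from GitHub.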

Operation

The first step to making a treesiftr visualization is to select the subset of the phylogenetic matrix that we would like to visualize. This is performed via a function called generate_sliding. The command below generates the subsets; for simplicity, we will look at one set of characters.

# Locate package data
fdir <- system.file("extdata", package = "treesiftr")
aln_path <- file.path(fdir, "bears_fasta.fa")
bears <- read_alignment(aln_path)
tree <- read.tree(file.path(fdir, "starting_tree.tre"))

# Generate our list of dataframe subsets. For simplicity, we will look at one 
# set of characters

sample_df <- generate_sliding(bears, start_char = 1, stop_char = 1, steps = 1)

The result of this is a dataframe, shown below:

sample_df

This dataframe displays the start character (the first character that will be visualized) and stop character (the final character that will be visualized).
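
To make the idea of start and stop characters concrete, here is a minimal base R sketch of how windows over a character matrix could be tabulated. The function `sliding_windows` and its column names are hypothetical illustrations, not treesiftr's implementation, and generate_sliding may define its windows differently.

```r
# Hypothetical illustration: tabulate windows of `steps` characters each,
# covering characters start_char through stop_char.
sliding_windows <- function(start_char, stop_char, steps) {
  starts <- seq(start_char, stop_char, by = steps)
  # Each window ends steps - 1 characters after it starts, capped at stop_char.
  stops <- pmin(starts + steps - 1, stop_char)
  data.frame(start = starts, stop = stops)
}

# Two windows of five characters each over characters 1 to 10.
sliding_windows(start_char = 1, stop_char = 10, steps = 5)
```

Each row describes one window, so a dataframe with a single row corresponds to a single visualization.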

We can then build a tree from our data subset:

output_tree <- generate_tree_vis(sample_df = sample_df, alignment = aln_path,
                                 tree = tree, phy_mat = bears)
output_tree

phangorn [@Schliep2011; @Schliep2017] requires a starting tree to estimate a parsimony tree, so we supply the tree we read in earlier. The trees, which were generated with ggtree [@ggtree], a ggplot2 [@ggplot2] library for phylogenies, have been saved to a vector, which can be displayed in its entirety or subsetted to look at specific trees.
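
To illustrate why a starting tree is needed, the sketch below calls phangorn directly on a small character subset. It is not treesiftr's internal code. It assumes the Laurasiatherian example alignment shipped with phangorn rather than the bears data, and it uses a random starting tree where the text above reads one from file.

```r
library(ape)
library(phangorn)

# Example alignment shipped with phangorn (not the bears data used above).
data(Laurasiatherian)

# Keep only the first 50 characters, analogous to viewing one window.
chars <- subset(Laurasiatherian, select = 1:50, site.pattern = FALSE)

# A random unrooted tree serves as the starting point for the search.
set.seed(42)
start_tree <- rtree(length(chars), rooted = FALSE, tip.label = names(chars))
start_score <- parsimony(start_tree, chars)

# Rearrange the starting tree to reduce the number of parsimony steps.
best_tree <- optim.parsimony(start_tree, chars, trace = 0)
best_score <- parsimony(best_tree, chars)

# Compare the step counts before and after the search.
c(start = start_score, optimized = best_score)
```

The search only rearranges the tree it is given, which is why a starting topology has to be supplied.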

Random Trees

You can also map the characters to a random tree, and count the parsimony steps or score the characters under likelihood on this tree.

random_tree <- generate_tree_vis(sample_df = sample_df, alignment = aln_path,
                                 tree = tree, phy_mat = bears,
                                 random_tree = TRUE, pscore = TRUE)
random_tree
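
For a sense of how parsimony scores vary across random trees, the sketch below scores the same characters on twenty random topologies using ape and phangorn directly. It is an illustration under assumptions, not what generate_tree_vis does internally: it uses phangorn's bundled Laurasiatherian alignment instead of the bears data, and it shows parsimony only, not likelihood.

```r
library(ape)
library(phangorn)

# Example alignment shipped with phangorn; keep the first 50 characters.
data(Laurasiatherian)
chars <- subset(Laurasiatherian, select = 1:50, site.pattern = FALSE)

# Score the same characters on 20 random unrooted trees.
set.seed(1)
random_scores <- vapply(seq_len(20), function(i) {
  rand_tree <- rtree(length(chars), rooted = FALSE, tip.label = names(chars))
  parsimony(rand_tree, chars)
}, numeric(1))

# Spread of parsimony step counts across the random trees.
summary(random_scores)
```

Random trees usually need more steps to explain the characters than a tree estimated from them, which makes the comparison useful in class.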