In wrightaprilm/treesiftr: Visualizing The Relationship Between Phylogeny and Data

Introduction

treesiftr is a Shiny application [@shiny] for visualizing the relationship between phylogenetic trees and phylogenetic data. Phylogenetic trees are crucial to the study of comparative biology, taxonomy, and evolution. However, understanding how to read a phylogenetic tree, and how a phylogenetic tree relates to underlying phylogenetic data, remains challenging.

In today's lab exercise, we will learn about a phylogenetic matrix, and then use the data in the matrix to visualize a phylogenetic tree.

Tip

Linked text goes to the glossary. If you see a term you don't recognize, remember you can refresh your memory at the bottom of this worksheet!

The fossil bear matrix

The data matrix included with treesiftr is a matrix of binary ("0" and "1") characters compiled to estimate a topology of living and extinct bear species [@abella12]. This matrix is fairly typical in size for a paleontological matrix, comprising 62 characters. It is, however, atypically complete, with only 18% missing data. In the following exercises, missing data will be represented by a thin black line. The "0" state will be represented in pale blue, and the "1" in brown.

treesiftr

Navigate to the treesiftr application.

treesiftr works by subsetting a phylogenetic matrix using the start and step arguments. These are found on the left-hand side of the screen. The start argument controls where in the matrix you would like to begin visualizing characters. For example, a start value of 1 would indicate to begin visualizing characters from the first character in the matrix. The step value indicates how many characters at once to visualize. A step value of three would indicate characters should be viewed in threes. For example, if start = 1 and step = 3, the visualization will show characters 1, 2, and 3. The start and end points of character visualizations are noted in the upper left-hand corner of the visualization.

There are also two checkboxes. Leave these unchecked until otherwise noted.

treesiftr works by subsetting a phylogenetic matrix by user input. Then, a parsimony tree is constructed in Phangorn [@Schliep2011, Schliep2017] from the user-defined subset. The tree is scored under both parsimony and Lewis' Mk model [@Lewis2001] for discrete character data. The data and tree are then visualized using ggtree [@ggtree], based upon the ggplot2 package [@ggplot2]. This application makes use of Shiny to provide a graphical interface, but there is a second included tutorial for more experienced users of the R statistical language.

Questions

Visualize characters 1 and 2. What is the parsimony score for this character set? Click "Do you want to print the parsimony score?" in the interface to check your answer. Uncheck the parsimony score option when you have completed this question.
Visualize characters 1 and 2. This time, click the "View random set of trees rather than an estimated tree" button in the interface. Are the parsimony scores larger or smaller than those of the character on an estimated tree? Does this make sense to you? Why? Remember to uncheck this box before continuing.
Visualize character set 2-3 and 3-4. What monophyletic group from the tree for characters 2-3 is no longer on the tree for characters 3-4?
View a tree of characters 8 - 10. Which character, 8, 9 or 10, represents a reversal?
What information would we need to decide if the "1" state possesed by Zaragocyon_daamsi in character 52 is an autapomorphy?
Click the switch that says "Do you want to print the likelihood score under the Mk model?" and the switch that says "Do you want to print the parsimony score?". Does the character set 8, 9, and 10 have the same parsimony score as the character set 9, 10, and 11? How about the same likelihood score?
Compare characters 46-49 and 47-50. Why does set 47-50 have a better likelihood than 46-49?
What is the relationship between the likelihood score and increasing the number of characters visualized?
What is the minimum number adding a parsimony-informative character can add to the parsimony score?
These trees are fully resolved. Based on your exploration of the data, does this degree of resolution make sense? Take a look at characters 1-3, for example. Does displaying a fully resolved tree make sense for these characters?

Glossary

Ancestral State: A character state possessed by the ancestor of a group

Autapomorphy: A character state that is unique to a specific taxon.

Derived State: A character state that is different from the ancestral state.

Likelihood Score: The likelihood of the observed data under a specific model.

Maximum likelihood: A phylogenetic optimatlity criterion under which phylogenetic data are modeled according to sets of assumptions. Under this criterion, the tree that has the best ("maximum") likelihood score under the assumed model is to be preferred.

Maximum parsimony: A phylogenetic optimality criterion. This criterion holds that the tree implying the fewest changes in the characters used to generate it should be preferred.

Monophyletic: A group on a phylogeny of an ancestor and all of its descendents.

Parsimony Score: The number of changes implied by a character on a tree.

Resolved: A node that is bifucating, leaving two descendent lineages.

Reversal: A change from the derived state back to the ancestral state.