knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "README-" )
This package is an R
interface to the global avian haemosporidian database MalAvi (http://mbio-serv2.mbioekol.lu.se/Malavi/). The package includes functions for downloading data from the MalAvi website directly into your R
environment and some basic utility functions for manipulating those data. Furthermore, you can use malaviR
to BLAST your haemosporidian DNA sequences against the MalAvi database programmatically, which should facilitate comparisons with MalAvi.
The package also includes a key for linking the host taxonomic classifications in MalAvi with the avian taxonomic classifications found on http://birdtree.org/.
You can install malaviR
from github with:
# install.packages("devtools") devtools::install_github("vincenzoaellis/malaviR", build_vignettes = TRUE)
Then you can load it in your R
session with:
library(malaviR)
The tutorial is accessible by calling the package vignette:
browseVignettes("malaviR")
This will open a web browser window. Click the "HTML" link next to the tutorial name (Getting_Started_with_malaviR
) to see the tutorial.
You can cite malaviR
as follows:
citation("malaviR")
There are nine tables that can be downloaded from the MalAvi website that summarize the database. Here's how you would download the table linking morphological species to genetic lineages:
morph <- extract_table("Morpho Species Summary") head(morph) # check it out
The extract_table()
help file lists all nine tables you can download directly into R
. Or you can specify "all"
and get them all as a list.
You can BLAST your own sequences against the MalAvi database using the blast_malavi()
function, which leverages the MalAvi website's already existing BLAST capabilities. Your input sequence just needs to be specified as a character string.
## define a sequence. This is the Plasmodium parasite ACAGR1 ACAGR1 <- "GCAACTGGTGCTTCATTTGTATTTATT TTAACTTATTTACATATTTTAAGAGGATTAAATTATTC ATATTCATATTTACCTTTATCATGGATATCTGGATTAA TAATATTTTTAATATCTATAGTAACAGCTTTTATGGGT TACGTATTACCTTGGGGTCAAATGAGTTTCTGGGGTGC TACCGTAATAACTAATTTATTATATTTTATACCTGGAC TAGTTTCATGGATATGTGGTGGATATCTTGTAAGTGAC CCAACCTTAAAAAGATTCTTTGTACTACATTTTACATT TCCTTTTATAGCTTTATGTATTGTATTTATACATATAT TCTTTCTACATTTACAAGGTAGCACAAATCCTTTAGGG TATGATACAGCTTTAAAAATACCCTTCTATCCAAATCT TTTAAGTCTTGATATTAAAGGATTTAATAATGTATTAG TATTATTTTTAGCACAAAGTTTATTTGGAATACT" ## BLAST it against the MalAvi database and save the top five hits to a data frame hits <- blast_malavi(ACAGR1) hits # check it out
Using the extract_alignment()
function, you can download all of the sequences on MalAvi, or the complete or nearly complete sequences only, or the sequences associated with morphological species only. These alignments will appear as objects of the class DNAbin
, defined in the ape
package.
## download all sequences all.seqs <- extract_alignment("all seqs") all.seqs #check it out
The extract_alignment()
help file lists the names of the sequence alignments that the function understands. You can also identify and remove repeated haplotypes (several typically exist due to ambiguous base pairs) in the alignments with the clean_alignment()
function.
The lineage names in the MalAvi sequence alignments have been modified to include extra information and therefore do not match the lineage names in the rest of the database. The clean_names()
function removes the extra information so that the sequence alignment lineage names match those in the rest of the database.
For example, the lineage names in the alignment files look like this:
## lineage names are stored in the rownames of a DNAbin object six.names <- head(rownames(all.seqs)) six.names
After cleaning those names, they look like this:
clean_names(six.names)
The package includes a taxonomic key linking (nearly all) host species names from MalAvi with the species names used in the phylogenetic analysis found on http://birdtree.org/. The latter species names are in a column labeled Jetz.species
and have an underscore between the genus and species as they would appear if you were to download trees from the website. The key is stored in a data object called taxonomy
which can be called directly once the package is loaded.
## check out taxonomic host key head(taxonomy)
The taxonomy
help file notes when the comparison was made (I will update it occasionally) and lists the multiple MalAvi host species that correspond to single host species in the phylogenetic analysis (there are several).
Phylogenetic analysis of the MalAvi lineages is difficult at the moment due to the limited sequence data available for most lineages. Often, the best one can do is to study sister lineages in the phylogeny that are linked by well supported nodes. The function sister_taxa()
gives a list of lineages on either side of a specific node in a phylogeny.
## simulate a phylogenetic tree with 10 taxa using the rtree() function in the ape package library(ape) tree <- rtree(n=10) ## the node labels of the tree can then be examined plot(tree) nodelabels() ## the root of the tree has node label "11" and we can extract sister lineages from the root; ## the sister taxa are grouped into two clades with arbitrary labels of "1" and "2" sis.tax.df <- sister_taxa(tree, 11) sis.tax.df # check it out
In general, this function can be used to identify lineages for further analysis or for visualization purposes. For analyses of MalAvi data in particular, you might want to identify all pairs of sister lineages in a phylogeny. This could be done with one line (after calling the dplyr
package):
## load dplyr package library(dplyr) ## identify all pairs of sister lineages in the phylogenetic tree sister_taxa(tree, 1:tree$Nnode + length(tree$tip.label)) %>% group_by(ancestral.node) %>% mutate(no.lins = length(unique(taxa))) %>% filter(no.lins == 2) %>% select(-no.lins) %>% as.data.frame
Please report any issues with the package here.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.