knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)

malaviR

This package is an R interface to the global avian haemosporidian database MalAvi (http://mbio-serv2.mbioekol.lu.se/Malavi/). The package includes functions for downloading data from the MalAvi website directly into your R environment and some basic utility functions for manipulating those data. Furthermore, you can use malaviR to BLAST your haemosporidian DNA sequences against the MalAvi database programmatically, which should facilitate comparisons with MalAvi.

The package also includes a key for linking the host taxonomic classifications in MalAvi with the avian taxonomic classifications found on http://birdtree.org/.

Installation

You can install malaviR from github with:

# install.packages("devtools")
devtools::install_github("vincenzoaellis/malaviR", build_vignettes = TRUE)

Then you can load it in your R session with:

library(malaviR)

The tutorial is accessible by calling the package vignette:

browseVignettes("malaviR")

This will open a web browser window. Click the "HTML" link next to the tutorial name (Getting_Started_with_malaviR) to see the tutorial.

Citation

You can cite malaviR as follows:

citation("malaviR")

Download tables from MalAvi

There are nine tables that can be downloaded from the MalAvi website that summarize the database. Here's how you would download the table linking morphological species to genetic lineages:

morph <- extract_table("Morpho Species Summary")
head(morph) # check it out

The extract_table() help file lists all nine tables you can download directly into R. Or you can specify "all" and get them all as a list.

BLAST a sequence to MalAvi

You can BLAST your own sequences against the MalAvi database using the blast_malavi() function, which leverages the MalAvi website's already existing BLAST capabilities. Your input sequence just needs to be specified as a character string.

## define a sequence. This is the Plasmodium parasite ACAGR1
ACAGR1 <- "GCAACTGGTGCTTCATTTGTATTTATT
TTAACTTATTTACATATTTTAAGAGGATTAAATTATTC
ATATTCATATTTACCTTTATCATGGATATCTGGATTAA
TAATATTTTTAATATCTATAGTAACAGCTTTTATGGGT
TACGTATTACCTTGGGGTCAAATGAGTTTCTGGGGTGC
TACCGTAATAACTAATTTATTATATTTTATACCTGGAC
TAGTTTCATGGATATGTGGTGGATATCTTGTAAGTGAC
CCAACCTTAAAAAGATTCTTTGTACTACATTTTACATT
TCCTTTTATAGCTTTATGTATTGTATTTATACATATAT
TCTTTCTACATTTACAAGGTAGCACAAATCCTTTAGGG
TATGATACAGCTTTAAAAATACCCTTCTATCCAAATCT
TTTAAGTCTTGATATTAAAGGATTTAATAATGTATTAG
TATTATTTTTAGCACAAAGTTTATTTGGAATACT"

## BLAST it against the MalAvi database and save the top five hits to a data frame
hits <- blast_malavi(ACAGR1)
hits # check it out

Download the MalAvi sequence alignments

Using the extract_alignment() function, you can download all of the sequences on MalAvi, or the complete or nearly complete sequences only, or the sequences associated with morphological species only. These alignments will appear as objects of the class DNAbin, defined in the ape package.

## download all sequences
all.seqs <- extract_alignment("all seqs")
all.seqs #check it out

The extract_alignment() help file lists the names of the sequence alignments that the function understands. You can also identify and remove repeated haplotypes (several typically exist due to ambiguous base pairs) in the alignments with the clean_alignment() function.

Clean lineage names from MalAvi sequence alignments

The lineage names in the MalAvi sequence alignments have been modified to include extra information and therefore do not match the lineage names in the rest of the database. The clean_names() function removes the extra information so that the sequence alignment lineage names match those in the rest of the database.

For example, the lineage names in the alignment files look like this:

## lineage names are stored in the rownames of a DNAbin object
six.names <- head(rownames(all.seqs))
six.names

After cleaning those names, they look like this:

clean_names(six.names)

Taxonomic key for host species

The package includes a taxonomic key linking (nearly all) host species names from MalAvi with the species names used in the phylogenetic analysis found on http://birdtree.org/. The latter species names are in a column labeled Jetz.species and have an underscore between the genus and species as they would appear if you were to download trees from the website. The key is stored in a data object called taxonomy which can be called directly once the package is loaded.

## check out taxonomic host key
head(taxonomy)

The taxonomy help file notes when the comparison was made (I will update it occasionally) and lists the multiple MalAvi host species that correspond to single host species in the phylogenetic analysis (there are several).

Identifying sister lineages from a given node

Phylogenetic analysis of the MalAvi lineages is difficult at the moment due to the limited sequence data available for most lineages. Often, the best one can do is to study sister lineages in the phylogeny that are linked by well supported nodes. The function sister_taxa() gives a list of lineages on either side of a specific node in a phylogeny.

## simulate a phylogenetic tree with 10 taxa using the rtree() function in the ape package
library(ape)
tree <- rtree(n=10)

## the node labels of the tree can then be examined
plot(tree)
nodelabels()

## the root of the tree has node label "11" and we can extract sister lineages from the root;
## the sister taxa are grouped into two clades with arbitrary labels of "1" and "2"
sis.tax.df <- sister_taxa(tree, 11)
sis.tax.df # check it out

In general, this function can be used to identify lineages for further analysis or for visualization purposes. For analyses of MalAvi data in particular, you might want to identify all pairs of sister lineages in a phylogeny. This could be done with one line (after calling the dplyr package):

## load dplyr package
library(dplyr)

## identify all pairs of sister lineages in the phylogenetic tree
sister_taxa(tree, 1:tree$Nnode + length(tree$tip.label)) %>% 
  group_by(ancestral.node) %>% 
  mutate(no.lins = length(unique(taxa))) %>% 
  filter(no.lins == 2) %>% 
  select(-no.lins) %>% 
  as.data.frame

Reporting Issues

Please report any issues with the package here.



vincenzoaellis/malaviR documentation built on Oct. 10, 2019, 10:55 p.m.