phyloseq
and devtools
packages into R
if you don't already have them: if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("multtest")
BiocManager::install("phyloseq")
if (!requireNamespace("devtools", quietly = TRUE))
install.packages('devtools')
library(devtools)
devtools::install_github("gauravsk/ranacapa")
R
:library(ranacapa)
ranacapa::runRanacapaApp()
Download the ranacapa_automated.R
file from the Anacapa
repo: https://github.com/limey-bean/Anacapa/blob/master/Anacapa_db/scripts/downstream-analyses/ranacapa_automated.R
In the terminal the script runs as:
Rscript ranacapa_automated.R /path/to/input_biom.txt /path/to/input_metadata.txt /path/to/desired_output_directory rarefaction_depth rarefaction_replicates
Last updated 22 April 2018
site_1 site_2 site_3 site_4 site_5 site_6 site_7 site_8 site_9 sum.taxonomy
1 0 0 0 0 0 0 0 0 Annelida;Clitellata;Haplotaxida;Megascolecidae;Amynthas;Amynthas sze…
0 0 0 0 0 0 0 0 0 Nemertea;Palaeonemertea;NA;Cephalothricidae;Cephalothrix;Cephalothri…
0 0 0 0 0 0 0 0 0 ""
0 0 0 0 0 0 0 0 0 Nematoda;Chromadorea;Monhysterida;Monhysteridae;NA;Monhysteridae sp.…
0 1 0 0 0 0 0 0 0 NA;Oomycetes;Pythiales;Pythiaceae;Pythium;Pythium rostratifingens
In some cases, the output file also contains a column called [xxx]_seq_number
that contains a unique sequence identifier for each row. Here's what the file would look like if the sequence number was written by Anacapa:
CO1_seq_number site_1 site_2 site_3 site_4 site_5 site_6 site_7 site_8 site_9 sum.taxonomy
forward_CO1_9136 1 0 0 0 0 0 0 0 0 Annelida;Clitellata;Haplotaxida;Megascolecidae;Amy…
forward_CO1_13513 0 0 0 0 0 0 0 0 0 Nemertea;Palaeonemertea;NA;Cephalothricidae;Cephal…
forward_CO1_14071 0 0 0 0 0 0 0 0 0 ""
forward_CO1_13891 0 0 0 0 0 0 0 0 0 Nematoda;Chromadorea;Monhysterida;Monhysteridae;NA…
forward_CO1_3833 0 1 0 0 0 0 0 0 0 NA;Oomycetes;Pythiales;Pythiaceae;Pythium;Pythium …
Anacapa outputs this information because the user can use this sequence ID to go back to a FASTA file and find the sequence; it is of no use right now to the ranacapa
package. There are functions that exist within ranacapa
that scrub out columns with seq_number
in their name.
Eventually, ranacapa
will also be able to accept qiime-formatted biom tables.
NOTE: As of now there is inconsistency in the terminology for this particular file type. I some times call it the "anacapa out" file and frequently use the variable name ana_out
in functions to deal with it; sometimes, I might refer to it as the "biom table" or "biom file". One goal is to make the terminology more consistent.
ranacapa
cares about is called the "mapping file", which contains the metadata for all of the samples. For example, we might include the season in which the sample was collected, whether it was collected from under an Oak or in the grasslands, etc., in this file. The mapping file has (at least) as many rows of information as there are samples in the study, and one column per "metadata" variable. The first column of the mapping file must match the names of the samples (i.e. all non-taxonomy column names of the biom file. A valid mapping file looks like:sample_name sample_id season texture depth main_plant
site_1 186B1 s sand 10. other
site_2 16A2 w sand 20. other
site_3 201A1 s clay 10. oak
site_4 201C2 s sand 30. sage
site_5 181C1 s sand 30. oak
site_6 PPM6mo p clay 30. other
site_7 181B2 s clay 10. sage
site_8 182C1 s clay 20. other
site_9 15C2 w silt 30. other
Apart from the shiny app, which does all the visualizations, most of ranacapa
functions to convert between files that the users upload (i.e. the Anacapa output file and the associated Metadata file) and the phyloseq
objects that ultimately get used for a lot of the analyses. This converting between formats happens in just a few core functions:
group_anacapa_by_taxonomy()
: sometimes, in anacapa output, the same taxonomic path can appear in multiple rows- for example, if two different sequences had the same taxon identification. This function groups such rows together, and adds up the values in the sample columns. It's important because physeq objects can only have one row per identified taxonomy.
convert_anacapa_to_phyloseq()
This function takes the output file and the mapping file as its inputs, and spits out a phyloseq
class object that has both the taxonomic and the metadata information in the same object. To do this, the function does the folllowing:
A. Convert the table into a phyloseq otu_table
object
otu_table
created in step 1 (the rownames contain the full taxonomic path), and splits it up on each semicolon (;
) to make a new matrix with 6 columns, one for each of the taxonomic ranks. The matrix is then converted into a tax_table
object. otu_table
and tax_table
from steps A and B are united into the phyloseq style object. sample_data
E. The phyloseq object in step C is merged with the sample_data from step D with the phyloseq function merge_phyloseq
.
vegan_otu()
This function just takes a phyloseq style object, e.g. the one made by convert_anacapa_to_phyloseq
, and extracts the otu table in the format of a community matrix that can be read by vegan
functions.
A series of validator functions for the input files, currently only in the development
branch.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.