mctoolsr is an R package developed to facilitate microbial community analyses. The current functions are meant to handle an input taxa (OTU) table in either biom or tab-delimited format and help streamline common (and more specialized) downstream analyses. It is under active development, so please submit bug reports and feature requests as indicated below.
This document serves as a brief introduction to using mctoolsr. This document will go through getting the package working and a few examples using the most popular functions.
mctoolsr is available on Github at: https://github.com/leffj/mctoolsr
To use:
install.packages("devtools") devtools::install_github("leffj/mctoolsr")
To use in a script, load and attach the package using:
library(mctoolsr)
devtools::load_all(".")
mctoolsr is under active devlopment. Update regularly to use the latest features and for bug fixes. You will have to load and attach as above after updating.
devtools::install_github("leffj/mctoolsr")
https://github.com/leffj/mctoolsr/issues
Note that the following examples use an example dataset taken from a study examining the bacterial communities associated with fruits and vegetables (Leff et al. 2013). You can find this dataset in the mctoolsr/examples directory.
You can load a taxa (i.e. OTU) table in biom format using the following approach. Note that you don't need to use system.file()
when using your own filepaths -- just use the filepath directly. This function just helps your system find the example files.
One of the nice things about loading your data this way is that all the sample IDs will be matched between your taxon table and metadata so that they will be in the same order and any sample IDs not present in one or the other will be dropped.
You can optionally filter out samples of a specific type during this step, but this can also be done separately as shown here.
tax_table_fp = system.file('extdata', 'fruits_veggies_taxa_table_wTax.biom', package = 'mctoolsr') map_fp = system.file('extdata', 'fruits_veggies_metadata.txt', package = 'mctoolsr') input = load_taxa_table(tax_table_fp, map_fp)
The loaded data will consist of three parts:
Any of these components can be quickly accessed using the '$' sign notation as shown in the next example.
This can be achieved simply by calculating column sums on the taxon table:
sort(colSums(input$data_loaded))
As you can see from the previous example, we can rarefy (i.e. normalize for variable sequence depths) to 1000 sequences per sample without losing any samples. This can be done using the following command:
input_rar = single_rarefy(input, 1000) colSums(input_rar$data_loaded)
It is useful to get a feel for the taxonomic composition of your samples early on in the exploratory data analysis process. This can quickly be done by calculating taxonomic summaries at higher taxonomic levels - in this case at the phylum level. The values represent the sum of all the relative abundances for OTUs classified as belonging to the indicated phylum. In this example just the first few phyla and samples are shown.
tax_sum_phyla = summarize_taxonomy(input_rar, level = 2, report_higher_tax = FALSE) tax_sum_phyla[1:5, 1:8]
You can also quickly show differences in the relative abundances of taxanomic groups using a heatmap as shown below. We can easily see that mushrooms have a much lower relative abundance of Enterobacteriaceae than the other sample types.
tax_sum_families = summarize_taxonomy(input_rar, level = 5, report_higher_tax = FALSE) plot_ts_heatmap(tax_sum_families, input_rar$map_loaded, 0.01, 'Sample_type', custom_sample_order = c('Lettuce', 'Spinach', 'Strawberries', 'Mushrooms'))
For dissimilarity-based analyses such as ordinations and PERMANOVA, it is necessary to calculate a dissimilarity matrix. There is currently support for Bray-Curtis dissimilarities based on square-root transformed data. This is a widely used dissimilarity metric for these analyses, but others will be added as requested.
dm = calc_dm(input_rar$data_loaded)
There are two ways to plot ordinations in mctoolsr. The multistep way is shown here, but there is also a shortcut using the plot_nmds()
function.
ord = calc_ordination(dm, 'nmds') plot_ordination(input_rar, ord, 'Sample_type', 'Farm_type', hulls = TRUE)
It is easy to filter samples from your dataset in mctoolsr. You can specify to remove samples meeting a specified condition in the metadata or keep those samples. In the example below, lettuce samples are removed, and the ordination is plotted again.
input_rar_filt = filter_data(input_rar, 'Sample_type', filter_vals = 'Lettuce') dm = calc_dm(input_rar_filt$data_loaded) ord = calc_ordination(dm, 'nmds') plot_ordination(input_rar_filt, ord, 'Sample_type', 'Farm_type', hulls = TRUE)
There are multiple taxa filtering options in mctoolsr. This example shows how to explore the proteobacteria sequences across the samples. Taxa can also be filtered based on their relative abundance.
input_proteobact = filter_taxa_from_input(input, taxa_to_keep = 'p__Proteobacteria') sort(colSums(input_proteobact$data_loaded)) input_proteobact_rar = single_rarefy(input_proteobact, 219) plot_nmds(calc_dm(input_proteobact_rar$data_loaded), metadata_map = input_proteobact_rar$map_loaded, color_cat = 'Sample_type')
It is often useful to determine the taxa driving differences between the community compositions of different sample types. This example shows one way to do this to determine taxa driving differences between sample types.
tax_sum_families = summarize_taxonomy(input_rar_filt, level = 5, report_higher_tax = FALSE) taxa_summary_by_sample_type(tax_sum_families, input_rar_filt$map_loaded, type_header = 'Sample_type', filter_level = 0.05, test_type = 'KW')
This analysis demonstrates that Pseudomonadaceae and Sphingobacteriaceae tend to have higher relative abundances on mushrooms than spinach and strawberries. The p values are based on Kruskal-Wallis tests and two different corrections are reported to deal with the multiple comparisons (Bonferroni and FDR). Rare taxa are filtered out using the filter_level
peramter. The values indicated under the sample types are mean relative abundances.
Sometimes it is necessary to calculate mean dissimilarities. This is important in cases where sample types are pseudoreplecated. This is not the case here, but this example demonstrates this functionality.
dm = calc_dm(input_rar_filt$data_loaded) dm_aggregated = calc_mean_dissimilarities(dm, input_rar_filt$map_loaded, 'Sample_Farming', return_map = TRUE) ord = calc_ordination(dm_aggregated$dm, ord_type = 'nmds') plot_ordination(dm_aggregated, ord, color_cat = 'Sample_type')
OTU tables that have been edited in *mctoolsr can be exported in text (tab delimited) format for later use. Use this function:
export_otu_table(input_rar_filt, "export/path/otu_table_export.txt")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.