mctoolsr is an R package developed to facilitate microbial community analyses. The current functions are meant to handle an input taxa (OTU) table in either biom or tab-delimited format and help streamline common (and more specialized) downstream analyses. It is under active development, so please submit bug reports and feature requests as indicated below.

This document serves as a brief introduction to using mctoolsr. This document will go through getting the package working and a few examples using the most popular functions.

Getting and using mctoolsr

mctoolsr is available on Github at: https://github.com/leffj/mctoolsr

To use:

install.packages("devtools")
devtools::install_github("leffj/mctoolsr")

To use in a script, load and attach the package using:

library(mctoolsr)
devtools::load_all(".")

Updating mctoolsr

mctoolsr is under active devlopment. Update regularly to use the latest features and for bug fixes. You will have to load and attach as above after updating.

devtools::install_github("leffj/mctoolsr")

To report bugs or request features, go to:

https://github.com/leffj/mctoolsr/issues

Examples

Note that the following examples use an example dataset taken from a study examining the bacterial communities associated with fruits and vegetables (Leff et al. 2013). You can find this dataset in the mctoolsr/examples directory.

Loading OTU tables and metadata

You can load a taxa (i.e. OTU) table in biom format using the following approach. Note that you don't need to use system.file() when using your own filepaths -- just use the filepath directly. This function just helps your system find the example files.

One of the nice things about loading your data this way is that all the sample IDs will be matched between your taxon table and metadata so that they will be in the same order and any sample IDs not present in one or the other will be dropped.

You can optionally filter out samples of a specific type during this step, but this can also be done separately as shown here.

tax_table_fp = system.file('extdata', 'fruits_veggies_taxa_table_wTax.biom', 
                     package = 'mctoolsr')
map_fp = system.file('extdata', 'fruits_veggies_metadata.txt', 
                     package = 'mctoolsr')
input = load_taxa_table(tax_table_fp, map_fp)

The loaded data will consist of three parts:

  1. The taxon table itself: "data_loaded"
  2. The metadata: "map_loaded"
  3. The taxonomic classifiers (if provided in the biom file): "taxonomy_loaded"

Any of these components can be quickly accessed using the '$' sign notation as shown in the next example.

Returning numbers of sequences per sample

This can be achieved simply by calculating column sums on the taxon table:

sort(colSums(input$data_loaded))

Rarefying

As you can see from the previous example, we can rarefy (i.e. normalize for variable sequence depths) to 1000 sequences per sample without losing any samples. This can be done using the following command:

input_rar = single_rarefy(input, 1000)
colSums(input_rar$data_loaded)

Summarize taxonomic relative abundances at a higher taxonomic level

It is useful to get a feel for the taxonomic composition of your samples early on in the exploratory data analysis process. This can quickly be done by calculating taxonomic summaries at higher taxonomic levels - in this case at the phylum level. The values represent the sum of all the relative abundances for OTUs classified as belonging to the indicated phylum. In this example just the first few phyla and samples are shown.

tax_sum_phyla = summarize_taxonomy(input_rar, level = 2, report_higher_tax = FALSE)
tax_sum_phyla[1:5, 1:8]

You can also quickly show differences in the relative abundances of taxanomic groups using a heatmap as shown below. We can easily see that mushrooms have a much lower relative abundance of Enterobacteriaceae than the other sample types.

tax_sum_families = summarize_taxonomy(input_rar, level = 5, report_higher_tax = FALSE)
plot_ts_heatmap(tax_sum_families, input_rar$map_loaded, 0.01, 'Sample_type', custom_sample_order = c('Lettuce', 'Spinach', 'Strawberries', 'Mushrooms'))

Calculating a dissimilarity matrix

For dissimilarity-based analyses such as ordinations and PERMANOVA, it is necessary to calculate a dissimilarity matrix. There is currently support for Bray-Curtis dissimilarities based on square-root transformed data. This is a widely used dissimilarity metric for these analyses, but others will be added as requested.

dm = calc_dm(input_rar$data_loaded)

Plotting an ordination

There are two ways to plot ordinations in mctoolsr. The multistep way is shown here, but there is also a shortcut using the plot_nmds() function.

ord = calc_ordination(dm, 'nmds')
plot_ordination(input_rar, ord, 'Sample_type', 'Farm_type', hulls = TRUE)

Filtering samples

It is easy to filter samples from your dataset in mctoolsr. You can specify to remove samples meeting a specified condition in the metadata or keep those samples. In the example below, lettuce samples are removed, and the ordination is plotted again.

input_rar_filt = filter_data(input_rar, 'Sample_type', filter_vals = 'Lettuce')
dm = calc_dm(input_rar_filt$data_loaded)
ord = calc_ordination(dm, 'nmds')
plot_ordination(input_rar_filt, ord, 'Sample_type', 'Farm_type', hulls = TRUE)

Filtering taxa

There are multiple taxa filtering options in mctoolsr. This example shows how to explore the proteobacteria sequences across the samples. Taxa can also be filtered based on their relative abundance.

input_proteobact = filter_taxa_from_input(input, taxa_to_keep = 'p__Proteobacteria')
sort(colSums(input_proteobact$data_loaded))
input_proteobact_rar = single_rarefy(input_proteobact, 219)
plot_nmds(calc_dm(input_proteobact_rar$data_loaded), metadata_map = input_proteobact_rar$map_loaded, 
          color_cat = 'Sample_type')

Taxa based exploration

It is often useful to determine the taxa driving differences between the community compositions of different sample types. This example shows one way to do this to determine taxa driving differences between sample types.

tax_sum_families = summarize_taxonomy(input_rar_filt, level = 5, report_higher_tax = FALSE)
taxa_summary_by_sample_type(tax_sum_families, input_rar_filt$map_loaded, 
                            type_header = 'Sample_type', filter_level = 0.05, test_type = 'KW')

This analysis demonstrates that Pseudomonadaceae and Sphingobacteriaceae tend to have higher relative abundances on mushrooms than spinach and strawberries. The p values are based on Kruskal-Wallis tests and two different corrections are reported to deal with the multiple comparisons (Bonferroni and FDR). Rare taxa are filtered out using the filter_level peramter. The values indicated under the sample types are mean relative abundances.

Calculating mean dissimilarities

Sometimes it is necessary to calculate mean dissimilarities. This is important in cases where sample types are pseudoreplecated. This is not the case here, but this example demonstrates this functionality.

dm = calc_dm(input_rar_filt$data_loaded)
dm_aggregated = calc_mean_dissimilarities(dm, input_rar_filt$map_loaded, 
                                          'Sample_Farming', return_map = TRUE)
ord = calc_ordination(dm_aggregated$dm, ord_type = 'nmds')
plot_ordination(dm_aggregated, ord, color_cat = 'Sample_type')

Exporting an OTU table

OTU tables that have been edited in *mctoolsr can be exported in text (tab delimited) format for later use. Use this function:

export_otu_table(input_rar_filt, "export/path/otu_table_export.txt")


leffj/mctoolsr documentation built on Aug. 5, 2022, 1:27 a.m.