tidy_micro: A function to merge multiple OTU tables and clinical data...

Description

A function that takes any number of OTU tables (or other sequencing data tables), calculates taxa prevalence, relative abundance, and a CLR transformation, and finally merges in the clinical data.

Usage

tidy_micro(
  otu_tabs,
  clinical,
  tab_names,
  prev_cutoff = 0,
  ra_cutoff = 0,
  exclude_taxa = NULL,
  library_name = "Lib",
  complete_clinical = TRUE,
  filter_summary = TRUE,
  count_summary = TRUE
)

Arguments

otu_tabs

A single table or a list of tables of metagenomic sequencing data. Each table should have OTU names in the first column and OTU counts in the following columns. Column names should be sequencing library names.

clinical

Sequencing-level clinical data. Must have a column of unique library names (sequencing IDs).

tab_names

Names for otu_tabs. These will become the "Table" column. Alternatively, you can simply name the OTU tables in the list supplied to otu_tabs.

prev_cutoff

A prevalence cutoff: a taxon must be present in at least *X* percent of libraries, or it will be included in the "Other" category.

ra_cutoff

A relative abundance (RA) cutoff: at least one library must have an RA above the cutoff, or the taxon will be included in the "Other" category.

exclude_taxa

A character vector specifying any taxa that you would like to include in the "Other" category. The taxa specified will be included in "Other" for every OTU table provided.

library_name

The name of the column in clinical that contains sequencing library names. These should match the column names of the supplied OTU tables (after the first column).

complete_clinical

Logical; only include columns from OTU tables whose library name is in the clinical data.

filter_summary

Logical; print out summaries of filtering steps. Ignored if prev_cutoff, ra_cutoff, and exclude_taxa are all left at their default values.

count_summary

Logical; print out a summary of unique library names and sequencing depth.
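
As an illustration of how prev_cutoff and ra_cutoff move taxa into "Other", here is a minimal base R sketch on a made-up count matrix. It is not the package's code: the object names are hypothetical, RA is assumed to be on a percent scale, and the comparisons (at least the prevalence cutoff, strictly above the RA cutoff) are one reading of the argument descriptions.

```r
# Hypothetical counts: rows are taxa, columns are sequencing libraries
counts <- matrix(c(50, 40, 60, 30,
                    0,  0,  1,  0,
                   50, 60, 39, 70),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("TaxonA", "TaxonB", "TaxonC"),
                                 paste0("Lib", 1:4)))

prev_cutoff <- 50  # percent of libraries that must contain the taxon
ra_cutoff <- 5     # relative abundance cutoff (assumed to be in percent)

# Percent of libraries in which each taxon has a nonzero count
prevalence <- rowMeans(counts > 0) * 100

# Relative abundance (percent) within each library, then the per-taxon maximum
ra <- sweep(counts, 2, colSums(counts), "/") * 100
max_ra <- apply(ra, 1, max)

# Taxa failing either cutoff are pooled into a single "Other" row
to_other <- prevalence < prev_cutoff | max_ra <= ra_cutoff
kept <- counts[!to_other, , drop = FALSE]
other <- colSums(counts[to_other, , drop = FALSE])
filtered <- rbind(kept, Other = other)
```

Pooling into "Other" rather than dropping rows keeps each library's total count unchanged, so sequencing depth is the same before and after filtering.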

Details

Column names of the OTU tables must be the same for each table, and these should be the library names in your clinical data. Please see the vignette for a detailed description.

The CLR transformation adds (1 / sequencing depth) to each OTU count for each library before centering and log transforming in order to avoid issues with 0 counts.
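
The CLR step described above can be sketched in base R as follows. This is an illustrative reading of the description, not the package's implementation: the function name clr_sketch and the example counts are hypothetical, and the natural log is assumed.

```r
# Hypothetical counts: rows are taxa, columns are sequencing libraries
counts <- matrix(c(10,  0,  5,
                    0, 20,  5,
                   30, 10,  0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("TaxonA", "TaxonB", "TaxonC"),
                                 c("Lib1", "Lib2", "Lib3")))

clr_sketch <- function(counts) {
  depth <- colSums(counts)                      # sequencing depth per library
  shifted <- sweep(counts, 2, 1 / depth, "+")   # add 1 / depth to every count
  log_shifted <- log(shifted)                   # zeros are now finite after the shift
  # Center each library by subtracting its mean log value
  sweep(log_shifted, 2, colMeans(log_shifted), "-")
}

clr <- clr_sketch(counts)
```

Because each library is centered on its own mean log value, the CLR values within a library sum to zero, and the small offset keeps zero counts from producing -Inf.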

The list of OTU tables is split, manipulated, and stacked into a data frame using the ldply function from the plyr package. The names of the OTU tables supplied will be the name of their "Table" in the final tidy_micro set.
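
To show what stacking a named list of tables looks like, here is a small sketch that uses base R (do.call, rbind, and Map) as a stand-in for plyr's ldply. The two tiny tables and their taxa are made up, and the function itself performs further manipulation that is not shown here.

```r
# Two hypothetical OTU tables sharing the same library columns
phylum <- data.frame(OTU_Name = c("Firmicutes", "Proteobacteria"),
                     Lib1 = c(10, 5), Lib2 = c(3, 7))
class_tab <- data.frame(OTU_Name = c("Bacilli", "Clostridia"),
                        Lib1 = c(6, 4), Lib2 = c(2, 1))
tabs <- list(Phylum = phylum, Class = class_tab)

# Tag each table with its list name, then stack the tables row-wise
stacked <- do.call(rbind, Map(function(tab, nm) cbind(Table = nm, tab),
                              tabs, names(tabs)))
rownames(stacked) <- NULL
```

Stacking only works because both tables have identical column names, which is why the library columns must match across all OTU tables.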

Value

A data.frame in the tidy_micro format

Author(s)

Charlie Carpenter

Examples

data(bpd_phy); data(bpd_cla); data(bpd_ord); data(bpd_fam); data(bpd_clin)

## Multiple OTU tables with named list
otu_tabs <- list(Phylum = bpd_phy, Class = bpd_cla,
                 Order = bpd_ord, Family = bpd_fam)
set <- tidy_micro(otu_tabs = otu_tabs, clinical = bpd_clin)

## Multiple OTU tables with unnamed list
unnamed_tabs <- list(bpd_phy, bpd_cla, bpd_ord, bpd_fam)
set <- tidy_micro(otu_tabs = unnamed_tabs,
                  tab_names = c("Phylum", "Class", "Order", "Family"),
                  clinical = bpd_clin)

## Single OTU table
set <- tidy_micro(otu_tabs = bpd_cla, clinical = bpd_clin, tab_names = "Class")

## Filtering out low-abundance or uninteresting taxa right away
## WARNING: Only do this if you do not want to calculate alpha diversities with this tidy_micro set

filter_set <- tidy_micro(otu_tabs = otu_tabs, clinical = bpd_clin,
              prev_cutoff = 5, ## At least 5% of libraries must have this taxon, or it goes to "Other"
              ra_cutoff = 1, ## At least 1 library must have an RA above 1, or it goes to "Other"
              exclude_taxa = c("Unclassified", "Bacteria") ## Unclassified taxa we don't want
              )

tidyMicro documentation built on Jan. 13, 2021, 6:18 a.m.