refine_data: Refine raw OTU table

View source: R/data_refinement.R

refine_data    R Documentation

Refine raw OTU table

Description

Removes the metadata columns from an OTU table and filters away the least abundant OTUs, as defined by the abundance_cutoff and cutoff_type parameters.

Usage

refine_data(
  OTU_table,
  abundance_cutoff = 0,
  cutoff_type = "mean",
  raw_value_cutoff = TRUE,
  renormalize = FALSE,
  metadataCols = c("OTU Id", "taxonomy")
)

Arguments

OTU_table

The raw OTU table, either as a data.frame, a matrix or a phyloseq object

abundance_cutoff

Numeric, the threshold cutoff value. If it is NULL, the filtering process is skipped.

cutoff_type

The type of measure to base the cutoff on. Can be any of 'mean', 'median' or 'max', which filter away OTUs based on mean, median and maximum abundance, respectively.

raw_value_cutoff

Logical, should filtering be based on the raw abundances? If not, the sample-wise relative abundances are used for filtering. Note that this parameter does not determine whether the results of the function are relative abundances.

renormalize

Logical, should the abundances be renormalized (sample-wise) after the procedure?

metadataCols

The names (character vector) or positions (integer vector) of the metadata columns to remove from the table
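
As a rough illustration of how cutoff_type, raw_value_cutoff and renormalize interact, the following base R sketch filters a toy table by a row-wise summary statistic. It does not call micInt; the helper keep_otus and the toy matrix are hypothetical, and keeping OTUs with a statistic greater than or equal to the cutoff is an assumption about the boundary case.

```r
# Toy OTU table: OTUs in rows, samples in columns (raw counts).
otu <- matrix(
  c(90, 80, 70,
     9, 16, 30,
     1,  4,  0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("OTU1", "OTU2", "OTU3"), c("S1", "S2", "S3"))
)

# Sample-wise relative abundances: every column sums to 1.
rel <- sweep(otu, 2, colSums(otu), "/")

# Hypothetical helper (not part of micInt): names of the OTUs whose
# row-wise summary statistic is at least the cutoff.
keep_otus <- function(tab, cutoff, stat = c("mean", "median", "max")) {
  stat <- match.arg(stat)
  score <- apply(tab, 1, match.fun(stat))
  names(score)[score >= cutoff]
}

kept_raw  <- keep_otus(otu, cutoff = 5,    stat = "max")   # raw abundances
kept_max  <- keep_otus(rel, cutoff = 0.03, stat = "max")   # relative, by maximum
kept_mean <- keep_otus(rel, cutoff = 0.03, stat = "mean")  # relative, by mean

# Renormalize the surviving OTUs so that each sample sums to 1 again.
filtered <- rel[kept_mean, , drop = FALSE]
renorm   <- sweep(filtered, 2, colSums(filtered), "/")
```

In this toy table OTU3 survives the 'max' filter on relative abundances (it reaches 0.04 in S2) but not the 'mean' filter, which shows why the choice of cutoff_type matters for OTUs that are abundant in only a few samples.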

Details

Criteria for OTU tables

In order for an OTU table to be valid, the following criteria must hold:

  • The samples (data points) are in columns and the abundances of each OTU are in rows.

  • The rows may only hold OTU abundances

  • There may be as many metadata columns as desired. However, they all need to be declared in the metadataCols argument, and the column taxonomy has to be present in order for the output file to contain the taxonomy.

  • The row names of the table are the OTU names and the column names are the sample names

These criteria do not apply when a phyloseq object is provided, as everything is then handled automatically.
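
A table meeting the criteria above could be built as in the following sketch. The sample names, counts and taxonomy strings are made up for illustration, and the checks at the end are plain base R, not functions from micInt.

```r
# OTUs in rows, samples in columns, plus two declared metadata columns.
# check.names = FALSE keeps the space in the "OTU Id" column name.
otu_table <- data.frame(
  "OTU Id"  = c("OTU1", "OTU2", "OTU3"),
  S1        = c(90, 9, 1),
  S2        = c(80, 16, 4),
  taxonomy  = c("Bacteria; Proteobacteria",
                "Bacteria; Firmicutes",
                "Archaea; Euryarchaeota"),
  row.names = c("OTU1", "OTU2", "OTU3"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

metadata_cols <- c("OTU Id", "taxonomy")

# Every declared metadata column must exist in the table.
all_declared <- all(metadata_cols %in% colnames(otu_table))

# Everything that is not metadata must be numeric abundances.
abund <- otu_table[, setdiff(colnames(otu_table), metadata_cols), drop = FALSE]
all_numeric <- all(vapply(abund, is.numeric, logical(1)))

# Transposed orientation: samples become rows, OTUs become columns.
samples_by_otus <- t(as.matrix(abund))
```

The last line only illustrates the change of orientation described under Value; it is not a substitute for calling refine_data.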

Tibbles

Tibbles are troublesome in this context as they do not support row names. In order to obtain the OTU IDs, the function tries the following in order, proceeding to the next step until one is successful:

  1. If the original OTU table has rownames, they are used as the OTU IDs

  2. If one of c('OTU Id','#OTU ID','OTU ID','OTU_ID') is among the column names of the OTU table, the corresponding column (the first match) is used as the OTU IDs

  3. If metadataCols is not empty, the first of the specified columns is regarded as the OTU IDs (gives a warning)

  4. The numeric row indices are treated as the OTU IDs (gives a warning)
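
The lookup order above can be sketched in base R as follows. The function resolve_otu_ids is hypothetical and is not the package's implementation; in particular, detecting explicit row names with .row_names_info() and the wording of the warnings are assumptions.

```r
# Hypothetical helper mirroring the four steps listed above.
resolve_otu_ids <- function(tab, metadata_cols = character(0)) {
  known <- c("OTU Id", "#OTU ID", "OTU ID", "OTU_ID")

  # 1. Explicit (non-automatic) row names are used directly.
  if (.row_names_info(tab) > 0) {
    return(rownames(tab))
  }

  # 2. Otherwise use the first well-known ID column present in the table.
  hit <- intersect(known, colnames(tab))
  if (length(hit) > 0) {
    return(as.character(tab[[hit[1]]]))
  }

  # 3. Otherwise fall back to the first declared metadata column.
  if (length(metadata_cols) > 0) {
    warning("Using the first metadata column as OTU IDs")
    return(as.character(tab[[metadata_cols[1]]]))
  }

  # 4. Last resort: the numeric row indices.
  warning("Using the row indices as OTU IDs")
  as.character(seq_len(nrow(tab)))
}

# Step 2: no explicit row names, but a recognised ID column.
tab_known <- data.frame("#OTU ID" = c("a", "b"), S1 = c(1, 2),
                        check.names = FALSE)
ids_known <- resolve_otu_ids(tab_known)

# Step 3: only a declared metadata column is available.
tab_meta <- data.frame(name = c("x", "y"), S1 = c(1, 2))
ids_meta <- suppressWarnings(resolve_otu_ids(tab_meta, "name"))

# Step 4: nothing usable, so the row indices are used.
tab_bare <- data.frame(S1 = c(1, 2), S2 = c(3, 4))
ids_bare <- suppressWarnings(resolve_otu_ids(tab_bare))
```

Setting explicit row names or a recognised ID column before calling refine_data avoids the warnings produced by steps 3 and 4.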

Value

A data frame (always a base data frame, even if a tibble is supplied) with the metadata columns removed and the OTUs below the cutoff filtered away. Additionally, in the refined table the OTUs constitute the columns, while the rows are the samples (transposed compared to the original OTU table).

Examples

library(micInt)
# Load the seawater example data set
data("seawater")
# Filter away OTUs whose maximum abundance is below 1e-3
refine_data(seawater, abundance_cutoff = 1e-3, cutoff_type = "max")



AlmaasLab/micInt documentation built on April 1, 2022, 10:37 a.m.