View source: R/data_refinement.R
refine_data | R Documentation |
Removes metadata from an OTU table and cuts off the least abundant species, as defined by the cutoff parameter.
refine_data( OTU_table, abundance_cutoff = 0, cutoff_type = "mean", raw_value_cutoff = TRUE, renormalize = FALSE, metadataCols = c("OTU Id", "taxonomy") )
OTU_table |
The raw OTU table, either as a |
abundance_cutoff |
Numeric, the threshold cutoff value. If it is |
cutoff_type |
The type of measure to base the cutoff on. Can be any
of |
raw_value_cutoff |
Logical, should filtering be based on the raw abundances? If not, the sample-wise relative abundances are used for filtering. Note that this parameter does not determine whether the results of the function are relative abundances. |
renormalize |
Logical, should the abundances be renormalized (sample-wise) after the procedure? |
metadataCols |
The names (character vector) or position (integer) of the metadata columns to remove from the table |
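The interplay between raw_value_cutoff, cutoff_type and renormalize can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy matrix, the variable names, and the choice to keep OTUs strictly above the threshold are all assumptions made for illustration. It mimics filtering on sample-wise relative abundances (as with raw_value_cutoff = FALSE) using the per-OTU maximum (as with cutoff_type = "max"), followed by sample-wise renormalization.

```r
# Toy counts (assumed data): OTUs in rows, samples in columns
counts <- matrix(c(90, 9, 1,
                   50, 50, 0),
                 nrow = 3,
                 dimnames = list(c("otu_a", "otu_b", "otu_c"),
                                 c("s1", "s2")))

# Sample-wise relative abundances: divide each column by its total
rel <- sweep(counts, 2, colSums(counts), "/")

# Per-OTU score across samples; max() stands in for cutoff_type = "max"
score <- apply(rel, 1, max)

# Assumption: an OTU is kept when its score is strictly above the cutoff
abundance_cutoff <- 0.05
kept <- counts[score > abundance_cutoff, , drop = FALSE]

# The filter used relative abundances, but the kept values are still raw counts
rownames(kept)

# Optional sample-wise renormalization after filtering
renorm <- sweep(kept, 2, colSums(kept), "/")
colSums(renorm)
```

Note that the filtering criterion and the scale of the returned values are independent choices, which is the point made in the description of raw_value_cutoff.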
In order for an OTU-table to be valid, the following criteria must hold:
The data points (samples) are in columns, and the abundances for each OTU are in rows.
The rows may only hold OTU abundances
There may be as many metadata columns as desired. However, they all need to be declared in the metadataCols argument, and the column taxonomy has to be present in order for the output to contain the taxonomy.
The row names of the table are the OTU names and the column names are the sample names
Of course, this does not apply when a phyloseq object is provided, as everything will be handled automatically.
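The criteria above can be made concrete with a small base R sketch that builds a table in the expected layout. The OTU names, sample names, counts and taxonomy strings are invented for illustration; only the column names "OTU Id" and "taxonomy" come from the default metadataCols value.

```r
# Assumed toy data: OTUs in rows, samples in columns, plus two metadata columns
otu_table <- data.frame(
  "OTU Id" = c("otu_a", "otu_b", "otu_c"),
  sample_1 = c(120, 30, 0),
  sample_2 = c(80, 45, 5),
  taxonomy = c("k__Bacteria; p__Proteobacteria",
               "k__Bacteria; p__Bacteroidetes",
               "k__Bacteria; p__Firmicutes"),
  check.names = FALSE  # keep the space in "OTU Id"
)

# Row names are the OTU names, column names are the sample names
rownames(otu_table) <- otu_table[["OTU Id"]]

# Every non-abundance column must be declared as metadata
metadataCols <- c("OTU Id", "taxonomy")
abundance_cols <- setdiff(colnames(otu_table), metadataCols)

# Sanity check: the remaining columns hold only numeric abundances
stopifnot(all(vapply(otu_table[abundance_cols], is.numeric, logical(1))))
```

A table built this way could then be passed to refine_data with the same metadataCols value.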
Tibbles are troublesome in this context as they do not support row names. To obtain the OTU IDs, the function tries the following in order, proceeding to the next until successful:
If the original OTU table has rownames, they are used as the OTU IDs
If one of c('OTU Id','#OTU ID','OTU ID','OTU_ID') is among the column names of the OTU table, the corresponding column (the first match) is used as the OTU IDs
If metadataCols is not empty, the first of the specified columns is regarded as the OTU IDs (gives a warning)
The numeric row indices are treated as the OTU IDs (gives a warning)
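The lookup order above can be sketched as a small base R function. The helper guess_otu_ids is hypothetical and not part of micInt; it only mirrors the documented fallback steps, and the test for "has row names" (non-automatic row names via .row_names_info) is an assumption about how that step might be decided.

```r
# Hypothetical helper (not part of micInt) mirroring the documented lookup order
guess_otu_ids <- function(otu_table, metadataCols = character(0)) {
  known_id_cols <- c("OTU Id", "#OTU ID", "OTU ID", "OTU_ID")

  # 1. Explicit (non-automatic) row names; tibbles never have these
  if (.row_names_info(otu_table) > 0L) {
    return(rownames(otu_table))
  }

  # 2. The first of the well-known ID column names found in the table
  hit <- known_id_cols[known_id_cols %in% colnames(otu_table)]
  if (length(hit) > 0L) {
    return(as.character(otu_table[[hit[1]]]))
  }

  # 3. The first declared metadata column (name or position), with a warning
  if (length(metadataCols) > 0L) {
    warning("Using the first metadata column as OTU IDs")
    return(as.character(otu_table[[metadataCols[1]]]))
  }

  # 4. Last resort: numeric row indices, with a warning
  warning("Using numeric row indices as OTU IDs")
  as.character(seq_len(nrow(otu_table)))
}

# Usage with assumed toy data: no row names, but an 'OTU_ID' column
tab <- data.frame(OTU_ID = c("otu_a", "otu_b"),
                  s1 = c(5, 0),
                  s2 = c(3, 2))
ids <- guess_otu_ids(tab)
```

Setting explicit row names or using one of the recognized ID column names avoids the two fallback steps that emit warnings.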
A data frame (always a base data frame, even if a tibble is supplied) with the metadata columns removed and the OTUs below the cutoff filtered away. Additionally, in the refined table the OTUs constitute the columns, while the rows are the samples (transposed compared to the original OTU table).
library(micInt)
data("seawater")
refine_data(seawater, abundance_cutoff = 1e-3, cutoff_type = "max")