omics | R Documentation |
This is the abstract class 'omics', contains a variety of methods that are inherited and applied in the omics classes: metagenomics, proteomics and metabolomics.
Every class is created with the R6Class method. Methods are either public or private, and only the public components are inherited by other omic classes. The omics class by default uses a sparseMatrix and data.table data structures for quick and efficient data manipulation and returns the object by reference, same as the R6 class. The method by reference is very efficient when dealing with big data.
A list of components:
div
A data.frame from diversity.
stats
A pairwise statistics from pairwise_wilcox_test.
plot
A ggplot object.
A list of components:
data
A data.table of feature compositions.
palette
A setNames palette from colormap.
A list of components:
distmat
A distance dissimilarity in matrix format.
stats
A statistical test as a data.frame.
pcs
principal components as a data.frame.
scree_plot
A ggplot object.
anova_plot
A ggplot object.
scores_plot
A ggplot object.
dfe
A long data.table table.
volcano_plot
A ggplot object.
countData
A path to an existing file, data.table or data.frame.
featureData
A path to an existing file, data.table or data.frame.
metaData
A path to an existing file, data.table or data.frame.
.valid_schema
Boolean value for schema validation via JSON
.feature_id
A character, default name for the feature identifiers.
.sample_id
A character, default name for the sample identifiers.
.samplepair_id
A character, default name for the sample pair identifiers.
new()
Wrapper function that is inherited and adapted for each omics class.
The omics classes requires a metadata samplesheet, that is validated by the metadata_schema.json.
It requires a column SAMPLE_ID
and optionally a SAMPLEPAIR_ID
or FEATURE_ID
can be supplied.
The SAMPLE_ID
will be used to link the metaData to the countData, and will act as the key during subsetting of other columns.
To create a new object use new()
method. Do notice that the abstract class only checks if the metadata is valid!
The countData
and featureData
will not be checked, these are handles by the sub-classes.
Using the omics class to load your data is not supported and still experimental.
omics$new(countData = NULL, featureData = NULL, metaData = NULL)
countData
A path to an existing file, data.table or data.frame.
featureData
A path to an existing file, data.table or data.frame.
metaData
A path to an existing file, data.table or data.frame.
A new omics
object.
validate()
Validates an input metadata against the JSON schema. The metadata should look as follows and should not contain any empty spaces.
For example; 'sample 1'
is not allowed, whereas 'sample1'
is allowed!
Acceptable column headers:
SAMPLE_ID (required)
SAMPLEPAIR_ID (optional)
FEATURE_ID (optional)
CONTRAST_ (optional), used for autoFlow()
.
VARIABLE_ (optional), not supported yet.
This function is used during the creation of a new object via new()
to validate the supplied metadata
via a filepath or existing data.table or data.frame.
omics$validate()
None
removeZeros()
Removes empty (zero) values by row and column from the countData
.
This method is performed automatically during subsetting of the object.
omics$removeZeros()
object in place
obj_path <- system.file("extdata", "mock_taxa.rds", package = "OmicFlow", mustWork = TRUE) obj <- readRDS(obj_path) obj$removeZeros()
removeNAs()
Remove NAs from metaData
and updates the countData
.
omics$removeNAs(column)
column
The column from where NAs should be removed, this can be either a wholenumbers or characters. Vectors are also supported.
object in place
obj_path <- system.file("extdata", "mock_taxa.rds", package = "OmicFlow", mustWork = TRUE) obj <- readRDS(obj_path) obj$removeNAs(column = "treatment")
feature_subset()
Feature subset (based on featureData
), automatically applies removeZeros()
.
omics$feature_subset(...)
...
Expressions that return a logical value, and are defined in terms of the variables in featureData
.
Only rows for which all conditions evaluate to TRUE are kept.
object in place
obj_path <- system.file("extdata", "mock_taxa.rds", package = "OmicFlow", mustWork = TRUE) obj <- readRDS(obj_path) obj$feature_subset(Genus == "Streptococcus")
sample_subset()
Sample subset (based on metaData
), automatically applies removeZeros()
.
omics$sample_subset(...)
...
Expressions that return a logical value, and are defined in terms of the variables in metaData
.
Only rows for which all conditions evaluate to TRUE are kept.
object in place
obj_path <- system.file("extdata", "mock_taxa.rds", package = "OmicFlow", mustWork = TRUE) obj <- readRDS(obj_path) obj$sample_subset(treatment == "tumor")
samplepair_subset()
Samplepair subset (based on metaData
), automatically applies removeZeros()
.
omics$samplepair_subset(num_unique_pairs = NULL)
num_unique_pairs
An integer value to define the number of pairs to subset. The default is NULL,
meaning the maximum number of unique pairs will be used to subset the data.
Let's say you have three samples for each pair, then the num_unique_pairs
will be set to 3.
object in place
feature_merge()
Agglomerates features by column, automatically applies removeZeros()
.
omics$feature_merge(feature_rank, feature_filter = NULL)
feature_rank
A character value or vector of columns to aggregate from the featureData
.
feature_filter
A character value or vector of characters to remove features via regex pattern.
object in place
obj_path <- system.file("extdata", "mock_taxa.rds", package = "OmicFlow", mustWork = TRUE) obj <- readRDS(obj_path) obj$feature_merge(feature_rank = c("Kingdom", "Phylum")) obj$feature_merge(feature_rank = "Genus", feature_filter = c("uncultured", "metagenome"))
transform()
Performs transformation on the positive values from the countData
.
omics$transform(FUN)
FUN
A function such as log2
, log
object in place
obj_path <- system.file("extdata", "mock_taxa.rds", package = "OmicFlow", mustWork = TRUE) obj <- readRDS(obj_path) obj$transform(log2)
normalize()
Relative abundance computation by column sums on the countData
.
omics$normalize()
object in place
obj_path <- system.file("extdata", "mock_taxa.rds", package = "OmicFlow", mustWork = TRUE) obj <- readRDS(obj_path) obj$normalize()
rankstat()
Rank statistics based on featureData
omics$rankstat(feature_ranks)
feature_ranks
A vector of characters or integers that match the featureData
.
Counts the number of features identified for each column, for example in case of 16S metagenomics it would be the number of OTUs or ASVs on different taxonomy levels.
A ggplot object.
library(ggplot2) obj_path <- system.file("extdata", "mock_taxa.rds", package = "OmicFlow", mustWork = TRUE) obj <- readRDS(obj_path) plt <- obj$rankstat(feature_ranks = c("Kingdom", "Phylum", "Family", "Genus", "Species")) plt
alpha_diversity()
Alpha diversity based on diversity
omics$alpha_diversity( col_name, metric = c("shannon", "invsimpson", "simpson"), Brewer.palID = "Set2", evenness = FALSE, paired = FALSE, p.adjust.method = "fdr" )
col_name
A character variable from the metaData
.
metric
An alpha diversity metric as input to diversity.
Brewer.palID
A character name for the palette set to be applied, see brewer.pal or colormap.
evenness
A boolean wether to divide diversity by number of species, see specnumber.
paired
A boolean value to perform paired analysis in wilcox.test and samplepair subsetting via samplepair_subset()
p.adjust.method
A character variable to specify the p.adjust.method to be used, default is 'fdr'.
library(ggplot2) obj_path <- system.file("extdata", "mock_taxa.rds", package = "OmicFlow", mustWork = TRUE) obj <- readRDS(obj_path) plt <- obj$alpha_diversity(col_name = "treatment", metric = "shannon")
composition()
Creates a table most abundant compositional features. Also assigns a color blind friendly palette for visualizations.
omics$composition( feature_rank, feature_filter = NULL, col_name = NULL, normalize = TRUE, feature_top = c(10, 15), Brewer.palID = "RdYlBu" )
feature_rank
A character variable in featureData
to aggregate via feature_merge()
.
feature_filter
A character or vector of characters to removes features by regex pattern.
col_name
Optional, a character or vector of characters to add to the final compositional data output.
normalize
A boolean value, whether to normalize()
by total sample sums (Default: TRUE).
feature_top
A wholenumber of the top features to visualize, the max is 15, due to a limit of palettes.
Brewer.palID
A character name for the palette set to be applied, see brewer.pal or colormap.
library(ggplot2) obj_path <- system.file("extdata", "mock_taxa.rds", package = "OmicFlow", mustWork = TRUE) obj <- readRDS(obj_path) result <- obj$composition(feature_rank = "Genus", feature_filter = c("uncultured"), feature_top = 10) plt <- composition_plot(data = result$data, palette = result$palette, feature_rank = "Genus")
ordination()
Ordination of countData
with statistical testing.
omics$ordination( metric = c("bray", "jaccard", "unifrac"), method = c("pcoa", "nmds"), group_by, distmat = NULL, weighted = TRUE, normalize = TRUE, cpus = 1, perm = 999 )
metric
A dissimilarity or similarity metric to be applied on the countData
,
thus far supports 'bray', 'jaccard' and 'unifrac' when a tree is provided via treeData
, see bdiv_distmat.
method
Ordination method, supports "pcoa" and "nmds", see wcmdscale.
group_by
A character variable in metaData
to be used for the pairwise_adonis or pairwise_anosim statistical test.
distmat
A custom distance matrix in either dist or Matrix format.
weighted
A boolean value, whether to compute weighted or unweighted dissimilarities (Default: TRUE).
normalize
A boolean value, whether to normalize()
by total sample sums (Default: TRUE).
cpus
A wholenumber, indicating the number of processes to spawn (Default: 1) in bdiv_distmat.
perm
A wholenumber, number of permutations to compare against the null hypothesis of adonis2 and anosim (default: perm=999
).
library(ggplot2) obj_path <- system.file("extdata", "mock_taxa.rds", package = "OmicFlow", mustWork = TRUE) obj <- readRDS(obj_path) pcoa_plots <- obj$ordination(metric = "bray", method = "pcoa", group_by = "treatment", weighted = TRUE, normalize = TRUE) pcoa_plots
DFE()
Differential feature expression (DFE) using the foldchange for both paired and non-paired test.
omics$DFE( feature_rank, feature_filter = NULL, paired = FALSE, normalize = TRUE, condition.group, condition_A, condition_B, pvalue.threshold = 0.05, foldchange.threshold = 0.06, abundance.threshold = 0 )
feature_rank
A character or vector of characters in the featureData
to aggregate via feature_merge()
.
feature_filter
A character or vector of characters to remove features via regex pattern (Default: NULL).
paired
A boolean value, the paired is only applicable when a SAMPLEPAIR_ID
column exists within the metaData
. See wilcox.test and samplepair_subset()
.
normalize
A boolean value, whether to normalize()
by total sample sums (Default: TRUE).
condition.group
A character variable of an existing column name in metaData
, wherein the conditions A and B are located.
condition_A
A character value or vector of characters.
condition_B
A character value or vector of characters.
pvalue.threshold
A numeric value used as a p-value threshold to label and color significant features (Default: 0.05).
foldchange.threshold
A numeric value used as a fold-change threshold to label and color significantly expressed features (Default: 0.06).
abundance.threshold
A numeric value used as an abundance threshold to size the scatter dots based on their mean relative abundance (default: 0.01).
library(ggplot2) obj_path <- system.file("extdata", "mock_taxa.rds", package = "OmicFlow", mustWork = TRUE) obj <- readRDS(obj_path) unpaired <- obj$DFE(feature_rank = "Genus", paired = FALSE, condition.group = "treatment", condition_A = c("healthy"), condition_B = c("tumor"))
autoFlow()
Automated Omics Analysis based on the metaData
, see validate()
.
For now only works with headers that start with prefix CONTRAST_
.
omics$autoFlow( feature_ranks = c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"), feature_contrast = c("Phylum", "Family", "Genus"), feature_filter = c("uncultured"), distance_metrics = c("unifrac"), beta_div_table = NULL, alpha_div_table = NULL, normalize = TRUE, weighted = TRUE, pvalue.threshold = 0.05, perm = 999, cpus = 1, filename = paste0(getwd(), "/report.html") )
feature_ranks
A character vector as input to rankstat()
.
feature_contrast
A character vector of feature columns in the featureData
to aggregate via feature_merge()
.
feature_filter
A character vector to filter unwanted features, default: c("uncultured")
distance_metrics
A character vector specifying what (dis)similarity metrics to use, default c("unifrac")
beta_div_table
A path to pre-computed distance matrix, expects tsv/csv/txt file.
alpha_div_table
A path to pre-computed alpha diversity with rarefraction depth, expects tsv/csv/txt from qiime2, see read_rarefraction_qiime.
normalize
A boolean value, whether to normalize()
by total sample sums (Default: TRUE).
weighted
A boolean value, whether to compute weighted or unweighted dissimilarities (Default: TRUE).
pvalue.threshold
A numeric value, the p-value is used to include/exclude composition and foldchanges plots coming from alpha- and beta diversity analysis (Default: 0.05).
perm
A wholenumber, number of permutations to compare against the null hypothesis of adonis2 or anosim (default: perm=999
).
cpus
Number of cores to use, only used in ordination()
when beta_div_table is not supplied.
filename
A character to name the HTML report, it can also be a filepath (e.g. "/path/to/report.html"
). Default: "report.html" in your current work directory.
A report in HTML format
diversity_plot
composition_plot
ordination_plot, plot_pairwise_stats, pairwise_anosim, pairwise_adonis
volcano_plot, foldchange
## ------------------------------------------------
## Method `omics$removeZeros`
## ------------------------------------------------
obj_path <- system.file("extdata", "mock_taxa.rds", package = "OmicFlow", mustWork = TRUE)
obj <- readRDS(obj_path)
obj$removeZeros()
## ------------------------------------------------
## Method `omics$removeNAs`
## ------------------------------------------------
obj_path <- system.file("extdata", "mock_taxa.rds", package = "OmicFlow", mustWork = TRUE)
obj <- readRDS(obj_path)
obj$removeNAs(column = "treatment")
## ------------------------------------------------
## Method `omics$feature_subset`
## ------------------------------------------------
obj_path <- system.file("extdata", "mock_taxa.rds", package = "OmicFlow", mustWork = TRUE)
obj <- readRDS(obj_path)
obj$feature_subset(Genus == "Streptococcus")
## ------------------------------------------------
## Method `omics$sample_subset`
## ------------------------------------------------
obj_path <- system.file("extdata", "mock_taxa.rds", package = "OmicFlow", mustWork = TRUE)
obj <- readRDS(obj_path)
obj$sample_subset(treatment == "tumor")
## ------------------------------------------------
## Method `omics$feature_merge`
## ------------------------------------------------
obj_path <- system.file("extdata", "mock_taxa.rds", package = "OmicFlow", mustWork = TRUE)
obj <- readRDS(obj_path)
obj$feature_merge(feature_rank = c("Kingdom", "Phylum"))
obj$feature_merge(feature_rank = "Genus", feature_filter = c("uncultured", "metagenome"))
## ------------------------------------------------
## Method `omics$transform`
## ------------------------------------------------
obj_path <- system.file("extdata", "mock_taxa.rds", package = "OmicFlow", mustWork = TRUE)
obj <- readRDS(obj_path)
obj$transform(log2)
## ------------------------------------------------
## Method `omics$normalize`
## ------------------------------------------------
obj_path <- system.file("extdata", "mock_taxa.rds", package = "OmicFlow", mustWork = TRUE)
obj <- readRDS(obj_path)
obj$normalize()
## ------------------------------------------------
## Method `omics$rankstat`
## ------------------------------------------------
library(ggplot2)
obj_path <- system.file("extdata", "mock_taxa.rds", package = "OmicFlow", mustWork = TRUE)
obj <- readRDS(obj_path)
plt <- obj$rankstat(feature_ranks = c("Kingdom", "Phylum", "Family", "Genus", "Species"))
plt
## ------------------------------------------------
## Method `omics$alpha_diversity`
## ------------------------------------------------
library(ggplot2)
obj_path <- system.file("extdata", "mock_taxa.rds", package = "OmicFlow", mustWork = TRUE)
obj <- readRDS(obj_path)
plt <- obj$alpha_diversity(col_name = "treatment",
metric = "shannon")
## ------------------------------------------------
## Method `omics$composition`
## ------------------------------------------------
library(ggplot2)
obj_path <- system.file("extdata", "mock_taxa.rds", package = "OmicFlow", mustWork = TRUE)
obj <- readRDS(obj_path)
result <- obj$composition(feature_rank = "Genus",
feature_filter = c("uncultured"),
feature_top = 10)
plt <- composition_plot(data = result$data,
palette = result$palette,
feature_rank = "Genus")
## ------------------------------------------------
## Method `omics$ordination`
## ------------------------------------------------
library(ggplot2)
obj_path <- system.file("extdata", "mock_taxa.rds", package = "OmicFlow", mustWork = TRUE)
obj <- readRDS(obj_path)
pcoa_plots <- obj$ordination(metric = "bray",
method = "pcoa",
group_by = "treatment",
weighted = TRUE,
normalize = TRUE)
pcoa_plots
## ------------------------------------------------
## Method `omics$DFE`
## ------------------------------------------------
library(ggplot2)
obj_path <- system.file("extdata", "mock_taxa.rds", package = "OmicFlow", mustWork = TRUE)
obj <- readRDS(obj_path)
unpaired <- obj$DFE(feature_rank = "Genus",
paired = FALSE,
condition.group = "treatment",
condition_A = c("healthy"),
condition_B = c("tumor"))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.