continuous_discover: Unsupervised meta-analytical discovery and validation of...
In biobakery/MMUPHin: Meta-analysis Methods with Uniform Pipeline for Heterogeneity in Microbiome Studies

continuous_discover

R Documentation

Unsupervised meta-analytical discovery and validation of continuous structures in microbial abundance data

Description

continuous_discover takes as input a feature-by-sample matrix of microbial abundances. It first performs unsupervised continuous structure discovery (PCA) within each batch. Loadings of the top PCs from each batch are then compared against each other, and a network community discovery approach (via igraph) identifies "consensus" loadings that are reproducible across batches. The identified consensus loadings/scores can be viewed as continuous structures in microbial profiles that are recurrent across batches and valid in a meta-analytical sense. continuous_discover returns, among other output, the consensus scores for continuous structures in the provided microbial abundance profiles, as well as the consensus PC loadings, which can be used to assign continuous scores to any sample with the same set of microbial features.
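
The idea described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the simulated batches, the helper names (simulate_batch, cosine), the use of only the first PC per batch, and the 0.5 threshold are all assumptions made for illustration.

```r
set.seed(1)
n_feat <- 6

# Hypothetical feature direction shared by both toy batches (unit length)
shared_dir <- c(1, 1, 1, -1, -1, -1) / sqrt(6)

# Simulate a feature-by-sample matrix with one strong shared gradient plus noise
simulate_batch <- function(n_samp) {
  gradient <- rnorm(n_samp, sd = 3)
  noise <- matrix(rnorm(n_feat * n_samp, sd = 0.3), n_feat, n_samp)
  shared_dir %o% gradient + noise
}
batches <- list(A = simulate_batch(30), B = simulate_batch(40))

# Step 1: PCA within each batch (samples as rows); keep the first loading vector
top_loading <- lapply(batches, function(x) prcomp(t(x))$rotation[, 1])

# Step 2: cosine coefficient between the two loading vectors
cosine <- function(u, v) sum(u * v) / sqrt(sum(u^2) * sum(v^2))
cos_ab <- cosine(top_loading$A, top_loading$B)

# Step 3: the pair counts as "associated" (an edge in the loading network)
# when the absolute cosine exceeds a cutoff; 0.5 is an illustrative value
linked <- abs(cos_ab) > 0.5
```

The absolute value is used because the sign of a PC loading is arbitrary, so two batches can recover the same structure with opposite signs.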

Usage

continuous_discover(feature_abd, batch, data, control)

Arguments

`feature_abd`	feature-by-sample matrix of abundances (proportions or counts).
`batch`	name of the batch variable. This variable in data should be a factor; if it is not, it will be converted to one with a warning.
`data`	data frame of metadata, columns must include batch.
`control`	a named list of additional control parameters. See details.

Details

control should be provided as a named list of the following components (can be a subset).

normalization: character. Similar to the normalization parameter in Maaslin2 but only "TSS" and "NONE" are allowed. Default to "TSS" (total sum scaling).
transform: character. Similar to the transform parameter in Maaslin2 but only "AST" and "LOG" are allowed. Default to "AST" (arcsine square root transformation).
pseudo_count: numeric. Pseudo count to add to feature_abd before the transformation. Default to NULL, in which case the pseudo count is set automatically: to 0 if transform="AST", and to half of the minimal non-zero value in feature_abd if transform="LOG".
var_perc_cutoff: numeric. A value between 0 and 1 that indicates the proportion of variability explained at which to cut off when selecting top PCs in each batch. Across batches, the top PCs that together explain more than var_perc_cutoff of the total variability will be selected for meta-analytical continuous structure discovery. Default to 0.8 (the PCs included need to explain at least 80% of the total variability).
cos_cutoff: numeric. A value between 0 and 1 that indicates the cutoff for absolute cosine coefficients between PC loadings, used to construct the PC loading network. Once the top PC loadings from each batch are selected, cosine coefficients are calculated for each pair of loadings to indicate their similarity. Loading pairs with absolute cosine coefficients surpassing cos_cutoff are considered associated with each other and are represented as an edge in the PC loading network. Network community discovery is then performed on this network to identify densely connected "clusters" of PC loadings, which represent meta-analytically recurrent continuous structures.
cluster_function: function. cluster_function is used to perform community structure discovery in the constructed PC loading network. This can be any of the network cluster functions provided in igraph. Default to cluster_optimal. Note that this option can be slow for larger datasets, in which case cluster_fast_greedy is recommended.
network_plot: character. Name for the generated network figure file. Default to "clustered_network.pdf". Can be set to NULL in which case no output will be generated.
plot_size_cutoff: integer. Clusters with sizes smaller than or equal to plot_size_cutoff will be excluded from the visualized network. Default to 2, i.e. visualized clusters must have at least three nodes (PC loadings).
diagnostic_plot: character. Name for the generated diagnostic figure file. Default to "continuous_diagnostic.pdf". Can be set to NULL in which case no output will be generated.
verbose: logical. Indicates whether or not verbose information will be printed.
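
To make the normalization, transform, pseudo_count, and var_perc_cutoff components more concrete, the following base R sketch works through them on toy numbers. It is an illustration under stated assumptions, not the package's code: the toy counts, the order of operations (pseudo count added before normalization), the log base, and the hypothetical PC standard deviations are all assumptions.

```r
# Toy feature-by-sample count matrix (hypothetical data)
counts <- matrix(c(10,  0, 30,
                    5, 20,  0,
                   85, 80, 70),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("feature", 1:3), paste0("sample", 1:3)))

# normalization = "TSS": divide each sample (column) by its total
tss <- sweep(counts, 2, colSums(counts), "/")

# transform = "AST": arcsine square root of proportions; no pseudo count needed
ast <- asin(sqrt(tss))

# transform = "LOG": zeros need a pseudo count; the automatic choice described
# above is half of the minimal non-zero value in feature_abd
pseudo <- min(counts[counts > 0]) / 2
shifted <- counts + pseudo
# Assumption: natural log after TSS; the package may use a different base
log_abd <- log(sweep(shifted, 2, colSums(shifted), "/"))

# var_perc_cutoff: keep the smallest number of top PCs whose cumulative
# proportion of variance reaches the cutoff (hypothetical PC standard deviations)
sdev <- c(3, 2, 1, 0.5)
var_explained <- sdev^2 / sum(sdev^2)
n_top <- which(cumsum(var_explained) >= 0.8)[1]
```

Here the first PC alone explains about 63% of the variance and the first two about 91%, so two PCs are retained at a cutoff of 0.8.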

Value

a list, with the following components:

consensus_scores: matrix of identified consensus continuous scores. Columns are the identified consensus scores and rows correspond to samples in feature_abd.
consensus_loadings: matrix of identified consensus loadings. Columns are the identified consensus scores and rows correspond to features in feature_abd.
mat_vali: matrix of validation cosine coefficients of the identified consensus loadings. Columns correspond to the identified consensus scores and rows correspond to batches.
network, communities, mat_cos: components for the constructed PC loading network and community discovery results. network is an igraph graph object for the constructed network of associated PC loadings. communities is a communities object for the identified consensus loading clusters in network (output from control$cluster_function). mat_cos is the matrix of cosine coefficients between all selected top PCs from all batches.
control: list of additional control parameters used in the function call.
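
The Description notes that the consensus loadings can be used to assign continuous scores to any sample with the same set of features. One plausible way to do this is sketched below in base R. The loading values, the column names, the new samples, and the choice of a plain projection after TSS and AST (with no centering) are assumptions for illustration; in practice the loadings would come from the consensus_loadings component of a fitted result, and the new data should be processed the same way as the data used for discovery.

```r
# Hypothetical consensus loadings (features x structures); column names are made up
loadings <- matrix(c(0.8, -0.6,  0,
                     0,    0.6, -0.8),
                   ncol = 2,
                   dimnames = list(paste0("feature", 1:3),
                                   c("Cluster_1", "Cluster_2")))

# Hypothetical new samples over the same features (feature-by-sample counts)
new_abd <- matrix(c(20, 30, 50,
                    60, 30, 10),
                  nrow = 3,
                  dimnames = list(paste0("feature", 1:3), c("new1", "new2")))

# Apply the same normalization and transformation as in discovery (here TSS + AST)
new_trans <- asin(sqrt(sweep(new_abd, 2, colSums(new_abd), "/")))

# Align features by name before projecting, so rows match the loadings
new_trans <- new_trans[rownames(loadings), , drop = FALSE]

# Scores: one row per new sample, one column per consensus structure
new_scores <- t(new_trans) %*% loadings
```

Matching rows by feature name rather than by position guards against silently misaligned features when the new data come from a different source.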

Author(s)

Siyuan Ma, siyuanma@g.harvard.edu

Examples

data("CRC_abd", "CRC_meta")
fit_continuous <- continuous_discover(feature_abd = CRC_abd,
                                      batch = "studyID",
                                      data = CRC_meta)
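
A variant of the call above that passes a control list may also be useful. The sketch below uses only components listed under Details; it swaps in igraph's cluster_fast_greedy, as suggested there for larger datasets, and turns off both figure files. It assumes the MMUPHin and igraph packages are installed and is skipped otherwise; the object names control_fast and fit_fast are illustrative.

```r
# Control components from the Details section; unspecified ones keep their defaults
control_fast <- list(
  network_plot = NULL,     # do not write the network figure
  diagnostic_plot = NULL,  # do not write the diagnostic figure
  verbose = FALSE
)

if (requireNamespace("MMUPHin", quietly = TRUE) &&
    requireNamespace("igraph", quietly = TRUE)) {
  # Faster community detection than the default cluster_optimal
  control_fast$cluster_function <- igraph::cluster_fast_greedy

  data("CRC_abd", "CRC_meta", package = "MMUPHin")
  fit_fast <- MMUPHin::continuous_discover(feature_abd = CRC_abd,
                                           batch = "studyID",
                                           data = CRC_meta,
                                           control = control_fast)

  # Components documented under Value
  str(fit_fast$consensus_loadings)
  fit_fast$mat_vali
}
```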

biobakery/MMUPHin documentation built on March 30, 2024, 4:50 a.m.
