secom_linear: Sparse estimation of linear correlations among microbiomes

In FrederickHuangLin/ANCOMBC: Microbiome differential abundance and correlation analyses with bias correction

View source: R/secom_linear.R

secom_linear

R Documentation

Sparse estimation of linear correlations among microbiomes

Description

Obtain the sparse correlation matrix for linear correlations between taxa. The current version of the secom_linear function supports two correlation coefficients, Pearson and Spearman, selected through the method argument.

Usage

secom_linear(
  data,
  taxa_are_rows = TRUE,
  assay.type = assay_name,
  assay_name = "counts",
  rank = tax_level,
  tax_level = NULL,
  aggregate_data = NULL,
  meta_data = NULL,
  pseudo = 0,
  prv_cut = 0.5,
  lib_cut = 1000,
  corr_cut = 0.5,
  wins_quant = c(0.05, 0.95),
  method = c("pearson", "spearman"),
  soft = FALSE,
  alpha_grid = 0,
  thresh_len = 100,
  n_cv = 10,
  thresh_hard = 0,
  max_p = 0.005,
  n_cl = 1,
  verbose = TRUE
)

Arguments

`data`	a `list` of the input data. Each element of the list should be a `matrix`, `data.frame`, `phyloseq`, or `TreeSummarizedExperiment` object. Both `phyloseq` and `TreeSummarizedExperiment` objects consist of a feature table (microbial count table), a sample metadata table, a taxonomy table (optional), and a phylogenetic tree (optional). If a `matrix` or `data.frame` is provided, ensure that the row names of `meta_data` match the sample names in `data` (column names if `taxa_are_rows` is TRUE, and row names otherwise). If a `phyloseq` or a `TreeSummarizedExperiment` object is used, this standard has already been enforced. For detailed information, refer to `?phyloseq::phyloseq` or `?TreeSummarizedExperiment::TreeSummarizedExperiment`. It is recommended to use low taxonomic levels, such as OTU or species level, because the estimation of sampling fractions requires a large number of taxa. If working with multiple ecosystems, such as gut and tongue, stack the data by specifying the list of input data as `data = list(gut = pseq1, tongue = pseq2)`.
`taxa_are_rows`	logical. Whether taxa are positioned in the rows of the feature table. Default is TRUE.
`assay.type`	alias for `assay_name`.
`assay_name`	character. Name of the feature table within the data object (only applicable if the data object is a `(Tree)SummarizedExperiment`). Default is "counts". See `?SummarizedExperiment::assay` for more details.
`rank`	alias for `tax_level`.
`tax_level`	character. The taxonomic level of interest. The input data can be agglomerated at different taxonomic levels based on your research interest. Default is NULL, i.e., do not perform agglomeration, and the SECOM analysis will be performed at the lowest taxonomic level of the input `data`.
`aggregate_data`	The abundance data that has been aggregated to the desired taxonomic level. This parameter is required only when the input data is in `matrix` or `data.frame` format. For `phyloseq` or `TreeSummarizedExperiment` data, aggregation is performed by specifying the `tax_level` parameter.
`meta_data`	a `data.frame` containing sample metadata. This parameter is mandatory when the input `data` is a generic `matrix` or `data.frame`. Ensure that the row names of `meta_data` match the sample names in `data` (column names if `taxa_are_rows` is TRUE, and row names otherwise).
`pseudo`	numeric. Add pseudo-counts to the data. Default is 0 (no pseudo-counts).
`prv_cut`	a numerical fraction between 0 and 1. Taxa with prevalences (the proportion of samples in which the taxon is present) less than `prv_cut` will be excluded from the analysis. For example, if there are 100 samples, and a taxon has nonzero counts in fewer than 100*prv_cut samples, it will not be considered in the analysis. Default is 0.50.
`lib_cut`	a numerical threshold for filtering samples based on library sizes. Samples with library sizes less than `lib_cut` will be excluded from the analysis. Default is 1000.
`corr_cut`	numeric. To avoid false positives caused by taxa with small variances, taxa whose Pearson correlation coefficient with the estimated sample-specific bias is greater than `corr_cut` will be flagged. When taxa are flagged, the pairwise correlation coefficients between them will be set to 0. Default is 0.5.
`wins_quant`	a numeric vector of probabilities with values between 0 and 1. Replace extreme values in the abundance data with less extreme values. Default is `c(0.05, 0.95)`. For details, see `?DescTools::Winsorize`.
`method`	character. It indicates which correlation coefficient is to be computed. It can be either "pearson" or "spearman".
`soft`	logical. `TRUE` indicates that soft thresholding is applied to achieve the sparsity of the correlation matrix. `FALSE` indicates that hard thresholding is applied to achieve the sparsity of the correlation matrix. Default is `FALSE`.
`alpha_grid`	a numeric vector of penalty parameters for the element-wise L1 norm to induce sparsity. Default is 0.
`thresh_len`	numeric. Grid-search is implemented to find the optimal values over `thresh_len` thresholds for the thresholding operator. Default is 100.
`n_cv`	numeric. The fold number in cross validation. Default is 10 (10-fold cross validation).
`thresh_hard`	numeric. Pairwise correlation coefficients (in absolute value) that are less than or equal to `thresh_hard` will be set to 0. Default is 0, as shown in Usage.
`max_p`	numeric. Obtain the sparse correlation matrix by p-value filtering. Pairwise correlation coefficients with p-values greater than `max_p` will be set to 0. Default is 0.005.
`n_cl`	numeric. The number of nodes to be forked. For details, see `?parallel::makeCluster`. Default is 1 (no parallel computing).
`verbose`	logical. Whether to display detailed progress messages.
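
The filtering and winsorization arguments (`prv_cut`, `lib_cut`, `wins_quant`) can be illustrated with a small base R sketch. This is not the package's internal code: the toy count table, the helper `winsorize`, and the order in which the steps are applied are assumptions made purely for illustration.

```r
# Toy count table: 4 taxa (rows) x 5 samples (columns); values are made up.
counts <- matrix(
  c(120, 300,   0, 800, 450,
      0,   0,   0,  15,   0,
    900, 700, 100, 600, 500,
     30,  10,   0,  40,  60),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("taxon", 1:4), paste0("s", 1:5))
)

prv_cut <- 0.5   # minimum prevalence a taxon must reach
lib_cut <- 1000  # minimum library size a sample must reach

# Prevalence: proportion of samples in which a taxon has a nonzero count.
prevalence <- rowMeans(counts > 0)
# Library size: total count per sample.
lib_size <- colSums(counts)

# Keep taxa and samples that meet the cutoffs (assumed order: both computed
# on the unfiltered table).
filtered <- counts[prevalence >= prv_cut, lib_size >= lib_cut, drop = FALSE]

# Hypothetical helper: replace values outside the given quantiles with the
# quantile values themselves, which is the idea behind `wins_quant`.
winsorize <- function(x, probs = c(0.05, 0.95)) {
  q <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  pmin(pmax(x, q[1]), q[2])
}

x <- c(1:19, 1000)       # one extreme value
x_wins <- winsorize(x)   # the extreme value is pulled down to the 95% quantile
```

In this toy table, `taxon2` is present in only one of five samples and sample `s3` has a library size of 100, so both fall below the cutoffs and are dropped.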

Value

a list with components:

s_diff_hat, a numeric vector of estimated sample-specific biases.
y_hat, a matrix of bias-corrected abundances.
cv_error, a numeric vector of cross-validation error estimates, which are the Frobenius norm differences between correlation matrices using training set and validation set, respectively.
thresh_grid, a numeric vector of thresholds in the cross-validation.
thresh_opt, numeric. The optimal threshold through cross-validation.
mat_cooccur, a matrix of taxon-taxon co-occurrence pattern. The number in each cell represents the number of complete (nonzero) samples for the corresponding pair of taxa.
corr, the sample correlation matrix (using the measure specified in method) computed using the bias-corrected abundances y_hat.
corr_p, the p-value matrix corresponding to the sample correlation matrix corr.
corr_th, the sparse correlation matrix obtained by thresholding based on the method specified in soft.
corr_fl, the sparse correlation matrix obtained by p-value filtering based on the cutoff specified in max_p.
corr_reg, the correlation matrix obtained by winsorizing small eigenvalues.
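
To make the difference between `corr_th` and `corr_fl` concrete, the sketch below applies hard thresholding, soft thresholding, and p-value filtering to a made-up correlation matrix in base R. The helper names, the textbook form of the soft-thresholding operator, and the choice to leave the diagonal untouched are assumptions of this sketch rather than the package's implementation. The last line only shows how a Frobenius norm difference, the quantity named in `cv_error`, is computed between two matrices.

```r
taxa <- c("A", "B", "C")

# Made-up correlation matrix and matching p-value matrix.
corr <- matrix(c( 1.00, 0.62, -0.12,
                  0.62, 1.00,  0.35,
                 -0.12, 0.35,  1.00),
               nrow = 3, byrow = TRUE, dimnames = list(taxa, taxa))
corr_p <- matrix(c(0.000, 0.001, 0.400,
                   0.001, 0.000, 0.020,
                   0.400, 0.020, 0.000),
                 nrow = 3, byrow = TRUE, dimnames = list(taxa, taxa))

# Hard thresholding: zero out entries with |r| <= thresh, keep the rest as-is.
hard_threshold <- function(r, thresh) {
  r[abs(r) <= thresh] <- 0
  r
}

# Soft thresholding (textbook operator): shrink every entry toward 0 by
# thresh; the diagonal is restored, which is a choice made in this sketch.
soft_threshold <- function(r, thresh) {
  out <- sign(r) * pmax(abs(r) - thresh, 0)
  diag(out) <- diag(r)
  out
}

# P-value filtering: zero out entries whose p-value exceeds max_p.
p_value_filter <- function(r, p, max_p) {
  r[p > max_p] <- 0
  r
}

corr_hard <- hard_threshold(corr, thresh = 0.3)
corr_soft <- soft_threshold(corr, thresh = 0.3)
corr_fl   <- p_value_filter(corr, corr_p, max_p = 0.005)

# Frobenius norm of the difference between two matrices.
frob_diff <- norm(corr - corr_hard, type = "F")
```

Hard thresholding keeps surviving coefficients at their original size, whereas soft thresholding also shrinks them, so the two operators generally yield different sparse matrices even at the same threshold.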

Author(s)

Huang Lin

Examples

library(ANCOMBC)
if (requireNamespace("microbiome", quietly = TRUE)) {
    data(atlas1006, package = "microbiome")
    # subset to baseline
    pseq = phyloseq::subset_samples(atlas1006, time == 0)

    # run secom_linear function
    set.seed(123)
    res_linear = secom_linear(data = list(pseq), taxa_are_rows = TRUE,
                              tax_level = "Phylum",
                              aggregate_data = NULL, meta_data = NULL, pseudo = 0,
                              prv_cut = 0.5, lib_cut = 1000, corr_cut = 0.5,
                              wins_quant = c(0.05, 0.95), method = "pearson",
                              soft = FALSE, alpha_grid = 0,
                              thresh_len = 20, n_cv = 10,
                              thresh_hard = 0.3, max_p = 0.005, n_cl = 2)

    corr_th = res_linear$corr_th
    corr_fl = res_linear$corr_fl
} else {
    message("The 'microbiome' package is not installed. Please install it to use this example.")
}
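
A sparse correlation matrix such as `corr_th` or `corr_fl` is often easier to read as a list of taxon pairs. The following base R sketch shows one way to do that conversion. The helper `corr_to_edges` is hypothetical and not part of ANCOMBC, and the matrix values are made up so the snippet runs without the package installed.

```r
# Hypothetical helper: list the nonzero upper-triangle entries of a
# symmetric correlation matrix as a data frame of taxon pairs.
corr_to_edges <- function(corr_mat) {
  # NA entries are skipped because which() ignores NA.
  idx <- which(upper.tri(corr_mat) & corr_mat != 0, arr.ind = TRUE)
  data.frame(
    taxon1 = rownames(corr_mat)[idx[, "row"]],
    taxon2 = colnames(corr_mat)[idx[, "col"]],
    corr = corr_mat[idx],
    stringsAsFactors = FALSE
  )
}

# Made-up sparse correlation matrix standing in for res_linear$corr_fl.
phyla <- c("Firmicutes", "Bacteroidetes", "Actinobacteria")
toy_corr <- matrix(c( 1.00, 0.62, -0.41,
                      0.62, 1.00,  0.00,
                     -0.41, 0.00,  1.00),
                   nrow = 3, byrow = TRUE, dimnames = list(phyla, phyla))

edges <- corr_to_edges(toy_corr)
print(edges)
```

With a real result, the same helper could be called as `corr_to_edges(res_linear$corr_fl)`, assuming the returned matrix carries taxon names as row and column names.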

FrederickHuangLin/ANCOMBC documentation built on June 11, 2025, 6:22 p.m.
