secom_linear | R Documentation |
Obtain the sparse correlation matrix for linear correlations
between taxa. The current version of the secom_linear
function supports
any of three correlation coefficients: Pearson, Spearman, and
Kendall's τ.
secom_linear(
  data,
  assay_name = "counts",
  tax_level = NULL,
  pseudo = 0,
  prv_cut = 0.5,
  lib_cut = 1000,
  corr_cut = 0.5,
  wins_quant = c(0.05, 0.95),
  method = c("pearson", "kendall", "spearman"),
  soft = FALSE,
  thresh_len = 100,
  n_cv = 10,
  thresh_hard = 0,
  max_p = 0.005,
  n_cl = 1
)
data |
a list of the input data. Each element of the list can be a
|
assay_name |
character. Name of the count table in the data object
(only applicable if data object is a |
tax_level |
character. The taxonomic level of interest. The input data
can be agglomerated at different taxonomic levels based on your research
interest. Default is NULL, i.e., do not perform agglomeration, and the
SECOM analysis will be performed at the lowest taxonomic level of the
input |
pseudo |
numeric. Add pseudo-counts to the data. Default is 0 (no pseudo-counts). |
prv_cut |
a numerical fraction between 0 and 1. Taxa with prevalences
less than |
lib_cut |
a numerical threshold for filtering samples based on library
sizes. Samples with library sizes less than |
corr_cut |
numeric. To prevent false positives due to taxa with
small variances, taxa with Pearson correlation coefficients greater than
|
wins_quant |
a numeric vector of probabilities with values between
0 and 1. Replace extreme values in the abundance data with less
extreme values. Default is |
method |
character. It indicates which correlation coefficient is to be computed. One of "pearson", "kendall", or "spearman": can be abbreviated. |
soft |
logical. |
thresh_len |
numeric. Grid-search is implemented to find the optimal
values over |
n_cv |
numeric. The number of folds in cross-validation. Default is 10 (10-fold cross-validation). |
thresh_hard |
numeric. Set a hard threshold for the correlation matrix.
Pairwise correlation coefficient (in its absolute value) less than or equal
to |
max_p |
numeric. Obtain the sparse correlation matrix by
p-value filtering. Pairwise correlation coefficient with p-value greater than
|
n_cl |
numeric. The number of nodes to be forked. For details, see
|
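The filtering and winsorization arguments above (prv_cut, lib_cut, wins_quant) can be pictured with a small base R sketch. This is only an illustration of the general ideas on a toy count table, not the code that secom_linear runs internally; the helper winsorize and all object names are hypothetical, and the direction of each cutoff is an assumption based on the descriptions above.

```r
# Illustrative sketch only; not ANCOMBC internals.
set.seed(1)

# Toy count table: 4 taxa (rows) x 20 samples (columns)
counts <- matrix(rpois(80, lambda = 5), nrow = 4,
                 dimnames = list(paste0("taxon", 1:4), paste0("s", 1:20)))
counts[4, 1:15] <- 0  # make taxon4 rare (nonzero in at most 5 of 20 samples)

# Prevalence: fraction of samples in which a taxon is nonzero.
# Assumption: taxa below the cutoff (here 0.5, like prv_cut) are dropped.
prevalence <- rowMeans(counts > 0)
kept <- counts[prevalence >= 0.5, , drop = FALSE]

# Library size: total count per sample.
# Assumption: samples below the cutoff (here 10, a toy lib_cut) are dropped.
lib_size <- colSums(kept)
kept <- kept[, lib_size >= 10, drop = FALSE]

# Winsorization: clamp values outside the given quantiles
# (hypothetical helper mirroring wins_quant = c(0.05, 0.95)).
winsorize <- function(x, probs = c(0.05, 0.95)) {
  q <- quantile(x, probs = probs, na.rm = TRUE)
  pmin(pmax(x, q[1]), q[2])
}

x <- c(1, 2, 3, 4, 100)
x_w <- winsorize(x)  # the extreme value 100 is pulled in toward the rest
```

The point of the sketch is that rare taxa and shallow samples are removed before any correlation is computed, and that winsorization limits the influence of a few extreme abundances rather than deleting them.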
a list with components:
s_diff_hat, a numeric vector of estimated sample-specific biases.
y_hat, a matrix of bias-corrected abundances.
cv_error, a numeric vector of cross-validation error estimates, which are the Frobenius norm differences between correlation matrices using training set and validation set, respectively.
thresh_grid, a numeric vector of thresholds in the cross-validation.
thresh_opt, numeric. The optimal threshold through cross-validation.
mat_cooccur, a matrix of taxon-taxon co-occurrence pattern. The number in each cell represents the number of complete (nonzero) samples for the corresponding pair of taxa.
corr, the sample correlation matrix (using the measure specified in method) computed using the bias-corrected abundances y_hat.
corr_p, the p-value matrix corresponding to the sample correlation matrix corr.
corr_th, the sparse correlation matrix obtained by thresholding based on the method specified in soft.
corr_fl, the sparse correlation matrix obtained by p-value filtering based on the cutoff specified in max_p.
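To make the relationship between corr, corr_p, corr_th, and corr_fl more concrete, here is a minimal base R sketch on simulated data. It is not the SECOM implementation (there is no bias correction or cross-validation here), and it assumes that filtered entries are set to zero; the variable names simply echo the component names above.

```r
# Illustrative sketch only; not the SECOM algorithm.
set.seed(42)
n <- 50
x1 <- rnorm(n)
dat <- cbind(a = x1,
             b = x1 + rnorm(n, sd = 0.3),  # strongly correlated with a
             c = rnorm(n))                 # independent noise

# Sample correlation matrix (the analogue of corr)
corr <- cor(dat, method = "pearson")

# Pairwise p-values (the analogue of corr_p); diagonal left at 0
p <- ncol(dat)
corr_p <- matrix(0, p, p, dimnames = dimnames(corr))
for (i in seq_len(p)) {
  for (j in seq_len(p)) {
    if (i != j) {
      corr_p[i, j] <- cor.test(dat[, i], dat[, j],
                               method = "pearson")$p.value
    }
  }
}

thresh_hard <- 0.3
max_p <- 0.005

# Hard thresholding: zero out entries with |r| <= thresh_hard
# (assumption: filtered entries become 0)
corr_th <- corr
corr_th[abs(corr) <= thresh_hard] <- 0

# p-value filtering: zero out entries with p-value > max_p
corr_fl <- corr
corr_fl[corr_p > max_p] <- 0
```

Both sparse matrices keep the strong a-b correlation; whether weak, noisy correlations survive depends on the chosen thresh_hard and max_p.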
Huang Lin
secom_dist
library(ANCOMBC)
data(dietswap)

# subset to baseline
tse = dietswap[, dietswap$timepoint == 1]

set.seed(123)
res_linear = secom_linear(data = list(tse), assay_name = "counts",
                          tax_level = "Phylum", pseudo = 0,
                          prv_cut = 0.5, lib_cut = 1000, corr_cut = 0.5,
                          wins_quant = c(0.05, 0.95), method = "pearson",
                          soft = FALSE, thresh_len = 20, n_cv = 10,
                          thresh_hard = 0.3, max_p = 0.005, n_cl = 2)

corr_th = res_linear$corr_th
corr_fl = res_linear$corr_fl
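A sparse correlation matrix such as corr_th or corr_fl is often easier to read as a list of nonzero taxon pairs. The sketch below uses a small hand-made matrix as a stand-in for the real output (the values and taxon names A, B, C are made up for illustration), so it runs without ANCOMBC installed.

```r
# Toy stand-in for res_linear$corr_th; values are invented for illustration.
corr_th <- matrix(c(1.0,  0.8,  0.0,
                    0.8,  1.0, -0.4,
                    0.0, -0.4,  1.0),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Keep each pair once (upper triangle) and only the nonzero entries
idx <- which(upper.tri(corr_th) & corr_th != 0, arr.ind = TRUE)

edges <- data.frame(
  taxon1 = rownames(corr_th)[idx[, "row"]],
  taxon2 = colnames(corr_th)[idx[, "col"]],
  corr   = corr_th[idx]
)
print(edges)
```

The same indexing should work on a real corr_th or corr_fl, provided the matrix carries row and column names.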