sccomp_estimate | R Documentation
The sccomp_estimate function performs linear modeling on a table of cell counts that includes a cell-group identifier, a sample identifier, an integer count, and factors (continuous or discrete). The user defines the linear model with an R formula, where the first factor is the factor of interest. Alternatively, sccomp accepts single-cell data containers (e.g., Seurat, SingleCellExperiment, cell metadata, or group-size tables) and derives the count data from the cell metadata.
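To make the expected input concrete, the sketch below collapses a toy cell-level metadata table (one row per cell) into the sample by cell-group count table described above. It uses only base R; the column names sample, cell_group, type and count are chosen to match the example at the end of this page, and the toy values are made up. This is not how sccomp performs the conversion internally, only a way to see the target shape.

```r
# Toy cell-level metadata: one row per cell (values are illustrative only)
cell_metadata <- data.frame(
  sample     = c("s1", "s1", "s1", "s2", "s2", "s3", "s3", "s3", "s3"),
  cell_group = c("B", "T", "T", "B", "NK", "T", "T", "NK", "B"),
  type       = c("benign", "benign", "benign", "benign", "benign",
                 "cancer", "cancer", "cancer", "cancer")
)

# Count cells per sample x cell_group, keeping every combination (zeros included)
count_table <- as.data.frame(
  table(sample = cell_metadata$sample, cell_group = cell_metadata$cell_group),
  responseName = "count",
  stringsAsFactors = FALSE
)

# Attach the sample-level factor (one value per sample) back onto the counts
sample_info <- unique(cell_metadata[, c("sample", "type")])
count_table <- merge(count_table, sample_info, by = "sample")

count_table
```

Keeping zero counts (for example, no T cells in s2) matters for compositional models, since an absent cell group in a sample is still an observation.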
sccomp_estimate(
  .data,
  formula_composition = ~1,
  formula_variability = ~1,
  .sample,
  .cell_group,
  .count = NULL,
  cores = detectCores(),
  bimodal_mean_variability_association = FALSE,
  percent_false_positive = 5,
  variational_inference = TRUE,
  prior_mean = list(intercept = c(0, 1), coefficients = c(0, 1)),
  prior_overdispersion_mean_association = list(intercept = c(5, 2), slope = c(0, 0.6),
    standard_deviation = c(10, 20)),
  .sample_cell_group_pairs_to_exclude = NULL,
  verbose = TRUE,
  enable_loo = FALSE,
  noise_model = "multi_beta_binomial",
  exclude_priors = FALSE,
  use_data = TRUE,
  mcmc_seed = sample(1e+05, 1),
  max_sampling_iterations = 20000,
  pass_fit = TRUE,
  approximate_posterior_inference = NULL
)
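Both formula arguments in the signature above take ordinary R formulas. As a rough illustration of how a formula maps to model parameters, the base R sketch below builds design matrices for formula_composition = ~ type and formula_variability = ~ 1 on a made-up sample table. That sccomp encodes factors exactly this way (treatment contrasts, with the first factor level as baseline) is an assumption here; the resulting column names, such as (Intercept) and typecancer, are the kind of labels one would expect in the parameter column of the output.

```r
# Sample-level table with one discrete factor of interest (illustrative values)
samples <- data.frame(
  sample = c("s1", "s2", "s3", "s4"),
  type   = factor(c("benign", "benign", "cancer", "cancer"))
)

# Design matrices for an abundance model (~ type) and an intercept-only
# variability model (~ 1), built with base R's model.matrix()
X_composition <- model.matrix(~ type, data = samples)
X_variability <- model.matrix(~ 1, data = samples)

colnames(X_composition)
colnames(X_variability)
X_composition
```

With ~ type, the typecancer column is 1 for cancer samples and 0 for the benign baseline, so its coefficient is the difference between the two groups.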
| Argument | Description |
|---|---|
| .data | A tibble including a cell_group name column, a sample name column, a read-count column (optional, depending on the input class), and factor columns. |
| formula_composition | A formula describing the model for differential abundance. |
| formula_variability | A formula describing the model for differential variability. |
| .sample | A column name, as a symbol, for the sample identifier. |
| .cell_group | A column name, as a symbol, for the cell_group identifier. |
| .count | A column name, as a symbol, for the cell_group abundance (read count). |
| cores | Number of cores to use for parallel calculations. |
| bimodal_mean_variability_association | Boolean; whether to model the mean-variability association as bimodal. |
| percent_false_positive | Real number between 0 and 100: the false-positive rate, in percent, used for outlier identification. |
| variational_inference | Boolean; whether to use variational Bayes for posterior inference, which is faster and more convenient. Setting this argument to FALSE runs full Bayesian inference (Hamiltonian Monte Carlo), which is slower but is the gold standard. |
| prior_mean | List with prior knowledge about the mean distribution, with one element for the intercept and one for the coefficients. |
| prior_overdispersion_mean_association | List with prior knowledge about the mean-variability association. |
| .sample_cell_group_pairs_to_exclude | Name of a logical (boolean) column flagging the sample/cell-group pairs to exclude. |
| verbose | Boolean; whether to print progress. |
| enable_loo | Boolean; whether to enable model comparison with the loo package. |
| noise_model | Character string for the noise model (e.g., "multi_beta_binomial"). |
| exclude_priors | Boolean; set to TRUE to run a prior-free model. |
| use_data | Boolean; set to FALSE to run the model without data. |
| mcmc_seed | Integer seed for MCMC reproducibility. |
| max_sampling_iterations | Integer capping the maximum number of sampling iterations, useful for large datasets. |
| pass_fit | Boolean; whether to include the Stan fit as an attribute of the output. |
| approximate_posterior_inference | DEPRECATED, please use the |
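The .sample_cell_group_pairs_to_exclude argument expects a logical column in .data, with one value per sample/cell-group row. The sketch below adds such a column to a toy count table. The column name exclude_pair and the exclusion rule are hypothetical, chosen only to show the shape; the column name would then presumably be passed as a symbol, like .sample, e.g. .sample_cell_group_pairs_to_exclude = exclude_pair.

```r
# Toy sample x cell-group count table (values are illustrative only)
counts <- data.frame(
  sample     = rep(c("s1", "s2", "s3"), each = 3),
  cell_group = rep(c("B", "NK", "T"), times = 3),
  count      = c(120L, 15L, 300L, 98L, 0L, 410L, 87L, 22L, 5L)
)

# Hypothetical exclusion rule: flag the T-cell measurement of sample s3,
# e.g. because that population was affected by a technical problem
counts$exclude_pair <- counts$sample == "s3" & counts$cell_group == "T"

counts[counts$exclude_pair, ]
```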
A nested tibble (tbl) with the following columns:
cell_group - The cell groups being tested.
parameter - The parameter being estimated, from the design matrix described by the input formula_composition and formula_variability.
factor - The factor in the formula corresponding to the covariate, if it exists (e.g., it does not exist for the intercept or for contrasts, which are usually combinations of parameters).
c_lower - Lower (2.5%) quantile of the posterior distribution for a composition (c) parameter.
c_effect - Mean of the posterior distribution for a composition (c) parameter.
c_upper - Upper (97.5%) quantile of the posterior distribution for a composition (c) parameter.
c_pH0 - Probability of the null hypothesis (no difference) for a composition (c) parameter. This is not a p-value.
c_FDR - False-discovery rate of the null hypothesis (no difference) for a composition (c) parameter.
c_n_eff - Effective sample size for a composition (c) parameter: the number of independent draws in the sample; the higher, the better (mc-stan.org/docs/2_25/cmdstan-guide/stansummary.html).
c_R_k_hat - R-hat statistic for a composition (c) parameter, a measure of chain equilibrium; it should be within 0.05 of 1.0 (mc-stan.org/docs/2_25/cmdstan-guide/stansummary.html).
v_lower - Lower (2.5%) quantile of the posterior distribution for a variability (v) parameter.
v_effect - Mean of the posterior distribution for a variability (v) parameter.
v_upper - Upper (97.5%) quantile of the posterior distribution for a variability (v) parameter.
v_pH0 - Probability of the null hypothesis (no difference) for a variability (v) parameter. This is not a p-value.
v_FDR - False-discovery rate of the null hypothesis (no difference) for a variability (v) parameter.
v_n_eff - Effective sample size for a variability (v) parameter: the number of independent draws in the sample; the higher, the better (mc-stan.org/docs/2_25/cmdstan-guide/stansummary.html).
v_R_k_hat - R-hat statistic for a variability (v) parameter, a measure of chain equilibrium; it should be within 0.05 of 1.0 (mc-stan.org/docs/2_25/cmdstan-guide/stansummary.html).
count_data - Nested input count data.
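To show how the summary columns relate to posterior draws, the sketch below computes c_lower, c_effect and c_upper from simulated draws of one parameter across three cell groups, together with a probability of the null and a Bayesian FDR. Two parts are assumptions rather than sccomp's documented procedure: pH0 is taken here as one minus the larger of the posterior mass above +threshold and below -threshold (threshold is a hypothetical minimal effect size on the logit scale), and the FDR is the running mean of the sorted pH0 values, a common Bayesian FDR construction. The draws are simulated, not real results.

```r
set.seed(42)

# Simulated posterior draws (rows = draws, columns = cell groups) for one
# composition parameter; illustrative only, not real sccomp output
draws <- cbind(
  B  = rnorm(4000, mean = 1.5, sd = 0.3),
  NK = rnorm(4000, mean = 0.0, sd = 0.3),
  T  = rnorm(4000, mean = -0.8, sd = 0.3)
)

threshold <- 0.1  # hypothetical minimal effect size (logit scale)

summarise_draws <- function(x, threshold) {
  # Posterior mass on each side beyond the effect-size threshold
  p_up   <- mean(x > threshold)
  p_down <- mean(x < -threshold)
  c(
    c_lower  = unname(quantile(x, 0.025)),
    c_effect = mean(x),
    c_upper  = unname(quantile(x, 0.975)),
    c_pH0    = 1 - max(p_up, p_down)
  )
}

# One row per cell group
result <- as.data.frame(t(apply(draws, 2, summarise_draws, threshold = threshold)))
result$cell_group <- rownames(result)

# Bayesian FDR: sort by pH0, take the running mean, then restore the original order
ord <- order(result$c_pH0)
fdr <- numeric(nrow(result))
fdr[ord] <- cumsum(result$c_pH0[ord]) / seq_along(ord)
result$c_FDR <- fdr

result
```

Under this construction, the FDR of a cell group is the expected proportion of false discoveries if it and every cell group with a lower pH0 were called significant.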
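library(sccomp)

# Example cell-count table shipped with sccomp
data("counts_obj")

# Differential composition by type; constant variability (~1)
estimate <-
  sccomp_estimate(
    counts_obj,
    formula_composition = ~ type,
    formula_variability = ~ 1,
    .sample = sample,
    .cell_group = cell_group,
    .count = count,
    cores = 1
  )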
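Once estimate is available, significant cell groups can be selected with ordinary data-frame operations. The sketch below assumes the output can be filtered like a data frame with the cell_group, parameter and c_FDR columns listed above. Because running sccomp requires Stan, it uses a small made-up stand-in table (estimate_toy, hypothetical) so that it runs on its own; the 0.05 cutoff is an arbitrary choice.

```r
# Made-up stand-in for the estimate output (values are illustrative only)
estimate_toy <- data.frame(
  cell_group = c("B", "B", "T", "T"),
  parameter  = c("(Intercept)", "typecancer", "(Intercept)", "typecancer"),
  c_effect   = c(-1.2, 0.9, 0.4, -0.05),
  c_FDR      = c(0.001, 0.01, 0.02, 0.6)
)

# Keep the factor-of-interest rows (drop the intercept) with FDR below 5%
significant <- subset(estimate_toy, parameter != "(Intercept)" & c_FDR < 0.05)
significant
```

Filtering out the intercept matters because its FDR reflects the baseline proportion, not a difference between conditions.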