sccomp_estimate: Main Function for SCCOMP Estimate

View source: R/methods.R

sccomp_estimate    R Documentation

Main Function for SCCOMP Estimate

Description

The sccomp_estimate function fits a linear model to a table of cell counts that includes a cell-group identifier, a sample identifier, an integer count, and factors (continuous or discrete). The user defines the linear model with an R formula, in which the first factor is the factor of interest. Alternatively, sccomp accepts single-cell data containers (e.g., Seurat, SingleCellExperiment, cell metadata, or group-size) and derives the count data from the cell metadata.
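As a minimal sketch of what deriving counts from cell metadata amounts to, the base R snippet below collapses a per-cell table into one row per sample and cell group. The data frame cell_metadata and its column names are made up for illustration, and this is not sccomp's internal code.

```r
# Hypothetical per-cell metadata: one row per cell (names are illustrative).
cell_metadata <- data.frame(
  sample     = c("s1", "s1", "s1", "s2", "s2", "s2", "s2"),
  cell_group = c("B", "B", "T", "B", "T", "T", "T"),
  type       = c("healthy", "healthy", "healthy",
                 "cancer", "cancer", "cancer", "cancer")
)

# Count cells for every sample / cell-group combination.
counts <- as.data.frame(
  table(sample = cell_metadata$sample, cell_group = cell_metadata$cell_group),
  responseName = "count",
  stringsAsFactors = FALSE
)

# Re-attach the sample-level factor (here `type`) to each count row.
sample_factors <- unique(cell_metadata[, c("sample", "type")])
counts <- merge(counts, sample_factors, by = "sample")

print(counts)
```

The resulting table has the shape described above: a sample identifier, a cell-group identifier, an integer count, and a factor column.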

Usage

sccomp_estimate(
  .data,
  formula_composition = ~1,
  formula_variability = ~1,
  .sample,
  .cell_group,
  .count = NULL,
  cores = detectCores(),
  bimodal_mean_variability_association = FALSE,
  percent_false_positive = 5,
  variational_inference = TRUE,
  prior_mean = list(intercept = c(0, 1), coefficients = c(0, 1)),
  prior_overdispersion_mean_association = list(intercept = c(5, 2), slope = c(0, 0.6),
    standard_deviation = c(10, 20)),
  .sample_cell_group_pairs_to_exclude = NULL,
  verbose = TRUE,
  enable_loo = FALSE,
  noise_model = "multi_beta_binomial",
  exclude_priors = FALSE,
  use_data = TRUE,
  mcmc_seed = sample(1e+05, 1),
  max_sampling_iterations = 20000,
  pass_fit = TRUE,
  approximate_posterior_inference = NULL
)

Arguments

.data

A tibble including a cell_group name column, a sample name column, a read counts column (optional depending on the input class), and factor columns.

formula_composition

A formula describing the model for differential abundance.

formula_variability

A formula describing the model for differential variability.

.sample

A column name as symbol for the sample identifier.

.cell_group

A column name as symbol for the cell_group identifier.

.count

A column name as symbol for the cell_group abundance (read count).

cores

Number of cores to use for parallel calculations.

bimodal_mean_variability_association

Boolean for modeling mean-variability as bimodal.

percent_false_positive

Real number between 0 and 100: the false-positive rate, in percent, used for outlier identification.

variational_inference

Boolean for using variational Bayes for posterior inference, which is faster and more convenient. Setting this argument to FALSE runs full Bayesian (Hamiltonian Monte Carlo) inference, which is slower but is the gold standard.

prior_mean

List with prior knowledge about the mean distribution, with one element for the intercept and one for the coefficients.

prior_overdispersion_mean_association

List with prior knowledge about mean/variability association.

.sample_cell_group_pairs_to_exclude

Name of a boolean column that flags the sample/cell-group pairs to exclude.

verbose

Boolean to print progress messages.

enable_loo

Boolean to enable model comparison using the LOO package.

noise_model

Character string for the noise model (e.g., 'multi_beta_binomial').

exclude_priors

Boolean to run a prior-free model.

use_data

Boolean for whether the model uses the data. Setting this argument to FALSE runs the model data-free.

mcmc_seed

Integer for MCMC reproducibility.

max_sampling_iterations

Integer to limit maximum iterations for large datasets.

pass_fit

Boolean to include the Stan fit as attribute in the output.

approximate_posterior_inference

DEPRECATED. Please use the variational_inference argument instead.

Value

A nested tibble tbl with the following columns:

  • cell_group - column including the cell groups being tested

  • parameter - The parameter being estimated, from the design matrix described by the input formula_composition and formula_variability

  • factor - The factor in the formula corresponding to the covariate, if it exists (e.g., it does not exist for the Intercept or for contrasts, which usually are combinations of parameters)

  • c_lower - Lower (2.5%) quantile of the posterior distribution for a composition (c) parameter.

  • c_effect - Mean of the posterior distribution for a composition (c) parameter.

  • c_upper - Upper (97.5%) quantile of the posterior distribution for a composition (c) parameter.

  • c_pH0 - Probability of the null hypothesis (no difference) for a composition (c). This is not a p-value.

  • c_FDR - False-discovery rate of the null hypothesis (no difference) for a composition (c).

  • c_n_eff - Effective sample size - the number of independent draws in the sample, the higher the better (mc-stan.org/docs/2_25/cmdstan-guide/stansummary.html).

  • c_R_k_hat - R statistic, a measure of chain equilibrium, should be within 0.05 of 1.0 (mc-stan.org/docs/2_25/cmdstan-guide/stansummary.html).

  • v_lower - Lower (2.5%) quantile of the posterior distribution for a variability (v) parameter

  • v_effect - Mean of the posterior distribution for a variability (v) parameter

  • v_upper - Upper (97.5%) quantile of the posterior distribution for a variability (v) parameter

  • v_pH0 - Probability of the null hypothesis (no difference) for a variability (v). This is not a p-value.

  • v_FDR - False-discovery rate of the null hypothesis (no difference), for a variability (v).

  • v_n_eff - Effective sample size for a variability (v) parameter - the number of independent draws in the sample, the higher the better (mc-stan.org/docs/2_25/cmdstan-guide/stansummary.html).

  • v_R_k_hat - R statistic for a variability (v) parameter, a measure of chain equilibrium, should be within 0.05 of 1.0 (mc-stan.org/docs/2_25/cmdstan-guide/stansummary.html).

  • count_data - Nested input count data.
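To illustrate how the lower, effect, and upper columns relate to a posterior distribution, the sketch below summarises a vector of draws with the 2.5% quantile, the mean, and the 97.5% quantile. The draws are simulated with rnorm purely for illustration; they stand in for the posterior draws of one composition parameter and are not produced by sccomp.

```r
set.seed(1)

# Hypothetical posterior draws for one composition parameter (simulated here;
# in practice they would come from the fitted model).
draws <- rnorm(4000, mean = 0.8, sd = 0.3)

# 95% credible interval and posterior mean, named after the output columns.
summary_row <- c(
  c_lower  = unname(quantile(draws, 0.025)),
  c_effect = mean(draws),
  c_upper  = unname(quantile(draws, 0.975))
)

print(round(summary_row, 2))
```

The same three summaries apply to the variability (v) columns, computed on the draws of a variability parameter.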

Examples


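library(sccomp)

# Load the example count table shipped with the package
data("counts_obj")

estimate <-
  sccomp_estimate(
    counts_obj,
    ~ type,      # formula_composition
    ~ 1,         # formula_variability
    sample,      # .sample
    cell_group,  # .cell_group
    count,       # .count
    cores = 1
  )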


stemangiola/sccomp documentation built on May 17, 2024, 6:24 a.m.