sccomp_remove_outliers | R Documentation |
The sccomp_remove_outliers function takes as input a table of cell counts with columns for the cell-group identifier, sample identifier, integer count, and factors (continuous or discrete). The user can define a linear model with an R formula, in which the first factor is the factor of interest. Alternatively, sccomp accepts single-cell data containers (e.g., Seurat, SingleCellExperiment, cell metadata, or group-size) and derives the count data from the cell metadata.
sccomp_remove_outliers(
  .estimate,
  percent_false_positive = 5,
  cores = detectCores(),
  inference_method = "pathfinder",
  output_directory = "sccomp_draws_files",
  verbose = TRUE,
  mcmc_seed = sample(1e+05, 1),
  max_sampling_iterations = 20000,
  enable_loo = FALSE,
  approximate_posterior_inference = NULL,
  variational_inference = NULL,
  ...
)
.estimate |
A tibble including a cell_group name column, sample name column, read counts column (optional depending on the input class), and factor columns. |
percent_false_positive |
A real number between 0 and 100 (not inclusive), used to identify outliers with a specific false positive rate. |
cores |
Integer, the number of cores to be used for parallel calculations. |
inference_method |
Character string specifying the inference method to use ('pathfinder', 'hmc', or 'variational'). |
output_directory |
A character string specifying the output directory for Stan draws. |
verbose |
Logical, whether to print progression details. |
mcmc_seed |
Integer, used for Markov-chain Monte Carlo reproducibility. By default, a random integer is sampled from 1 to 100000. |
max_sampling_iterations |
Integer, caps the number of sampling iterations to limit computation time on large datasets. |
enable_loo |
Logical, whether to enable model comparison using the R package LOO. This is useful for comparing fits between models, similar to ANOVA. |
approximate_posterior_inference |
DEPRECATED, use the |
variational_inference |
Logical, whether to use variational Bayes for posterior inference. It is faster and convenient. Setting this argument to |
... |
Additional arguments passed to the |
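A minimal sketch of how the mcmc_seed default behaves, using base R only. The default expression sample(1e+05, 1) is taken from the usage above; the variable names random_seed and fixed_seed are illustrative, and the call to sccomp_remove_outliers is shown only in a comment because it needs a fitted estimate.

```r
# Mirrors the default expression for mcmc_seed shown in the usage section.
random_seed <- sample(1e+05, 1)  # one integer drawn uniformly from 1 to 100000

# For a reproducible analysis, choose a fixed integer instead and pass it on,
# e.g. sccomp_remove_outliers(estimate, mcmc_seed = fixed_seed).
fixed_seed <- 42L

print(random_seed)
print(fixed_seed)
```

Because the default draws a new seed on every call, two runs with default arguments are not expected to produce identical draws; fixing the seed is the usual way to make results repeatable.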
A tibble (tbl), with the following columns:
cell_group - The cell groups being tested.
parameter - The parameter being estimated from the design matrix described by the input formula_composition and formula_variability.
factor - The covariate factor in the formula, if applicable (e.g., not present for Intercept or contrasts).
c_lower - Lower (2.5%) quantile of the posterior distribution for a composition (c) parameter.
c_effect - Mean of the posterior distribution for a composition (c) parameter.
c_upper - Upper (97.5%) quantile of the posterior distribution for a composition (c) parameter.
c_n_eff - Effective sample size, the number of independent draws in the sample. The higher, the better.
c_R_k_hat - R statistic, a measure of chain equilibrium, should be within 0.05 of 1.0.
v_lower - Lower (2.5%) quantile of the posterior distribution for a variability (v) parameter.
v_effect - Mean of the posterior distribution for a variability (v) parameter.
v_upper - Upper (97.5%) quantile of the posterior distribution for a variability (v) parameter.
v_n_eff - Effective sample size for a variability (v) parameter.
v_R_k_hat - R statistic for a variability (v) parameter, a measure of chain equilibrium.
count_data - Nested input count data.
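The columns listed above can be inspected with ordinary data-frame operations. The sketch below uses a small hand-made data frame as a stand-in for the returned tibble: the cell-group names, the parameter name, and all numbers are invented for illustration and do not come from a real fit. It applies the rule of thumb stated for c_R_k_hat (within 0.05 of 1.0) and an informal reading of the 2.5% and 97.5% quantiles; the latter is not presented as sccomp's own significance test.

```r
# Hypothetical stand-in for the returned tibble; all values are made up
# purely to illustrate the column layout described above.
result <- data.frame(
  cell_group = c("B cells", "T cells", "NK cells"),
  parameter  = c("typecancer", "typecancer", "typecancer"),
  c_lower    = c(0.20, -0.45, -0.10),
  c_effect   = c(0.55, -0.05, 0.30),
  c_upper    = c(0.90, 0.35, 0.70),
  c_R_k_hat  = c(1.01, 1.00, 1.08)
)

# Convergence check: c_R_k_hat should be within 0.05 of 1.0.
result$converged <- abs(result$c_R_k_hat - 1) <= 0.05

# Informal reading: does the interval (c_lower, c_upper) exclude zero?
result$excludes_zero <- result$c_lower > 0 | result$c_upper < 0

print(result[, c("cell_group", "c_effect", "converged", "excludes_zero")])
```

Rows that fail the convergence check are worth re-examining before their effect estimates are interpreted.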
message("Use the following example after having installed install.packages(\"cmdstanr\", repos = c(\"https://stan-dev.r-universe.dev/\", getOption(\"repos\")))")
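if (instantiate::stan_cmdstan_exists()) {
  # Load the example count table
  data("counts_obj")

  # Fit the model, then pipe the estimate into outlier removal
  estimate = sccomp_estimate(
    counts_obj,
    ~ type,      # formula with type as the factor of interest
    ~1,          # intercept-only formula
    sample,
    cell_group,
    count,
    cores = 1
  ) |>
    sccomp_remove_outliers(cores = 1)
}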