simulate_data | R Documentation |
This function simulates counts from a linear model.
simulate_data(
.data,
.estimate_object,
formula_composition,
formula_variability = NULL,
.sample = NULL,
.cell_group = NULL,
.coefficients = NULL,
variability_multiplier = 5,
number_of_draws = 1,
mcmc_seed = sample(1e+05, 1),
cores = detectCores()
)
.data |
A tibble including a cell_group name column, a sample name column, a read counts column, factor columns, a Pvalue column, and a significance column |
.estimate_object |
The result of sccomp_estimate execution. This is used for sampling from real-data properties. |
formula_composition |
A formula. The formula used to perform the differential cell_group abundance analysis |
formula_variability |
A formula. The formula describing the model for differential variability, for example ~treatment. In most cases, if differential variability is of interest, the formula should include only the factor of interest, because a large amount of data is needed to estimate variability for each factor. |
.sample |
A column name as symbol. The sample identifier |
.cell_group |
A column name as symbol. The cell_group identifier |
.coefficients |
The column names for coefficients, for example, c(b_0, b_1) |
variability_multiplier |
A real scalar. This can be used for artificially increasing the variability of the simulation for benchmarking purposes. |
number_of_draws |
An integer. How many copies of the data to draw from the model's joint posterior distribution. |
mcmc_seed |
An integer. Used for Markov-chain Monte Carlo reproducibility. By default a random number is sampled from 1 to 100000 (see sample(1e+05, 1) in the usage above). This default can itself be controlled with set.seed(). |
cores |
Integer, the number of cores to be used for parallel calculations. |
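The default for mcmc_seed is an ordinary call to sample(), so it follows the random number generator of the R session. The following is a minimal base-R sketch of that behaviour. It does not call simulate_data itself, and seed_a and seed_b are illustrative names only.

```r
# Fixing the session seed makes the default mcmc_seed expression repeatable.
set.seed(42)
seed_a <- sample(1e+05, 1)

set.seed(42)
seed_b <- sample(1e+05, 1)

# Same session seed, same drawn value
identical(seed_a, seed_b)

# The drawn value always lies between 1 and 100000
seed_a >= 1 && seed_a <= 1e+05
```

Passing an explicit integer to mcmc_seed avoids relying on the state of the session altogether.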
A tibble (tbl) with the following columns:
sample - A character column representing the sample name.
type - A factor column representing the type of the sample.
phenotype - A factor column representing the phenotype in the data.
count - An integer column representing the original cell counts.
cell_group - A character column representing the cell group identifier.
b_0 - A numeric column representing the first coefficient used for simulation.
b_1 - A numeric column representing the second coefficient used for simulation.
generated_proportions - A numeric column representing the generated proportions from the simulation.
generated_counts - An integer column representing the generated cell counts from the simulation.
replicate - An integer column representing the replicate number for each draw from the posterior distribution.
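When number_of_draws is greater than 1, the replicate column distinguishes the draws, so summaries are usually computed within each replicate. The sketch below uses a small made-up data frame as a stand-in for the returned tibble (the values and the names simulated and totals are hypothetical) and plain base R rather than any sccomp helper.

```r
# Hypothetical stand-in for the returned tibble (values are made up)
simulated <- data.frame(
  sample = rep(c("s1", "s1", "s2", "s2"), times = 2),
  cell_group = rep(c("cell_a", "cell_b"), times = 4),
  generated_counts = c(10L, 30L, 25L, 15L, 12L, 28L, 20L, 20L),
  replicate = rep(1:2, each = 4)
)

# Total generated cells per sample within each replicate
totals <- aggregate(
  generated_counts ~ sample + replicate,
  data = simulated,
  FUN = sum
)
totals
```

The same grouping (sample plus replicate) can be used for other per-draw summaries, such as comparing generated_counts with the original count column.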
message("Use the following example after having installed install.packages(\"cmdstanr\", repos = c(\"https://stan-dev.r-universe.dev/\", getOption(\"repos\")))")
if (instantiate::stan_cmdstan_exists()) {
data("counts_obj")
library(dplyr)
estimate = sccomp_estimate(
counts_obj,
~ type, ~1, sample, cell_group, count,
cores = 1
)
# Set coefficients for cell_groups. In this case all coefficients are 0 for simplicity.
counts_obj = counts_obj |> mutate(b_0 = 0, b_1 = 0)
# Simulate data
simulate_data(counts_obj, estimate, ~type, ~1, sample, cell_group, c(b_0, b_1))
}
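The example above sets every coefficient to 0. To simulate an effect, the coefficient columns can differ by cell group. The following is a minimal base-R sketch on a made-up data frame. The cell group names are hypothetical, and reading b_1 as the effect of type under ~type is an assumption rather than something this page states.

```r
# Hypothetical input with two cell groups (names are made up)
toy <- data.frame(
  sample = c("s1", "s1", "s2", "s2"),
  cell_group = c("cell_a", "cell_b", "cell_a", "cell_b")
)

# Zero everywhere, except a second coefficient of 1 for cell_a
toy$b_0 <- 0
toy$b_1 <- ifelse(toy$cell_group == "cell_a", 1, 0)
toy
```

Columns built this way would then be passed through .coefficients, as in c(b_0, b_1) above.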