muscat_simulation: Simulate Datasets by muscat

View source: R/23-muscat.R

Description

This function simulates datasets from learned parameters using the simData function in the muscat package.

Usage

muscat_simulation(
  parameters,
  other_prior = NULL,
  return_format,
  verbose = FALSE,
  seed
)

Arguments

parameters

An object generated by muscat::prepSim().

other_prior

A named list of parameters. Some methods need extra parameters to run the estimation step, so you must supply them. In the simulation step, the number of cells, genes, groups, and batches and the proportion of DEGs are usually customized, so you must specify them before simulating a dataset. See Details below for more information.

return_format

A character string. Available choices: list, SingleCellExperiment, Seurat, h5ad. If you select h5ad, the function returns the path where the .h5ad file is saved.

verbose

Logical. Whether to print messages or not.

seed

A random seed.

Details

In addition to simulating datasets with default parameters, users may want to simulate other kinds of datasets, e.g. a count matrix with two cell groups. In muscat, you can set extra parameters to do so.

The parameters you can customize are listed below:

  1. nCells. In muscat, you can set nCells directly. For example, if you want to simulate 1000 cells, you can type other_prior = list(nCells = 1000).

  2. nGenes. You can directly set other_prior = list(nGenes = 5000) to simulate 5000 genes.

  3. nGroups. In muscat, nGroups can be 1 or 2 because muscat can simulate at most two cell groups.

  4. de.prob. You can directly set other_prior = list(de.prob = 0.2) to simulate DEGs that account for 20 percent of all genes.

  5. fc.group. You can directly set other_prior = list(fc.group = 2) to specify the minimum fold change of DEGs.

For more customizable parameters in muscat, please check muscat::simData().
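
The constraints listed above can be checked before calling muscat_simulation. The following is a minimal sketch in base R, not part of simmethods or muscat: check_other_prior is a hypothetical helper, and the exact ranges it enforces (whole-number counts, de.prob between 0 and 1, a positive fc.group) are assumptions drawn from the descriptions above rather than from the package source.

```r
# Hypothetical helper (not provided by simmethods or muscat): validates a
# list of simulation settings against the constraints described in Details.
check_other_prior <- function(other_prior = NULL) {
  # NULL means "simulate with default parameters", so there is nothing to check
  if (is.null(other_prior)) {
    return(invisible(NULL))
  }
  stopifnot(is.list(other_prior), !is.null(names(other_prior)))

  # TRUE only for a single whole number >= 1
  is_count <- function(x) {
    isTRUE(is.numeric(x) && length(x) == 1 && x >= 1 && x == round(x))
  }

  for (nm in c("nCells", "nGenes")) {
    value <- other_prior[[nm]]
    if (!is.null(value) && !is_count(value)) {
      stop(nm, " must be a single positive whole number")
    }
  }

  n_groups <- other_prior[["nGroups"]]
  if (!is.null(n_groups) &&
      !isTRUE(length(n_groups) == 1 && n_groups %in% c(1, 2))) {
    stop("nGroups must be 1 or 2")
  }

  de_prob <- other_prior[["de.prob"]]
  if (!is.null(de_prob) &&
      !isTRUE(is.numeric(de_prob) && length(de_prob) == 1 &&
              de_prob >= 0 && de_prob <= 1)) {
    stop("de.prob must be a single number between 0 and 1")
  }

  # Assumption: a fold change is a single positive number
  fc_group <- other_prior[["fc.group"]]
  if (!is.null(fc_group) &&
      !isTRUE(is.numeric(fc_group) && length(fc_group) == 1 && fc_group > 0)) {
    stop("fc.group must be a single positive number")
  }

  invisible(other_prior)
}

# Usage: build the settings once, check them, then pass them as other_prior
settings <- check_other_prior(
  list(nCells = 1000, nGenes = 2000, nGroups = 2, de.prob = 0.2, fc.group = 4)
)
```

Checking the list up front surfaces a mistyped value, such as nGroups = 3, before the simulation step runs.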

References

Crowell H L, Soneson C, Germain P L, et al. Muscat detects subpopulation-specific state transitions from multi-sample multi-condition single-cell transcriptomics data. Nature communications, 2020, 11(1): 1-12. https://doi.org/10.1038/s41467-020-19894-4

Github URL: https://github.com/HelenaLC/muscat

Examples

## Not run: 
ref_data <- simmethods::data
## cell groups
group_condition <- as.numeric(simmethods::group_condition)
## estimation
estimate_result <- simmethods::muscat_estimation(
  ref_data = ref_data,
  other_prior = list(group.condition = group_condition),
  verbose = TRUE,
  seed = 111
)

# 1) Simulate with default parameters
simulate_result <- simmethods::muscat_simulation(
  parameters = estimate_result[["estimate_result"]],
  other_prior = NULL,
  return_format = "list",
  verbose = TRUE,
  seed = 111
)
## counts
counts <- simulate_result[["simulate_result"]][["count_data"]]
dim(counts)


# 2) Simulate 1000 cells and 2000 genes
simulate_result <- simmethods::muscat_simulation(
  parameters = estimate_result[["estimate_result"]],
  other_prior = list(nCells = 1000,
                     nGenes = 2000),
  return_format = "list",
  verbose = TRUE,
  seed = 111
)

## counts
counts <- simulate_result[["simulate_result"]][["count_data"]]
dim(counts)


# 3) Simulate 2 groups (20% of genes are DEGs, fold change of 4)
simulate_result <- simmethods::muscat_simulation(
  parameters = estimate_result[["estimate_result"]],
  other_prior = list(nCells = 1000,
                     nGenes = 2000,
                     nGroups = 2,
                     de.prob = 0.2,
                     fc.group = 4),
  return_format = "list",
  verbose = TRUE,
  seed = 111
)
## cell information
col_data <- simulate_result[["simulate_result"]][["col_meta"]]
table(col_data$group)/1000
## gene information
row_data <- simulate_result[["simulate_result"]][["row_meta"]]
table(row_data$de_gene)[2]/2000

## End(Not run)


duohongrui/simmethods documentation built on June 17, 2024, 10:49 a.m.