SPARSim_simulation: Simulate Datasets by SPARSim

View source: R/19-SPARSim.R

SPARSim_simulation    R Documentation

Simulate Datasets by SPARSim

Description

This function simulates datasets from learned parameters using the SPARSim_simulation function in the SPARSim package.

Usage

SPARSim_simulation(
  parameters,
  other_prior = NULL,
  return_format,
  verbose = FALSE,
  seed
)

Arguments

parameters

An object containing estimated parameters, such as the estimate_result element returned by simmethods::SPARSim_estimation().

other_prior

A list of named parameters. Some methods need extra parameters to execute the estimation step, so you must supply them. In the simulation step, the number of cells, genes, groups, and batches and the percentage of DEGs are usually customized, so you must specify them before simulating a dataset. See Details below for more information.

return_format

A character string. Available choices: list, SingleCellExperiment, Seurat, h5ad. If you select h5ad, the path where the .h5ad file is saved is returned.

verbose

Logical. Whether to print messages or not.

seed

A random seed.

Details

In addition to simulating datasets with default parameters, users may want to simulate other kinds of datasets, e.g. a count matrix with two or more cell groups. In SPARSim, you can set extra parameters to simulate such datasets.

The parameters you can customize are listed below:

  1. de.prob. You can directly set other_prior = list(de.prob = 0.2) to simulate DEGs that account for 20 percent of all genes.

  2. fc.group. You can directly set other_prior = list(fc.group = 3) to specify the fold change of DEGs.

  3. batch.condition. You can provide a vector giving the batch that each cell belongs to and set other_prior = list(batch.condition = xxxx). This parameter also determines the number of batches.
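
As a minimal sketch of how these three settings can be combined, the base-R snippet below builds an other_prior list for a hypothetical 160-cell dataset. The cell count and the three batch labels are assumptions borrowed from the Examples section, not requirements of SPARSim; the sanity checks are illustrative and are not part of the package.

```r
# Hypothetical number of cells; in practice this should match the cells
# described by the estimated parameters.
n_cells <- 160

# Assign each cell to one of three batches (reproducibly).
set.seed(111)
batch_condition <- sample(1:3, n_cells, replace = TRUE)

# Collect the customized settings in a named list.
other_prior <- list(
  de.prob = 0.2,                     # 20 percent of genes are DEGs
  fc.group = 3,                      # fold change of DEGs
  batch.condition = batch_condition  # one batch label per cell
)

# Illustrative sanity checks before passing the list to SPARSim_simulation().
stopifnot(
  other_prior$de.prob >= 0, other_prior$de.prob <= 1,
  length(other_prior$batch.condition) == n_cells
)

# The number of distinct labels is the number of batches to be simulated.
n_batches <- length(unique(other_prior$batch.condition))
n_batches
```

The resulting list would be passed as the other_prior argument, as in the second example in the Examples section.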

If users want to simulate groups, they must first estimate group parameters by supplying the group.condition parameter in the estimation step. Otherwise, they cannot simulate groups.
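
The Examples section passes group.condition as a numeric vector with one entry per cell. Assuming that convention, a minimal sketch of deriving such a vector from cell-type labels could look like the following; the labels and the name estimation_prior are invented for illustration.

```r
# Hypothetical cell-type labels, one per cell (illustrative only).
cell_labels <- c("T cell", "B cell", "T cell", "NK cell", "B cell")

# Encode each distinct label as a numeric group index.
# factor() sorts the levels, so "B cell" = 1, "NK cell" = 2, "T cell" = 3.
group_condition <- as.numeric(factor(cell_labels))

# This list would be supplied as other_prior in the estimation step.
estimation_prior <- list(group.condition = group_condition)

# Number of cells in each group.
table(estimation_prior$group.condition)
```

A list built this way would be passed as other_prior to simmethods::SPARSim_estimation(), after which the group structure is available to the simulation step.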

For more customizable parameters in SPARSim, please check SPARSim::SPARSim_simulation().

References

Baruzzo G, Patuzzi I, Di Camillo B. SPARSim single cell: a count data simulator for scRNA-seq data. Bioinformatics, 2020, 36(5): 1468-1475. https://doi.org/10.1093/bioinformatics/btz752

Gitlab URL: https://gitlab.com/sysbiobig/sparsim

Examples

## Not run: 
ref_data <- simmethods::data
## Estimation
group_condition <- as.numeric(simmethods::group_condition)
estimate_result <- simmethods::SPARSim_estimation(
  ref_data = ref_data,
  other_prior = list(group.condition = group_condition),
  verbose = TRUE,
  seed = 111
)
## 1) Simulation (20% proportion of DEGs, fold change 3)
simulate_result <- simmethods::SPARSim_simulation(
  parameters = estimate_result[["estimate_result"]],
  other_prior = list(de.prob = 0.2,
                     fc.group = 3),
  return_format = "list",
  verbose = TRUE,
  seed = 111
)
## counts
counts <- simulate_result[["simulate_result"]][["count_data"]]
dim(counts)
## cell information
col_data <- simulate_result[["simulate_result"]][["col_meta"]]
table(col_data$group)
## gene information
row_data <- simulate_result[["simulate_result"]][["row_meta"]]
## proportion of DEGs among all simulated genes
table(row_data$de_gene)[2]/nrow(row_data)

## 2) In SPARSim, users can simulate batches when the batch.condition parameter is provided
simulate_result <- simmethods::SPARSim_simulation(
  parameters = estimate_result[["estimate_result"]],
  other_prior = list(de.prob = 0.2,
                     fc.group = 3,
                     batch.condition = sample(1:3, 160, replace = TRUE)),
  return_format = "list",
  verbose = TRUE,
  seed = 111
)
## counts
counts <- simulate_result[["simulate_result"]][["count_data"]]
dim(counts)
## cell information
col_data <- simulate_result[["simulate_result"]][["col_meta"]]
table(col_data$group)
table(col_data$batch)

## 3) Users can also utilize spike-in genes to simulate datasets. In this case, users
## must input dilution.factor and volume (nanoliter) parameters. Note that the
## reference matrix must contain spike-in gene counts.
ref_data <- simmethods::data

group_condition <- as.numeric(simmethods::group_condition)
estimate_result <- simmethods::SPARSim_estimation(
  ref_data = ref_data,
  other_prior = list(group.condition = group_condition,
                     dilution.factor = 50000,
                     volume = 0.01),
  verbose = TRUE,
  seed = 111
)
## check spike-in parameters
spikein_params <- estimate_result[["estimate_result"]][["SPARSim_spikein_parameter"]]
## simulate
simulate_result <- simmethods::SPARSim_simulation(
  parameters = estimate_result[["estimate_result"]],
  other_prior = list(de.prob = 0.2,
                     fc.group = 3),
  return_format = "list",
  verbose = TRUE,
  seed = 111
)

## End(Not run)


duohongrui/simmethods documentation built on June 17, 2024, 10:49 a.m.