SPsimSeq_simulation: Simulate Datasets by SPsimSeq

View source: R/20-SPsimSeq.R

SPsimSeq_simulation    R Documentation

Simulate Datasets by SPsimSeq

Description

This function simulates datasets using the SPsimSeq function in the SPsimSeq package. SPsimSeq simulates directly from the reference data, without a separate estimation step.

Usage

SPsimSeq_simulation(
  ref_data,
  other_prior = NULL,
  return_format,
  verbose = FALSE,
  seed
)

Arguments

ref_data

A matrix for a single dataset, or a named list of datasets. Most methods do not use this argument at the simulation step, but some do, e.g. SCRIP, scDesign, zingeR and SPsimSeq.

other_prior

A named list of parameters. Some methods need extra parameters to run the estimation step, so you must supply them. In the simulation step, the number of cells, genes, groups and batches and the proportion of DEGs are usually customized, so you must specify them before simulating a dataset. See Details below for more information.

return_format

A character string. One of: list, SingleCellExperiment, Seurat, h5ad. If you select h5ad, the function returns the path where the .h5ad file is saved.

verbose

Logical. Whether to print messages.

seed

A random seed.

Details

In addition to simulating datasets with default parameters, users may want to simulate other kinds of datasets, e.g. a counts matrix with two or more cell groups. In SPsimSeq, you can set extra parameters to do so.

The customizable parameters are listed below:

  1. nCells. In SPsimSeq, you can set nCells directly. For example, if you want to simulate 1000 cells, you can type other_prior = list(nCells = 1000).

  2. nGenes. You can directly set other_prior = list(nGenes = 5000) to simulate 5000 genes.

  3. group.condition. You can input cell group information as an integer vector that specifies which group each cell belongs to. See Examples.

  4. de.prob. You can directly set other_prior = list(de.prob = 0.2) to simulate DEGs that account for 20 percent of all genes.

  5. fc.group. You can directly set other_prior = list(fc.group = 2) to specify the minimum fold change of DEGs.

  6. batch.condition. You can input cell batch information as an integer vector that specifies which batch each cell belongs to. See Examples.

For more customizable parameters in SPsimSeq, please check SPsimSeq::SPsimSeq().
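
As a minimal sketch of how the six parameters above fit together, the base R snippet below builds one other_prior list and checks it before use. The value n_ref_cells and the blocked and alternating label vectors are hypothetical placeholders. The snippet assumes, following the Examples, that group.condition and batch.condition carry one integer label per column (cell) of ref_data.

```r
# Hypothetical reference size; in practice this would be ncol(ref_data).
n_ref_cells <- 160

# One integer label per reference cell (an assumption based on the Examples).
group_condition <- rep(1:2, each = n_ref_cells / 2)   # first half group 1, second half group 2
batch_condition <- rep(1:2, times = n_ref_cells / 2)  # alternating batches 1, 2, 1, 2, ...

# Combine all customized settings into a single named list.
other_prior <- list(
  nCells = 1000,                      # number of simulated cells
  nGenes = 2000,                      # number of simulated genes
  group.condition = group_condition,  # group label per reference cell
  de.prob = 0.2,                      # 20 percent of genes are DEGs
  fc.group = 2,                       # minimum fold change of DEGs
  batch.condition = batch_condition   # batch label per reference cell
)

# Sanity checks before passing other_prior to SPsimSeq_simulation().
stopifnot(
  length(other_prior$group.condition) == n_ref_cells,
  length(other_prior$batch.condition) == n_ref_cells,
  other_prior$de.prob >= 0,
  other_prior$de.prob <= 1
)

# Cross-tabulate groups against batches to confirm every combination is populated.
table(group = group_condition, batch = batch_condition)
```

Checking the label vectors against the reference size up front means a length mismatch is caught before the simulation is started.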

References

Assefa A T, Vandesompele J, Thas O. SPsimSeq: semi-parametric simulation of bulk and single-cell RNA-sequencing data. Bioinformatics, 2020, 36(10): 3276-3278. https://doi.org/10.1093/bioinformatics/btaa105

Bioconductor URL: https://www.bioconductor.org/packages/release/bioc/html/SPsimSeq.html

GitHub URL: https://github.com/CenterForStatistics-UGent/SPsimSeq

Examples

## Not run: 
# SPsimSeq can simulate datasets directly, without an estimation step.
ref_data <- simmethods::data

# 1) Simulate with default parameters
simulate_result <- simmethods::SPsimSeq_simulation(
  ref_data = ref_data,
  other_prior = NULL,
  return_format = "list",
  verbose = TRUE,
  seed = 111
)

## counts
counts <- simulate_result[["simulate_result"]][["count_data"]]
dim(counts)
## cell information
col_data <- simulate_result[["simulate_result"]][["col_meta"]]
table(col_data$group)
table(col_data$batch)


# 2) Simulate two groups (20% of genes are DEGs, minimum fold change of 2)
group_condition <- as.numeric(simmethods::group_condition)
simulate_result <- simmethods::SPsimSeq_simulation(
  ref_data = ref_data,
  other_prior = list(nCells = 1000,
                     nGenes = 2000,
                     group.condition = group_condition,
                     de.prob = 0.2,
                     fc.group = 2),
  return_format = "list",
  verbose = TRUE,
  seed = 111
)

## counts
counts <- simulate_result[["simulate_result"]][["count_data"]]
dim(counts)
## cell information
col_data <- simulate_result[["simulate_result"]][["col_meta"]]
table(col_data$group)
table(col_data$batch)
## gene information
row_data <- simulate_result[["simulate_result"]][["row_meta"]]
table(row_data$de_gene)[2]/2000


# 3) Simulate two batches
group_condition <- as.numeric(simmethods::group_condition)
simulate_result <- simmethods::SPsimSeq_simulation(
  ref_data = ref_data,
  other_prior = list(nCells = 1000,
                     nGenes = 2000,
                     group.condition = group_condition,
                     de.prob = 0.2,
                     fc.group = 2,
                     batch.condition = sample(1:2, ncol(ref_data), replace = TRUE)),
  return_format = "list",
  verbose = TRUE,
  seed = 111
)

## counts
counts <- simulate_result[["simulate_result"]][["count_data"]]
dim(counts)
## cell information
col_data <- simulate_result[["simulate_result"]][["col_meta"]]
table(col_data$group)
table(col_data$batch)
## gene information
row_data <- simulate_result[["simulate_result"]][["row_meta"]]
table(row_data$de_gene)[2]/2000

## End(Not run)


duohongrui/simmethods documentation built on June 17, 2024, 10:49 a.m.