SCRIP_simulation: Simulate Datasets by SCRIP
In duohongrui/simmethods: A collection of 40+ simulation methods for singlce-cell RNA seqencing data

SCRIP_simulation

R Documentation

Simulate Datasets by SCRIP

Description

This function is used to simulate datasets from learned parameters by SCRIPsimu function in SCRIP package.

Usage

SCRIP_simulation(
  parameters,
  ref_data = NULL,
  other_prior = NULL,
  return_format,
  verbose = FALSE,
  seed
)

Arguments

`parameters`	A object generated by `splatter::splatEstimate()`
`ref_data`	A matrix for one dataset or a list of datasets with their own names. This is usually unused except for some methods, e.g. SCRIP, scDesign, zingeR.
`other_prior`	A list with names of certain parameters. Some methods need extra parameters to execute the estimation step, so you must input them. In simulation step, the number of cells, genes, groups, batches, the percent of DEGs are usually customed, so before simulating a dataset you must point it out. See `Details` below for more information.
`return_format`	A character. Alternative choices: list, SingleCellExperiment, Seurat, h5ad. If you select `h5ad`, you will get a path where the .h5ad file saves to.
`verbose`	Logical. Whether to return messages or not.
`seed`	A random seed.

Details

SCRIP contains five different simulation modes, and you can specify which mode do you use (default is GP-trendedBCV):

GP-trendedBCV
GP-commonBCV
BGP-commonBCV
BP
BGP-trendedBCV

In addtion to simulate datasets with default parameters, users want to simulate other kinds of datasets, e.g. a counts matrix with 2 or more cell groups. In SCRIP, you can set extra parameters to simulate datasets.

The customed parameters you can set are below:

nCells. In SCRIP, you can not set nCells directly and should set batchCells instead. For example, if you want to simulate 1000 cells, you can type other_prior = list(batchCells = 1000). If you type other_prior = list(batchCells = c(500, 500)), the simulated data will have two batches.
nGenes. You can directly set other_prior = list(nGenes = 5000) to simulate 5000 genes.
nGroups. You can not directly set other_prior = list(nGroups = 3) to simulate 3 groups. Instead, you should set other_prior = list(prob.group = c(0.2, 0.3, 0.5)) where the sum of group probabilities must equal to 1.
de.prob. You can directly set other_prior = list(de.prob = 0.2) to simulate DEGs that account for 20 percent of all genes.
prob.group. You can directly set other_prior = list(prob.group = c(0.2, 0.3, 0.5)) to assign three proportions of cell groups. Note that the number of groups always equals to the length of the vector.
nBatches. You can not directly set other_prior = list(nBatches = 3) to simulate 3 batches. Instead, you should set other_prior = list(batchCells = c(500, 500, 500)) to reach the goal and the total cells are 1500.

For more customed parameters in SCRIP, please check SCRIP::SCRIPsimu().

References

Qin F, Luo X, Xiao F, et al. SCRIP: an accurate simulator for single-cell RNA sequencing data. Bioinformatics, 2022, 38(5): 1304-1311. https://doi.org/10.1093/bioinformatics/btab824

CRAN URL: https://cran.r-project.org/web/packages/SCRIP/index.html

Github URL: https://github.com/thecailab/SCRIP

Examples

## Not run: 
# Load data
ref_data <- simmethods::data
# Estimate parameters
estimate_result <- simmethods::SCRIP_estimation(ref_data = data,
                                                verbose = TRUE,
                                                seed = 10)

# (1) Simulate 500 cells (Since we can not set nCells directly, so we can set
# batchCells (a numeric vector)) and 2000 genes
simulate_result <- simmethods::SCRIP_simulation(parameters = estimate_result[["estimate_result"]],
                                                ref_data = ref_data,
                                                other_prior = list(batchCells = 500,
                                                                   nGenes = 2000),
                                                return_format = "list",
                                                verbose = TRUE,
                                                seed = 111)
count_data <- simulate_result[["simulate_result"]][["count_data"]]
dim(count_data)


# (2) Simulate by another mode
simulate_result <- simmethods::SCRIP_simulation(parameters = estimate_result[["estimate_result"]],
                                                ref_data = ref_data,
                                                other_prior = list(mode = "BP"),
                                                return_format = "list",
                                                verbose = TRUE,
                                                seed = 111)
count_data <- simulate_result[["simulate_result"]][["count_data"]]
dim(count_data)


# (3) Simulate two groups (de.prob = 0.1) and one batch
simulate_result <- simmethods::SCRIP_simulation(parameters = estimate_result[["estimate_result"]],
                                                ref_data = ref_data,
                                                other_prior = list(prob.group = c(0.4, 0.6)),
                                                return_format = "list",
                                                verbose = TRUE,
                                                seed = 111)
count_data <- simulate_result[["simulate_result"]][["count_data"]]
dim(count_data)
## cell information
col_data <- simulate_result[["simulate_result"]][["col_meta"]]
table(col_data$group)
## gene information
row_data <- simulate_result[["simulate_result"]][["row_meta"]]
### The result of SCRIP contains the factors of different groups and uses can
### calculate the fold change by division. For example, the DEFactors of A gene
### in Group1 and Group2 are respectively 2 and 1, and the fold change of A gene
### is 2/1=2 or 1/2=0.5.
fc_group1_to_group2 <- row_data$DEFacGroup2/row_data$DEFacGroup1
table(fc_group1_to_group2 != 1)[2]/4000 ## de.prob = 0.1
### number of all DEGs
table(row_data$de_gene)[2]/4000


# (4) Simulate two groups (de.prob = 0.2) and two batches
## Since we can not set nBatches directly, so we can set batchCells (a numeric vector)
## to determin the number of batches and cells simutaniously.
simulate_result <- simmethods::SCRIP_simulation(parameters = estimate_result[["estimate_result"]],
                                                ref_data = ref_data,
                                                other_prior = list(prob.group = c(0.4, 0.6),
                                                                   batchCells = c(80, 80),
                                                                   de.prob = 0.2),
                                                return_format = "list",
                                                verbose = TRUE,
                                                seed = 111)
count_data <- simulate_result[["simulate_result"]][["count_data"]]
dim(count_data)
col_data <- simulate_result[["simulate_result"]][["col_meta"]]
table(col_data$group)
table(col_data$batch)


# (5) Simulate three groups (de.prob = 0.2) and two batches
## Since we can not set nBatches directly, so we can set batchCells (a numeric vector)
## to determin the number of batches and cells simutaniously.
simulate_result <- simmethods::SCRIP_simulation(parameters = estimate_result[["estimate_result"]],
                                                ref_data = ref_data,
                                                other_prior = list(prob.group = c(0.4, 0.3, 0.3),
                                                                   batchCells = c(80, 80),
                                                                   de.prob = 0.2),
                                                return_format = "list",
                                                verbose = TRUE,
                                                seed = 111)
count_data <- simulate_result[["simulate_result"]][["count_data"]]
dim(count_data)
col_data <- simulate_result[["simulate_result"]][["col_meta"]]
table(col_data$group)
table(col_data$batch)
## row data
row_data <- simulate_result[["simulate_result"]][["row_meta"]]
### DEGsj
table(row_data$de_gene)
### fold change of Group1 to Group2
fc_group1_to_group2 <- row_data$DEFacGroup2/row_data$DEFacGroup1
table(fc_group1_to_group2 != 1)[2]/4000
### fold change of Group1 to Group3
fc_group1_to_group3 <- row_data$DEFacGroup3/row_data$DEFacGroup1
table(fc_group1_to_group3 != 1)[2]/4000
### fold change of Group2 to Group3
fc_group2_to_group3 <- row_data$DEFacGroup3/row_data$DEFacGroup2
table(fc_group2_to_group3 != 1)[2]/4000

# (6) Simulate continous cell trajectory
simulate_result <- simmethods::SCRIP_simulation(ref_data = ref_data,
                                                parameters = estimate_result[["estimate_result"]],
                                                other_prior = list(batchCells = 1000,
                                                                   nGenes = 1000,
                                                                   paths = TRUE),
                                                return_format = "SingleCellExperiment",
                                                verbose = TRUE,
                                                seed = 111)
## plot
result <- scater::logNormCounts(simulate_result[["simulate_result"]])
result <- scater::runPCA(result)
plotPCA(result, colour_by = "group")

# (7) Simulate continous cell trajectory (three groups)
simulate_result <- simmethods::SCRIP_simulation(ref_data = ref_data,
                                                parameters = estimate_result[["estimate_result"]],
                                                other_prior = list(batchCells = 1000,
                                                                   nGenes = 1000,
                                                                   paths = TRUE,
                                                                   prob.group = c(0.2, 0.4, 0.4),
                                                                   de.prob = 0.4),
                                                return_format = "SingleCellExperiment",
                                                verbose = TRUE,
                                                seed = 111)
## plot
result <- scater::logNormCounts(simulate_result[["simulate_result"]])
result <- scater::runPCA(result)
plotPCA(result, colour_by = "group")

## End(Not run)

duohongrui/simmethods documentation built on June 17, 2024, 10:49 a.m.

duohongrui/simmethods index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

duohongrui/simmethods
A collection of 40+ simulation methods for singlce-cell RNA seqencing data

SCRIP_simulation: Simulate Datasets by SCRIP
In duohongrui/simmethods: A collection of 40+ simulation methods for singlce-cell RNA seqencing data

Simulate Datasets by SCRIP

Description

Usage

Arguments

Details

References

Examples

Related to SCRIP_simulation in duohongrui/simmethods...

R Package Documentation

Browse R Packages

We want your feedback!

duohongrui/simmethods A collection of 40+ simulation methods for singlce-cell RNA seqencing data

SCRIP_simulation: Simulate Datasets by SCRIP In duohongrui/simmethods: A collection of 40+ simulation methods for singlce-cell RNA seqencing data

Simulate Datasets by SCRIP

Description

Usage

Arguments

Details

References

Examples

Related to SCRIP_simulation in duohongrui/simmethods...

R Package Documentation

Browse R Packages

We want your feedback!

duohongrui/simmethods
A collection of 40+ simulation methods for singlce-cell RNA seqencing data

SCRIP_simulation: Simulate Datasets by SCRIP
In duohongrui/simmethods: A collection of 40+ simulation methods for singlce-cell RNA seqencing data