simulateSet: simulateSet
In kdkorthauer/scDD: Mixture modeling of single-cell RNA-seq data to identify genes with differential distributions

simulateSet

R Documentation

simulateSet

Description

Simulates a complete dataset in which the number of genes in each category of differential distribution (DE, DP, DM, DB) and equivalent distribution (EE, EP) is specified.

Usage

simulateSet(SCdat, numSamples = 100, nDE = 250, nDP = 250, nDM = 250,
  nDB = 250, nEE = 5000, nEP = 4000, sd.range = c(1, 3), modeFC = c(2,
  3, 4), plots = TRUE, plot.file = NULL, random.seed = 284,
  varInflation = NULL, condition = "condition", param = bpparam())

Arguments

`SCdat`	An object of class `SingleCellExperiment` that contains normalized single-cell expression and metadata. The `assays` slot contains a named list of matrices; the normalized counts are stored in the one named `normcounts`. This matrix should have one row for each gene and one column for each sample. The `colData` slot should contain a data.frame with one row per sample and columns that hold metadata for each sample. This data.frame should contain a variable representing biological condition, coded as numeric values (either 1 or 2) that indicate which condition each sample belongs to, in the same order as the columns of `normcounts`. Optional additional metadata about each cell can also be stored in this data.frame, and additional information about the experiment can be stored in the `metadata` slot as a list.
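`numSamples`	Numeric value giving the number of samples to simulate in each condition.
`nDE`	Number of DE genes (differential expression: unimodal in both conditions, with different means) to simulate.
`nDP`	Number of DP genes (differential proportion: bimodal in both conditions, with different proportions of cells in each mode) to simulate.
`nDM`	Number of DM genes (differential modality: unimodal in one condition and bimodal in the other) to simulate.
`nDB`	Number of DB genes (both differential modality and different component means) to simulate.
`nEE`	Number of EE genes (equivalent expression: the same unimodal distribution in both conditions) to simulate.
`nEP`	Number of EP genes (equivalent proportion: the same bimodal distribution in both conditions) to simulate.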
`sd.range`	Numeric vector of length two giving the interval (lower, upper), in standard deviations, from which fold change sizes are randomly selected.
`modeFC`	Vector of values to use for fold changes between modes for DP, DM, and DB.
`plots`	Logical indicating whether or not to generate fold change and validation plots
`plot.file`	Character string giving the path of a PDF file to which the plots are written. If `NULL` (the default), the plots are drawn on the current graphics device instead.
`random.seed`	Numeric value for a call to `set.seed` for reproducibility.
`varInflation`	Optional numeric vector with one element for each condition that corresponds to the multiplicative variance inflation factor to use when simulating data. Useful for sensitivity studies to assess the impact of confounding effects on differential variance across conditions. Currently assumes all samples within a condition are subject to the same variance inflation factor.
`condition`	A character string giving the name of the column in `colData` that represents the biological group or condition of interest (e.g., treatment versus control). This variable should contain only two possible values, since `scDD` can currently handle only two-group comparisons. The default assumes that this variable is stored in a column named "condition".
`param`	A `MulticoreParam` or `SnowParam` object of the `BiocParallel` package that defines a parallel backend. The default, `BiocParallel::bpparam()`, automatically selects a backend appropriate for the operating system. Alternatively, the user can specify the number of cores to use by first creating the corresponding `MulticoreParam` (for a Linux-like OS) or `SnowParam` (for Windows) object and then passing it as the `param` argument. For example, to use 12 cores on a Linux-like OS, set `param=BiocParallel::MulticoreParam(workers=12)`.
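The sketch below illustrates the input layout described above. It assumes the Bioconductor packages SingleCellExperiment and BiocParallel are installed. The toy matrix, the gene and cell names, and the helper `check_condition()` are hypothetical and not part of scDD. The sketch only demonstrates the expected structure (genes in rows, cells in columns, an assay named `normcounts`, a condition column coded 1/2) and one way to choose a backend for `param`.

```r
# Minimal sketch (toy data, for illustration only): a SingleCellExperiment
# laid out the way the SCdat argument describes.
library(SingleCellExperiment)
library(BiocParallel)

set.seed(1)
n_genes <- 10
n_cells <- 20

# Toy "normalized" expression: one row per gene, one column per cell.
counts <- matrix(rpois(n_genes * n_cells, lambda = 5),
                 nrow = n_genes, ncol = n_cells,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("cell", seq_len(n_cells))))

# Condition coded as 1 or 2, in the same order as the columns of the matrix.
cond <- rep(c(1, 2), each = n_cells / 2)

sce <- SingleCellExperiment(
  assays = list(normcounts = counts),
  colData = S4Vectors::DataFrame(condition = cond,
                                 row.names = colnames(counts))
)

# Hypothetical helper (not an scDD function): check that the condition
# column exists, is numeric, and contains exactly the two values 1 and 2.
check_condition <- function(sce, condition = "condition") {
  cd <- colData(sce)
  if (!condition %in% colnames(cd)) {
    stop("colData has no column named '", condition, "'")
  }
  vals <- cd[[condition]]
  ok <- is.numeric(vals) && all(vals %in% c(1, 2)) &&
    length(unique(vals)) == 2
  if (!ok) {
    stop("'", condition, "' must contain only the values 1 and 2, both present")
  }
  invisible(TRUE)
}
check_condition(sce)

# Pick a backend: SnowParam on Windows, MulticoreParam elsewhere.
# workers = 2 is an arbitrary choice for this sketch.
param <- if (.Platform$OS.type == "windows") {
  SnowParam(workers = 2)
} else {
  MulticoreParam(workers = 2)
}
```

An object and backend built this way could then be passed as `SCdat` and `param`. This toy object only illustrates the layout, however, and is likely too small to serve as a useful reference set to simulate from.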

Value

An object of class SingleCellExperiment that contains simulated single-cell expression and metadata. The assays slot contains a named list of matrices; the simulated counts are stored in the one named normcounts. This matrix has one row for each gene (nDE + nDP + nDM + nDB + nEE + nEP rows) and one column for each sample (numSamples columns per condition, 2 × numSamples in total). The colData slot contains a data.frame with one row per sample and a column that represents biological condition, coded as numeric values (either 1 or 2) that indicate which condition each sample belongs to, in the same order as the columns of normcounts. The rowData slot contains information about the category of each gene (EE, EP, DE, DM, DP, or DB), as well as the simulated fold change value.
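As a quick check of the dimensions described above, the following base-R sketch computes the expected size of the simulated normcounts matrix. It assumes, as the `numSamples` argument description states, that `numSamples` samples are simulated in each of the two conditions. `expected_dims()` is a hypothetical helper, not an scDD function.

```r
# Hypothetical helper: expected dimensions of the simulated normcounts matrix,
# using the same defaults as simulateSet().
expected_dims <- function(nDE = 250, nDP = 250, nDM = 250, nDB = 250,
                          nEE = 5000, nEP = 4000, numSamples = 100) {
  c(genes = nDE + nDP + nDM + nDB + nEE + nEP,  # one row per simulated gene
    cells = 2 * numSamples)                     # two conditions, numSamples each
}

expected_dims()
# genes cells
# 10000   200

# Settings used in the Examples section below: 5 genes of each type.
expected_dims(nDE = 5, nDP = 5, nDM = 5, nDB = 5, nEE = 5, nEP = 5)
# genes cells
#    30   200
```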

References

Korthauer KD, Chu LF, Newton MA, Li Y, Thomson J, Stewart R, Kendziorski C. A statistical approach for identifying differential distributions in single-cell RNA-seq experiments. Genome Biology. 2016 Oct 25;17(1):222. https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1077-y

Examples


# Load the toy example SingleCellExperiment to simulate from

data(scDatEx)


# check that this object is a member of the SingleCellExperiment class
# and that it contains 142 samples and 500 genes

class(scDatEx)
show(scDatEx)


# set arguments to pass to the simulateSet function:
# we will simulate 30 genes in total (5 genes of each type)
# and 100 samples in each of two conditions

nDE <- 5
nDP <- 5
nDM <- 5
nDB <- 5
nEE <- 5
nEP <- 5
numSamples <- 100
seed <- 816


# create a simulated set with the specified numbers of DE, DP, DM, DB, EE,
# and EP genes and the specified number of samples; DE genes are 2 standard
# deviations apart, and multimodal genes have a modal distance of
# 4 standard deviations

SD <- simulateSet(scDatEx, numSamples=numSamples, nDE=nDE, nDP=nDP, nDM=nDM,
                  nDB=nDB, nEE=nEE, nEP=nEP, sd.range=c(2,2), modeFC=4, 
                  plots=FALSE, 
                  random.seed=seed)

kdkorthauer/scDD documentation built on March 27, 2022, 5:11 a.m.
