simulateSet | R Documentation |
Simulates a complete dataset in which the number of genes of each differential distribution type (DE, DP, DM, DB) and each equivalent distribution type (EE, EP) is specified.
simulateSet(SCdat, numSamples = 100, nDE = 250, nDP = 250, nDM = 250,
  nDB = 250, nEE = 5000, nEP = 4000, sd.range = c(1, 3),
  modeFC = c(2, 3, 4), plots = TRUE, plot.file = NULL,
  random.seed = 284, varInflation = NULL, condition = "condition",
  param = bpparam())
SCdat | An object of class
numSamples | Numeric value for the number of samples in each condition to simulate.
nDE | Number of DE genes to simulate.
nDP | Number of DP genes to simulate.
nDM | Number of DM genes to simulate.
nDB | Number of DB genes to simulate.
nEE | Number of EE genes to simulate.
nEP | Number of EP genes to simulate.
sd.range | Numeric vector of length two which describes the interval (lower, upper) of standard deviations of fold changes to randomly select.
modeFC | Vector of values to use for fold changes between modes for DP, DM, and DB.
plots | Logical indicating whether or not to generate fold change and validation plots.
plot.file | Character containing the file string if the plots are to be sent to a PDF instead of to the standard output.
random.seed | Numeric value for a call to
varInflation | Optional numeric vector with one element for each condition that corresponds to the multiplicative variance inflation factor to use when simulating data. Useful for sensitivity studies to assess the impact of confounding effects on differential variance across conditions. Currently assumes all samples within a condition are subject to the same variance inflation factor.
condition | A character object that contains the name of the column in
param | a
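To make the phrase "multiplicative variance inflation factor" in the varInflation description concrete, here is a minimal base R sketch. It is an illustration of the general idea only and is not taken from the package's internal code: the helper inflate_variance and the toy vector x are hypothetical names introduced here. It assumes inflation is applied by scaling deviations from the mean, which leaves the mean unchanged and multiplies the variance by the chosen factor.

```r
# Illustration only: NOT the package's internal implementation.
set.seed(284)

# Hypothetical expression values for one gene in one condition
x <- rnorm(1000, mean = 5, sd = 1)

# Hypothetical helper: scale deviations from the mean by sqrt(factor),
# so the mean is preserved and the variance is multiplied by `factor`
inflate_variance <- function(x, factor) {
  mean(x) + sqrt(factor) * (x - mean(x))
}

y <- inflate_variance(x, factor = 2)

# Ratio of variances equals the inflation factor (up to floating point)
var(y) / var(x)
```

Under this reading, a call with varInflation = c(1, 2) would leave the first condition untouched and double the variance in the second.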
An object of class SingleCellExperiment that contains simulated single-cell expression and metadata. The assays slot contains a named list of matrices, where the simulated counts are housed in the one named normcounts. This matrix has one row for each gene (nDE + nDP + nDM + nDB + nEE + nEP rows) and one column for each sample (numSamples columns per condition).
The colData slot contains a data.frame with one row per sample and a column that represents biological condition, in the form of numeric values (either 1 or 2) indicating which condition each sample belongs to (in the same order as the columns of normcounts).
The rowData slot contains information about the category of the gene (EE, EP, DE, DM, DP, or DB), as well as the simulated fold change value.
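The structure described above can be checked with a short sketch. It assumes that simulateSet and the toy dataset scDatEx come from the Bioconductor package scDD, and that the SingleCellExperiment package provides the normcounts, colData, and rowData accessors; the variable expected_rows is a name introduced here for illustration. The simulation only runs if both packages are installed.

```r
# Expected number of rows: one per simulated gene
nDE <- 5; nDP <- 5; nDM <- 5; nDB <- 5; nEE <- 5; nEP <- 5
expected_rows <- nDE + nDP + nDM + nDB + nEE + nEP
expected_rows

# Only run the simulation if the (assumed) packages are available
if (requireNamespace("scDD", quietly = TRUE) &&
    requireNamespace("SingleCellExperiment", quietly = TRUE)) {
  suppressPackageStartupMessages({
    library(scDD)
    library(SingleCellExperiment)
  })
  data(scDatEx)

  SD <- simulateSet(scDatEx, numSamples = 100,
                    nDE = nDE, nDP = nDP, nDM = nDM, nDB = nDB,
                    nEE = nEE, nEP = nEP,
                    plots = FALSE, random.seed = 816)

  # Simulated values live in the assay named "normcounts"
  print(dim(normcounts(SD)))
  print(nrow(normcounts(SD)) == expected_rows)

  # Per-sample metadata (condition labels) and per-gene metadata
  # (gene category and simulated fold change)
  print(head(colData(SD)))
  print(head(rowData(SD)))
}
```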
Korthauer KD, Chu LF, Newton MA, Li Y, Thomson J, Stewart R, Kendziorski C. A statistical approach for identifying differential distributions in single-cell RNA-seq experiments. Genome Biology. 2016 Oct 25;17(1):222. https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1077-y
# Attach the package that provides simulateSet and the toy data
library(scDD)

# Load toy example ExpressionSet to simulate from
data(scDatEx)

# Check that this object is a member of the ExpressionSet class
# and that it contains 142 samples and 500 genes
class(scDatEx)
show(scDatEx)

# Set arguments to pass to the simulateSet function:
# we will simulate 30 genes total (5 genes of each type)
# and 100 samples in each of two conditions
nDE <- 5
nDP <- 5
nDM <- 5
nDB <- 5
nEE <- 5
nEP <- 5
numSamples <- 100
seed <- 816

# Create a simulated set with the specified numbers of DE, DP, DM, DB, EE,
# and EP genes and the specified number of samples. DE genes are
# 2 standard deviations apart, and multimodal genes have a modal
# distance of 4 standard deviations
SD <- simulateSet(scDatEx, numSamples=numSamples, nDE=nDE, nDP=nDP, nDM=nDM,
                  nDB=nDB, nEE=nEE, nEP=nEP, sd.range=c(2,2), modeFC=4,
                  plots=FALSE, random.seed=seed)
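As a variation on the example, the following sketch shows how the varInflation and param arguments might be supplied for a sensitivity study. It assumes the package is scDD and that param accepts a BiocParallel parameter object (suggested by the bpparam() default in the usage); SerialParam() is chosen here only to force single-threaded execution. The names infl and SD_infl are introduced for illustration.

```r
# One inflation factor per condition: condition 1 unchanged,
# condition 2 with doubled variance
infl <- c(1, 2)

# Only run if the (assumed) packages are installed
if (requireNamespace("scDD", quietly = TRUE) &&
    requireNamespace("BiocParallel", quietly = TRUE)) {
  suppressPackageStartupMessages(library(scDD))
  data(scDatEx)

  SD_infl <- simulateSet(scDatEx, numSamples = 100,
                         nDE = 5, nDP = 5, nDM = 5, nDB = 5,
                         nEE = 5, nEP = 5,
                         sd.range = c(2, 2), modeFC = 4,
                         plots = FALSE, random.seed = 816,
                         varInflation = infl,
                         param = BiocParallel::SerialParam())

  # Print a summary of the simulated object
  show(SD_infl)
}
```

Comparing downstream results from this object against those from a run with varInflation = NULL is one way to gauge how sensitive an analysis is to unequal variance across conditions.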