Setup | R Documentation |
This function generates the settings needed for simulateDE
.
Firstly a set of differential expressed gene IDs with
associated fold changes for a given number of genes, simulations and
fraction of DE genes is generated. There are also a number of options relating
to count simulations such as downsampling.
Secondly, the estimated parameters for count simulations of genes (and spikes) are added.
Setup(ngenes=NULL, nsims=25,
p.DE = 0.1, pLFC = 1, p.G = 1,
p.B=NULL, bLFC=NULL, bPattern="uncorrelated",
n1=c(20,50,100), n2=c(30,60,120),
Thinning = NULL, LibSize='equal',
estParamRes, estSpikeRes=NULL,
DropGenes=FALSE,
setup.seed, verbose = TRUE)
ngenes |
is a numeric vector specifying the number of genes to simulate.
Default is |
nsims |
Number of simulations to run. Default is 25. |
p.DE |
Numeric vector between 0 and 1 representing
the percentage of genes being differentially expressed due to phenotype,
i.e. biological signal. Default is |
pLFC |
The log phenotypic fold change for DE genes. This can be:
(1) a constant, e.g. 2;
(2) a vector of values with length being number of DE genes.
If the input is a vector and the length is not the number of DE genes,
it will be sampled with replacement to generate log fold changes;
(3) a function that takes an integer n, and generates a vector of length n,
e.g. function(x) rnorm(x, mean=0, sd=1.5).
Default is |
p.G |
Numeric vector indicating the proportion of replicates per group
that express the phenotypic fold change. Default is |
p.B |
Numeric vector between 0 and 1 representing the percentage of genes
being differentially expressed between batches.
Default is |
bLFC |
The log batch fold change for all genes. This can be:
(1) a constant, e.g. 2;
(2) a vector of values with length being number of all genes.
If the input is a vector and the length is not the number of total genes,
it will be sampled with replacement to generate log fold changes;
(3) a function that takes an integer n, and generates a vector of length n,
e.g. function(x) rnorm(x, mean=0, sd=1.5).
Note that the simulations of only two batches is implemented.
Default is |
bPattern |
Character vector for batch effect pattern if |
n1, n2 |
Integer vectors specifying the number of biological replicates in each group. Default values are n1=c(20,50,100) and n2=c(30,60,120). The vectors need to have the same number of entries, i.e. length. |
Thinning |
Numeric vector specifying the downsampling.
It has to be the same length and order as the vector |
LibSize |
Size factors representing sample-specific differences/biases
in expected mean values of counts:
|
estParamRes |
The estimated simulation parameters for genes. This can be:
(1) The output of |
estSpikeRes |
The spike-in simulation parameters generated by |
DropGenes |
By default, the estimated parameters and fit based on genes that
were defined as expressed are used for simulations.
By setting this parameter to |
setup.seed |
Setup seed. |
verbose |
Logical vector to indicate whether to print function information.
Default is |
A complex list with the following entries:
DESetup |
A list of simulation options relating to the Differential Expression Setup: number of genes simulated (ngenes); number of simulation iterations (nsims); IDs of DE genes (DEid) which is a list (length=nsims) of vectors (length=ngenes*p.DE); log fold change of genes (pLFC) which is a list (length=nsims) of vectors (length=ngenes); similarly for batch effects (Bid and bLFC); list containing simulation seeds (sim.seed). |
SimSetup |
A list of simulation options relating to the Simulation Setup: the number of samples per group (n1 and n2); simulating spike-ins (spikeIns); Thinning parameters (Thinning, thinSpike); Library Size Factors (LibSize); dropout genes (DropGenes and DropRate). |
estParamRes |
A list object containing the gene simulation parameters provided. |
estSpikeRes |
A list object containing the spike-in simulation parameters provided. |
Beate Vieth
## Not run:
# estimate gene parameters
data("SmartSeq2_Gene_Read_Counts")
Batches = data.frame(Batch = sapply(strsplit(colnames(SmartSeq2_Gene_Read_Counts),
"_"), "[[", 1),
stringsAsFactors = F,
row.names = colnames(SmartSeq2_Gene_Read_Counts))
data("GeneLengths_mm10")
estparam_gene <- estimateParam(countData = SmartSeq2_Gene_Read_Counts,
readData = NULL,
batchData = Batches,
spikeData = NULL, spikeInfo = NULL,
Lengths = GeneLengths_mm10, MeanFragLengths = NULL,
RNAseq = 'singlecell', Protocol = 'Read',
Distribution = 'ZINB', Normalisation = "scran",
GeneFilter = 0.1, SampleFilter = 3,
sigma = 1.96, NCores = NULL, verbose = TRUE)
# estimate spike parameters
data("SmartSeq2_SpikeIns_Read_Counts")
data("SmartSeq2_SpikeInfo")
Batches = data.frame(Batch = sapply(strsplit(colnames(SmartSeq2_SpikeIns_Read_Counts),
"_"), "[[", 1),
stringsAsFactors = F,
row.names = colnames(SmartSeq2_SpikeIns_Read_Counts))
estparam_spike <- estimateSpike(spikeData = SmartSeq2_SpikeIns_Read_Counts,
spikeInfo = SmartSeq2_SpikeInfo,
MeanFragLength = NULL,
batchData = Batches,
Normalisation = 'depth')
# define log fold change
p.lfc <- function(x) sample(c(-1,1), size=x,replace=T)*rgamma(x, shape = 1, rate = 2)
# set up simulations
setupres <- Setup(ngenes = 10000, nsims = 25,
p.DE = 0.1, pLFC = p.lfc,
n1 = c(20,50,100), n2 = c(30,60,120),
Thinning = c(1,0.9,0.8), LibSize = 'given',
estParamRes = estparam_gene,
estSpikeRes = estparam_spike,
DropGenes = TRUE,
setup.seed = 52679, verbose = TRUE)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.