View source: R/stochastic_simulation.R
getRNAseqMatrix | R Documentation |
Transforms a time-point of a simulation into RNA-seq-like data, i.e. simulates a read count for each RNA molecule per individual (each individual is considered as a sample).
getRNAseqMatrix( simdf, insilicosystem, samplingTime = max(simdf$time), laneEffect = F, nLanes = 2, propRnasSampled = 0.9, samplesLibSize = NULL, genesLength = NULL, mrnasOnly = T, mergeComplexes = F, meanLogLibSize_lane = 7, sdLogLibSize_lane = 0.5, sdLogLibSize_samples = 0.2 )
simdf |
The data-frame with the result of the simulation (see |
insilicosystem |
The simulated in silico system (see |
samplingTime |
Numeric. Time-point of the simulation to be transformed. By default, the maximum time of the simulation is used. |
laneEffect |
Boolean. Are the samples processed on different lanes/batches (see |
nLanes |
Numeric. How many lanes are there in the experiment (see |
propRnasSampled |
Numeric. The proportion of molecules of RNAs that are sampled in each individual. Must be between 0 and 1. Default value is 0.9. |
samplesLibSize |
Vector of expected library size for each individual/sample. If named, the names of the vector must correspond to the names of the individuals as specified
in the result of the simulation. If none provided, will be sampled from a log-normal distribution (see |
genesLength |
Vector of gene length for each gene in the system. Its length must be equal to the number of protein-coding genes in the system (if |
mrnasOnly |
Boolean. Are the noncoding RNAs to be discarded before the transformation? If TRUE, read counts will be returned only for protein-coding RNAs. Default value is TRUE. |
mergeComplexes |
Boolean. Are the RNAs in complex accounted for in the read counts (i.e. are they detected by the RNA-seq experiment)? Default value is FALSE. See also |
meanLogLibSize_lane |
Numeric. The mean of the log10 mean library size normal distribution (see |
sdLogLibSize_lane |
Numeric. The sd of the log10 mean library size normal distribution (see |
sdLogLibSize_samples |
Numeric. The sd of the log10 samples library size normal distribution (see |
The abundance of the RNA form of each gene at time samplingTime
is extracted from the result of the simulation. If mrnasOnly = TRUE
, non-coding RNAs are discarded. If the simulation contains
several trials per individual (see simulateInSilicoSystem
), the abundance of each RNA is summed over the different trials for each individual. The abundance of
the RNAs in each individual are then transformed into "noisy proportions": for each individual, tot_RNAs
times propRnasSampled
RNAs molecules are sampled from the
total RNA molecules in the individual (where tot_RNAs
is the total number of RNA molecules of a given individual). If propRnasSampled
is 1, this step simply
corresponds to dividing the abundance of each RNA by the total number of RNAs for the individual. Otherwise, if propRnasSampled
is less than 1, it introduces some
stochasticity in the "measurement" of RNAs as some low-expressed RNAs might not be represented. The noisy proportion of for each RNA is multiplied by the gene length (by default
set to 1 for all genes), to reproduce the bias of RNA-seq experiments in which longer genes get more reads. The proportions are re-scaled such that their sum over all RNAs for
each individual equals 1. The expected count of each RNA in each individual is then computed as the product of the corresponding noisy proportion and the library size of the individual.
The latter can be provided by the user (parameter samplesLibSize
); otherwise it is sampled using the function sampleLibrarySize
. Finally, the actual read count of each RNA for
each individual is sampled from a Poisson distribution with parameter lambda equal to the corresponding expected count.
A list:
rnaSeqMatrix
A tibble giving for each RNA (column "Molecule") the observed read count in each individual (other columns, one per individual).
samplesLibSize
The expected library size of each individual. May not be equal to the total read counts for the individual, as the actual counts are sampled from a Poisson distribution.
genesLength
The length of each gene.
## Not run: mysystem = createInSilicoSystem(G = 5, regcomplexes = "none", ploidy = 2, PC.p = 1) mypop = createInSilicoPopulation(10, mysystem) sim = simulateInSilicoSystem(mysystem, mypop, simtime = 1000, ntrials = 10, nepochs = 5) rnaSeq = getRNAseqMatrix(sim$Simulation, mysystem, laneEffect = F) ## With a batch/lane effect on the library size of the samples rnaSeq = getRNAseqMatrix(sim$Simulation, mysystem, laneEffect = T) ## Providing the library size of each sample/individual libsize = rnorm(length(unique(sim$Simulation$Ind)), 1e7, 1e5) names(libsize) = unique(sim$Simulation$Ind) rnaSeq = getRNAseqMatrix(sim$Simulation, mysystem, samplesLibSize = libsize) ## Accounting for different gene lengths genes_length = sample(1:200, nrow(mysystem$genes)) names(genes_length) = as.character(1:nrow(mysystem$genes)) rnaSeq = getRNAseqMatrix(sim$Simulation, mysystem, samplesLibSize = libsize, genesLength = genes_length) ## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.