View source: R/read_distribution_genes.R
simulateMetaTranscriptome
Simulates a gene count matrix for an entire metatranscriptome.
genomeFileDir
Character string giving the directory that contains the FASTA files for all genomes to be included in the metatranscriptome simulation. The basenames of these FASTA files (without the extension) must match the row names of the genomeReadMatrix composition matrix. See Details.
genomeReadMatrix
Microbial composition matrix containing the number of reads per genome (rows) and per sample (columns). Can be obtained using the function
modelMatrix
A gene expression count matrix in which rows represent genes and columns represent replicates. Users can supply their own; otherwise, the count matrix from the pasilla dataset is used. A zero-inflated negative binomial distribution is fitted to this matrix, and the fitted parameters are used to randomly assign expression levels to the genes of each microbial genome.
DE
Logical; whether to simulate differential expression between cases and controls. Defaults to FALSE.
foldChanges
Numeric vector of fold changes to simulate. It should include the value 1, which applies to genes that are not differentially expressed. Required if DE is TRUE.
foldProbs
Numeric vector containing the probabilities for each of the fold changes specified in the parameter
nSamples
Integer; the number of cases in the simulated experiment. Must be specified if DE is TRUE. nSamples + nControls must equal the number of columns in the composition matrix (genomeReadMatrix).
nControls
Integer; the number of controls in the simulated experiment. Must be specified if DE is TRUE. nSamples + nControls must equal the number of columns in the composition matrix (genomeReadMatrix).
seed
An integer that sets the random seed for the read distribution.
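The DE-related arguments work together: genes receive values from foldChanges according to foldProbs, and the samples are split into nSamples cases and nControls controls. The base-R sketch below illustrates one plausible reading of that relationship, assuming each gene's fold change is drawn at random using foldProbs as sampling weights. `assignFoldChanges()` and `splitDesign()` are hypothetical helpers, not metaester functions, and the package's internal logic may differ. Because R's `sample()` treats `prob` as weights, values such as foldProbs=c(10,80,10) from the Examples behave like 0.1, 0.8 and 0.1.

```r
# Hypothetical helpers (not part of metaester) showing how the DE arguments
# could fit together. Assumption: each gene's fold change is drawn at random
# from foldChanges, using foldProbs as sampling weights.
assignFoldChanges <- function(geneNames, foldChanges, foldProbs, seed = 1) {
  stopifnot(length(foldChanges) == length(foldProbs), 1 %in% foldChanges)
  set.seed(seed)
  # sample() normalises prob, so weights such as c(10, 80, 10) are accepted
  fc <- sample(foldChanges, size = length(geneNames), replace = TRUE,
               prob = foldProbs)
  data.frame(gene = geneNames, foldChange = fc, stringsAsFactors = FALSE)
}

# Label each column of the read matrix as a case or a control, enforcing
# nSamples + nControls == number of columns
splitDesign <- function(readMatrix, nSamples, nControls) {
  if (nSamples + nControls != ncol(readMatrix)) {
    stop("nSamples + nControls must equal ncol(readMatrix)")
  }
  c(rep("case", nSamples), rep("control", nControls))
}

genes <- paste0("gene", 1:1000)
de <- assignFoldChanges(genes, foldChanges = c(0.5, 1, 2),
                        foldProbs = c(10, 80, 10))
table(de$foldChange)  # roughly 100 / 800 / 100 genes; exact counts depend on the seed

toyReads <- matrix(0, nrow = 5, ncol = 10)  # 5 genomes x 10 samples
groups <- splitDesign(toyReads, nSamples = 5, nControls = 5)
```

With random draws the realised proportions only approximate the weights; an implementation that needs exact 10% / 80% / 10% splits would instead permute a fixed vector of fold changes.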
This function iterates over all genomes present in the composition matrix, simulates a gene expression matrix for each genome, and combines them into a single count matrix for the whole community.
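A rough base-R sketch of this per-genome loop follows. Purely for illustration, it assumes that each genome's gene-level expression weights are drawn from a zero-inflated negative binomial (ZINB) with fixed placeholder parameters (metaester instead derives them from modelMatrix), and that a genome's reads in each sample are split among its genes multinomially. `rzinb()` and `simulateGenome()` are hypothetical names, not metaester functions.

```r
# Hypothetical sketch, not metaester's implementation.
# Draw n values from a zero-inflated negative binomial: with probability pi0
# a value is forced to zero, otherwise it comes from NB(mu, size).
rzinb <- function(n, mu, size, pi0) {
  x <- rnbinom(n, size = size, mu = mu)
  x[runif(n) < pi0] <- 0
  x
}

# Simulate one genome: draw ZINB expression weights for its genes, then
# distribute that genome's reads in each sample across those genes.
simulateGenome <- function(genome, nGenes, readsPerSample,
                           mu = 100, size = 0.5, pi0 = 0.3) {
  w <- rzinb(nGenes, mu, size, pi0)
  if (all(w == 0)) w[1] <- 1  # rmultinom needs at least one positive weight
  counts <- vapply(readsPerSample,
                   function(n) as.numeric(rmultinom(1, size = n, prob = w)),
                   numeric(nGenes))
  matrix(counts, nrow = nGenes,
         dimnames = list(paste0(genome, "_gene", seq_len(nGenes)),
                         names(readsPerSample)))
}

# Toy genomeReadMatrix-like input: rows = genomes, columns = samples
set.seed(1)
readMatrix <- matrix(c(5000, 3000, 2000,
                       4000, 4000, 2000),
                     nrow = 3,
                     dimnames = list(c("fprausnitzii", "rintestinalis", "ljohnsonii"),
                                     c("S1", "S2")))
nGenes <- c(fprausnitzii = 40, rintestinalis = 30, ljohnsonii = 20)

# Iterate over genomes and stack the per-genome matrices into one community matrix
community <- do.call(rbind, lapply(rownames(readMatrix), function(g)
  simulateGenome(g, nGenes[[g]], readMatrix[g, ])))
dim(community)      # 90 genes x 2 samples
colSums(community)  # matches colSums(readMatrix): 10000 reads per sample
```

Splitting reads multinomially keeps each genome's per-sample total equal to its entry in the read matrix, so the community composition set by genomeReadMatrix is preserved in this sketch.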
Valid FASTA extensions for the files located in genomeFileDir: *.fa, *.fasta, *.fna, *.genes.fa, *.genes.fasta, *.genes.fna
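Before running a simulation, it can help to confirm that every genome in the read matrix has a matching FASTA file. The helper below is a hypothetical sketch, not a metaester function. It assumes that the `.genes` part of names such as `bobeum.genes.fa` is stripped together with the extension, which the list above does not state explicitly.

```r
# Hypothetical helper (not part of metaester): list files in `dir` that carry
# one of the valid FASTA extensions and return their names without extension.
# Assumption: ".genes.fa" etc. are stripped as a whole, leaving the genome name.
fastaGenomeNames <- function(dir) {
  pattern <- "\\.(genes\\.)?(fa|fasta|fna)$"
  files <- list.files(dir, pattern = pattern)
  sub(pattern, "", files)
}

# Usage with a temporary directory of empty placeholder files
tmp <- file.path(tempdir(), "genomes")
dir.create(tmp, showWarnings = FALSE)
invisible(file.create(file.path(tmp, c("fprausnitzii.fna", "bobeum.genes.fa",
                                       "notes.txt"))))
found <- fastaGenomeNames(tmp)
genomeRows <- c("fprausnitzii", "bobeum")  # stand-in for rownames(genomeReadMatrix)
missing <- setdiff(genomeRows, found)
if (length(missing) > 0) stop("No FASTA file for: ", paste(missing, collapse = ", "))
```

Files with other extensions (here `notes.txt`) are ignored, mirroring the restriction to the extensions listed above.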
A list containing the following elements:
- simulationData: a data.frame of read counts with one row per gene and one column per sample. If differential expression is simulated, the column names indicate whether each sample is a case or a control.
- numSamples: the number of cases if DE is TRUE, otherwise NULL.
- numControls: the number of controls if DE is TRUE, otherwise NULL.
- DEgenes: if DE is TRUE, a two-column data.frame whose first column holds the gene names and whose second column holds the fold change applied to each gene.
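The returned list can be inspected with ordinary list and data.frame indexing. Since this page gives neither the column names of DEgenes nor the exact case/control naming of the simulationData columns, the sketch below builds a small hand-made stand-in for the result (all names and values are invented for illustration) and indexes DEgenes by position only.

```r
# Hand-made stand-in for the list returned with DE = TRUE (values invented).
# Column names here are illustrative; only the column positions of DEgenes
# follow the Value description (gene names first, fold changes second).
res <- list(
  simulationData = data.frame(case1 = c(10, 0, 5), control1 = c(12, 1, 9),
                              row.names = c("g1", "g2", "g3")),
  numSamples = 1,
  numControls = 1,
  DEgenes = data.frame(gene = c("g1", "g2", "g3"), fc = c(1, 2, 0.5),
                       stringsAsFactors = FALSE)
)

deTable <- res$DEgenes
upGenes   <- deTable[[1]][deTable[[2]] > 1]  # genes given a fold change above 1
downGenes <- deTable[[1]][deTable[[2]] < 1]  # genes given a fold change below 1
res$simulationData[upGenes, , drop = FALSE]  # read counts for those genes
```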
- Huber W, Reyes A (2018). pasilla: Data package with per-exon and per-gene read counts of RNA-seq samples of Pasilla knock-down by Brooks et al. R package version 1.8.0.
- Frazee AC, Jaffe AE, Kirchner R, Leek JT (2018). polyester: Simulate RNA-seq reads. R package version 1.16.0.
# First, define a list of genomes to simulate. The names of these genomes need to match
# the names of the fasta files (without the extension). The genomes used are:
# - F. prausnitzii
# - R. intestinalis
# - L. johnsonii
# - E. faecalis
# - B. obeum
genomesToSimulate <- c("fprausnitzii", "rintestinalis", "ljohnsonii", "efaecalis",
                       "bobeum")
# Then, obtain the empirical composition matrix for these 5 species
compMatrix <- compositionGenomesMetaT(composition="empirical", empiricalSeed=1,
                                      genomes=genomesToSimulate, nReads=500000,
                                      nReplicates=10)
# Obtain the gene expression matrix for the full community (metatranscriptome)
# In this case, none of the bacteria have differential expression.
# No modelMatrix is provided, so the one from the pasilla dataset is used.
# For this, first indicate the location of the fasta files
genomesFolder <- system.file("extdata", package = "metaester", mustWork = TRUE)
metatranscriptome <- simulateMetaTranscriptome(genomeFileDir=genomesFolder,
                                               genomeReadMatrix=compMatrix)
# Obtain the gene expression matrix for the full community (metatranscriptome)
# incorporating differential expression: in each bacterium, 10% of the genes are
# 2-fold overexpressed and 10% are depleted 0.5-fold.
# No modelMatrix is provided, so the one from the pasilla dataset is used.
# As there are 10 samples in the count matrix, we assign 5 cases and 5 controls.
metatranscriptome <- simulateMetaTranscriptome(genomeFileDir=genomesFolder,
                                               genomeReadMatrix=compMatrix, DE=TRUE,
                                               foldChanges=c(0.5,1,2),
                                               foldProbs=c(10,80,10),
                                               nSamples=5, nControls=5)