Description Usage Arguments Details Value References Examples
View source: R/read_distribution_genes.R
simulateFastaReads
generates the fasta files for a given metatranscriptome
gene expression matrix, calculated with simulateMetaTranscriptome
. The
fasta files are written to a directory. This is a wrapper to the
simulate_experiment_countmat
function from the polyester package
1 2 3 4 5 6 7 8 9 10 11 | simulateFastaReads(
genomeFileDir,
simulatedDataSet,
outdir = ".",
paired = T,
seed = 42,
distr = "empirical",
error_model = "illumina5",
bias = "rnaf",
strand_specific = T
)
|
genomeFileDir |
Character string indicating the location of the fasta files for all genomes to be included in the metatranscriptome simulation. The basenames of these fasta files must match the rownames of the genomeReadMatrix composition matrix. See details |
simulatedDataSet |
Metatranscriptome gene expression matrix containing the
read counts per gene and per sample. Can be obtained using the function
|
outdir |
Path to the directory where the fasta files with the simulated reads should be written. Created if not exists. |
paired |
Logical, whether to generate paired-end fasta reads (defaults to TRUE) |
seed |
An integer, sets the random seed for the fasta generation |
distr |
Parameter to be passed to the |
error_model |
Parameter to be passed to the |
bias |
Parameter to be passed to the |
strand_specific |
Logical. Parameter to be passed to the
|
This function is a wrapper to the simulate_experiment_countmat
function from the polyester package. Reads can be simulated from a multifasta
file containing one sequence for each gene. Functionality to work with full genomes
+ GTF annotation files is not developed. Meanwhile, there are several (external)
tools that allow to convert a genome fasta + a GTF file into a transcript fasta,
such as gffread
from Cufflinks.
Writes the following files into the specified output directory: - Fasta files (*.fasta), one for each sample, containing the reads per gene specified in the metatranscriptomics gene count matrix (simulatedDataSet object) - Gene count matrix (countMatrix.txt): matrix with the counts per gene and per sample, useful to compare the results of mapping/differential expression tools to - Differentially expressed genes list (DEgenes.txt): if DE is set to TRUE, a two-column matrix, the first column indicating gene names and the second column the fold change applied to each gene. Otherwise the file will be empty.
- Alyssa C. Frazee, Andrew E. Jaffe, Rory Kirchner and Jeffrey T. Leek (2018). polyester: Simulate RNA-seq reads. R package version 1.16.0.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 | # First, define a list of genomes to simulate. The names of these genomes need to match
# the names of the fasta files (without the extension). The genomes used are:
# - F. prausnitzii
# - R. intestinalis
# - L. johnsonii
# - E. faecalis
# - B. obeum
genomesToSimulate <- c("fprausnitzii", "rintestinalis", "ljohnsonii", "efaecalis",
"bobeum")
# Then, obtain the empirical composition matrix for this 5 species
compMatrix <- compositionGenomesMetaT(composition="empirical", empiricalSeed=1,
genomes=genomesToSimulate, nReads=500000,
nReplicates=10)
# Obtain the gene expression matrix for the full community (metatranscriptome)
# incorporating differential expression: 10% genes (in each bacterium) have a 2-fold
# overexpression and 10% have a 0.5-fold depletion.
# No composition matrix is provided, so the one from the pasilla dataset will be used.
# As there are 10 samples in the count matrix, we assign 5 cases and 5 controls.
genomesFolder = system.file("extdata", package = "metaester", mustWork = TRUE)
metatranscriptome <- simulateMetaTranscriptome(genomeFileDir=genomesFolder,
genomeReadMatrix=compMatrix, DE=TRUE,
foldChanges=c(0.5,1,2),
foldProbs=c(10,80,10),
nSamples=5, nControls=5)
# Finally, generate the fasta files and write them to the output directory
simulateFastaReads(genomeFileDir=genomesFolder, simulatedDataSet=metatranscriptome,
outdir=".")
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.