simulate_experiment_countmat: Simulate RNA-seq experiment



Description

Creates FASTA files containing RNA-seq reads simulated from the provided transcripts, with optional differential expression between two groups (specified via a read count matrix).

Usage

simulate_experiment_countmat(fasta = NULL, gtf = NULL, seqpath = NULL,
  readmat, outdir = ".", paired = TRUE, seed = NULL, ...)

Arguments

fasta

path to FASTA file containing transcripts from which to simulate reads. See details.

gtf

path to GTF file or data frame containing transcript structures from which reads should be simulated. See details and seq_gtf.

seqpath

path to folder containing one FASTA file (.fa extension) for each chromosome in gtf, or a DNAStringSet containing one entry for each chromosome in gtf. See details and seq_gtf.

readmat

matrix with rows representing transcripts and columns representing samples. Entry i,j specifies how many reads to simulate from transcript i for sample j.

outdir

character, path to folder where simulated reads should be written, without a trailing slash. By default, reads are written to the working directory.

paired

If TRUE, paired-end reads are simulated; else single-end reads are simulated.

seed

Optional seed to set before simulating reads, for reproducibility.

...

Additional arguments to add nuance to the simulation, as described extensively in the details of simulate_experiment, or to pass to seq_gtf, if gtf is not NULL.
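
As a minimal sketch of how a readmat for a two-group design might be assembled, the helper below fills a transcripts-by-samples matrix with a baseline count and scales selected transcripts in the first group. The helper make_readmat and its parameters (n_per_group, base_reads, de_rows, fold_change) are hypothetical names used for illustration; they are not part of polyester.

```r
# Hypothetical helper (not part of polyester): build a transcripts-by-samples
# read count matrix for a two-group design.
make_readmat <- function(num_tx, n_per_group, base_reads, de_rows, fold_change) {
  stopifnot(all(de_rows >= 1), all(de_rows <= num_tx))
  # Every transcript starts with the same baseline count in every sample
  readmat <- matrix(base_reads, nrow = num_tx, ncol = 2 * n_per_group)
  # Group 1 occupies the first n_per_group columns; scale its DE transcripts
  readmat[de_rows, seq_len(n_per_group)] <- round(base_reads * fold_change)
  storage.mode(readmat) <- "integer"
  readmat
}

# 100 transcripts, 5 samples per group, first 30 transcripts doubled in group 1
readmat <- make_readmat(num_tx = 100, n_per_group = 5, base_reads = 20,
                        de_rows = 1:30, fold_change = 2)
dim(readmat)
```

In practice num_tx would need to match the number of transcripts in the FASTA or GTF input, since entry i,j is the read count for transcript i in sample j.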

Details

Reads can either be simulated from a FASTA file of transcripts (provided with the fasta argument) or from a GTF file plus DNA sequences (provided with the gtf and seqpath arguments). Simulating from a GTF file and DNA sequences may be a bit slower: it took about 6 minutes to parse the GTF/sequence files for chromosomes 1-22, X, and Y in hg19.
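
The GTF-based mode described above could be invoked roughly as follows, leaving fasta at its default of NULL. This is a sketch under stated assumptions: the paths, the output folder name, and the transcript count num_tx are placeholders, and the call is guarded so that it only runs when the inputs and the polyester package are actually present.

```r
# Placeholder paths (assumptions): replace with your own annotation and sequences
gtf_path <- "path/to/annotation.gtf"
seq_dir  <- "path/to/chromosome_fastas"  # one .fa file per chromosome, no trailing slash

# Placeholder: must equal the number of transcripts described by the GTF
num_tx <- 50
readmat <- matrix(20, nrow = num_tx, ncol = 4)

# Run only when the inputs and the package are actually available
if (file.exists(gtf_path) && dir.exists(seq_dir) &&
    requireNamespace("polyester", quietly = TRUE)) {
  polyester::simulate_experiment_countmat(gtf = gtf_path, seqpath = seq_dir,
    readmat = readmat, outdir = "simulated_reads_gtf", seed = 5)
}
```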

Value

No return value; simulated reads are written to outdir.
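
Because the function is called for its side effect, one way to confirm that a run produced output is to list the read files in outdir afterwards. The sketch below uses only base R; the folder name matches the one used in the Examples section, and the file-extension pattern is an assumption that may need adjusting.

```r
# Folder passed as outdir (here, the one used in the Examples section)
outdir <- "simulated_reads_2"

# Assumed pattern: FASTA files, optionally gzip-compressed
read_files <- list.files(outdir, pattern = "\\.(fa|fasta)(\\.gz)?$",
                         full.names = TRUE)

# Number of read files found (0 if the folder is missing or empty)
length(read_files)
```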

References

Li W and Jiang T (2012): Transcriptome assembly and isoform expression level estimation from biased RNA-Seq reads. Bioinformatics 28(22): 2914-2921.

Examples

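  library(polyester)

  # Example FASTA file shipped with polyester
  fastapath = system.file("extdata", "chr22.fa", package="polyester")
  numtx = count_transcripts(fastapath)
  # 20 reads per transcript in each of 10 samples
  readmat = matrix(20, ncol=10, nrow=numtx)
  # Differential expression: 40 reads for the first 30 transcripts in samples 1-5
  readmat[1:30, 1:5] = 40

  # Write the simulated reads to the folder 'simulated_reads_2'
  simulate_experiment_countmat(fasta=fastapath,
    readmat=readmat, outdir='simulated_reads_2', seed=5)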

polyester documentation built on Nov. 8, 2020, 8:09 p.m.