simulate_experiment_countmat: Simulate RNA-seq experiment
In alyssafrazee/polyester-release: Simulate RNA-seq reads


Creates FASTA files containing RNA-seq reads simulated from the provided transcripts, with optional differential expression between two groups (specified via a read count matrix).

simulate_experiment_countmat(fasta = NULL, gtf = NULL, seqpath = NULL,
  readmat, outdir = ".", fraglen = 250, fragsd = 25, readlen = 100,
  error_rate = 0.005, paired = TRUE, seed = NULL, ...)

`fasta`	path to FASTA file containing transcripts from which to simulate reads. See details.
`gtf`	path to GTF file containing transcript structures from which reads should be simulated. See details.
`seqpath`	path to folder containing one FASTA file (`.fa` extension) for each chromosome in `gtf`. See details.
`readmat`	matrix with rows representing transcripts and columns representing samples. Entry i,j specifies how many reads to simulate from transcript i for sample j.
`outdir`	character, path to the folder where simulated reads should be written, without a trailing slash. By default, reads are written to the working directory.
`fraglen`	Mean RNA fragment length. Sequences will be read off the end(s) of these fragments.
`fragsd`	Standard deviation of fragment lengths.
`readlen`	Read length.
`error_rate`	Sequencing error rate. Must be between 0 and 1. A uniform error model is assumed.
`paired`	If `TRUE`, paired-end reads are simulated; else single-end reads are simulated.
`seed`	Optional seed to set before simulating reads, for reproducibility.
`...`	Further arguments to pass to `seq_gtf`, if `gtf` is not `NULL`.
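
As a rough illustration of the `readmat` layout described above (rows are transcripts, columns are samples), the following base R sketch builds a count matrix for two equally sized groups. The helper `make_readmat` and its parameters are hypothetical and not part of polyester; it assumes the first half of the columns forms group 1.

```r
# Hypothetical helper (not part of polyester): build a transcripts-by-samples
# count matrix, applying a fold change to chosen transcripts in group 1.
make_readmat <- function(n_tx, n_per_group, base_reads = 20,
                         de_rows = integer(0), fold_change = 2) {
  n_samples <- 2 * n_per_group
  # Every transcript starts with the same read count in every sample.
  readmat <- matrix(base_reads, nrow = n_tx, ncol = n_samples)
  # Assumption: columns 1..n_per_group are group 1, the rest are group 2.
  group1 <- seq_len(n_per_group)
  readmat[de_rows, group1] <- round(base_reads * fold_change)
  readmat
}

# 100 transcripts, 5 samples per group, first 30 transcripts doubled in group 1
readmat <- make_readmat(n_tx = 100, n_per_group = 5, de_rows = 1:30)
dim(readmat)
```

A matrix built this way could then be passed as the `readmat` argument, provided its number of rows matches the number of transcripts being simulated from.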

Reads can be simulated either from a FASTA file of transcripts (provided with the `fasta` argument) or from a GTF file plus DNA sequences (provided with the `gtf` and `seqpath` arguments). Simulating from a GTF file and DNA sequences may be somewhat slower: parsing the GTF/sequence files for chromosomes 1-22, X, and Y in hg19 took about 6 minutes.
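
The two input options above can be sketched as a small check in base R. The helper `input_mode` is hypothetical and is not polyester's own code; it only illustrates the rule that either `fasta` alone, or `gtf` together with `seqpath`, should be supplied. The file names in the calls are placeholders.

```r
# Hypothetical helper (not part of polyester): report which input mode a set
# of arguments selects, following the two options described above.
input_mode <- function(fasta = NULL, gtf = NULL, seqpath = NULL) {
  if (!is.null(fasta) && is.null(gtf) && is.null(seqpath)) {
    "fasta"   # transcripts are read directly from a FASTA file
  } else if (is.null(fasta) && !is.null(gtf) && !is.null(seqpath)) {
    "gtf"     # transcripts are assembled from a GTF file plus chromosome FASTAs
  } else {
    stop("provide either `fasta`, or both `gtf` and `seqpath`")
  }
}

input_mode(fasta = "transcripts.fa")
input_mode(gtf = "annotation.gtf", seqpath = "chromosomes")
```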

No return value; simulated reads are written to `outdir`.

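  library(polyester)

  # Example FASTA file shipped with the package
  fastapath = system.file("extdata", "chr22.fa", package="polyester")
  numtx = count_transcripts(fastapath)

  # 10 samples with 20 reads per transcript; the first 30 transcripts
  # get 40 reads in samples 1-5 (a 2-fold difference between groups)
  readmat = matrix(20, ncol=10, nrow=numtx)
  readmat[1:30, 1:5] = 40

  simulate_experiment_countmat(fasta=fastapath,
    readmat=readmat, outdir='simulated_reads_2', seed=5)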