simulate_experiment_countmat: Simulate RNA-seq experiment


Description

Create FASTA files containing RNA-seq reads simulated from the provided transcripts, with optional differential expression between two groups (specified via the read count matrix).

Usage

simulate_experiment_countmat(fasta = NULL, gtf = NULL, seqpath = NULL,
  readmat, outdir = ".", fraglen = 250, fragsd = 25, readlen = 100,
  error_rate = 0.005, paired = TRUE, seed = NULL, ...)

Arguments

fasta

path to FASTA file containing transcripts from which to simulate reads. See details.

gtf

path to GTF file containing transcript structures from which reads should be simulated. See details.

seqpath

path to folder containing one FASTA file (.fa extension) for each chromosome in gtf. See details.

readmat

matrix with rows representing transcripts and columns representing samples. Entry i,j specifies how many reads to simulate from transcript i for sample j.

outdir

Character; path to the folder where simulated reads should be written, without a slash at the end of the folder name. By default, reads are written to the working directory.

fraglen

Mean RNA fragment length. Sequences will be read off the end(s) of these fragments.

fragsd

Standard deviation of fragment lengths.

readlen

Read length, in bases.

error_rate

Sequencing error rate. Must be between 0 and 1. A uniform error model is assumed.

paired

If TRUE, paired-end reads are simulated; else single-end reads are simulated.

seed

Optional seed to set before simulating reads, for reproducibility.

...

Further arguments to pass to seq_gtf, if gtf is not NULL.
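
As an illustration of the readmat argument, the following base R sketch builds a small count matrix with rows as transcripts and columns as samples, then raises the counts for a few transcripts in one group to mimic differential expression. The sizes and the split of samples into two groups are assumptions chosen for the example; in practice the number of rows has to match the number of transcripts being simulated.

```r
# Hypothetical sizes, chosen only for illustration.
n_transcripts <- 6
n_samples <- 4          # samples 1-2 = group A, samples 3-4 = group B

# Baseline: 20 reads per transcript in every sample.
readmat <- matrix(20, nrow = n_transcripts, ncol = n_samples)

# Double the read count for the first two transcripts in group A only.
readmat[1:2, 1:2] <- 40

dim(readmat)       # rows = transcripts, columns = samples
colSums(readmat)   # total reads requested per sample
```

Entry readmat[i, j] is then the number of reads requested from transcript i for sample j.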

Details

Reads can either be simulated from a FASTA file of transcripts (provided with the fasta argument) or from a GTF file plus DNA sequences (provided with the gtf and seqpath arguments). Simulating from a GTF file and DNA sequences may be a bit slower: it took about 6 minutes to parse the GTF/sequence files for chromosomes 1-22, X, and Y in hg19.
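
The two input modes described above can be summarized with a small check. This is only a sketch: choose_input_mode is a hypothetical helper, not a polyester function, and preferring fasta when both kinds of input are supplied is this sketch's own choice rather than documented package behavior.

```r
# Hypothetical helper, not part of polyester: report which input mode
# a given combination of arguments corresponds to.
choose_input_mode <- function(fasta = NULL, gtf = NULL, seqpath = NULL) {
  if (!is.null(fasta)) {
    # A transcript FASTA file was supplied directly.
    return("fasta")
  }
  if (!is.null(gtf) && !is.null(seqpath)) {
    # Transcript structures plus per-chromosome sequences were supplied.
    return("gtf")
  }
  stop("Supply either 'fasta', or both 'gtf' and 'seqpath'.")
}

# Placeholder file names, used only to show the two modes.
choose_input_mode(fasta = "transcripts.fa")
choose_input_mode(gtf = "genes.gtf", seqpath = "chromosomes")
```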

Value

No return value; simulated reads are written to outdir.
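
Because nothing is returned, the output has to be located on disk. A minimal base R sketch for listing it follows; list_read_files is a hypothetical helper, and the assumption that the read files end in .fasta or .fa is not stated on this page.

```r
# Hypothetical helper: list FASTA files found in an output folder.
# Assumption: simulated read files carry a .fasta or .fa extension.
list_read_files <- function(outdir = ".") {
  list.files(outdir, pattern = "\\.fa(sta)?$", full.names = TRUE)
}

# Same folder name as in the Examples section; returns an empty
# character vector if the folder does not exist yet.
read_files <- list_read_files("simulated_reads_2")
length(read_files)
```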

Examples

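  library(polyester)

  # Example transcript FASTA shipped with the package
  fastapath = system.file("extdata", "chr22.fa", package="polyester")
  numtx = count_transcripts(fastapath)

  # 20 reads per transcript in each of 10 samples
  readmat = matrix(20, ncol=10, nrow=numtx)
  # Double the reads for the first 30 transcripts in samples 1-5
  readmat[1:30, 1:5] = 40

  simulate_experiment_countmat(fasta=fastapath,
    readmat=readmat, outdir='simulated_reads_2', seed=5)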

alyssafrazee/polyester-release documentation built on May 12, 2019, 2:32 a.m.