View source: R/simulate_alternative_splicing.R
simulate_alternative_splicing | R Documentation |
First, exon supersets are created by joining all exons of a gene from a gtf/gff file. Next, splicing variants are created, together with documentation and event annotation, based on the user's input. Finally, fastq files containing RNA-seq reads from the splice variants, along with the real exon and junction coverage, are created using a modified version of the polyester R package, available at https://github.com/biomedbigdata/polyester.
simulate_alternative_splicing( input_dir, outdir, event_probs = NULL, preset = NULL, ncores = 1L, ... )
input_dir |
Character path to the directory containing the gtf/gff file from which splice variants are created and the genome fasta files, one file per chromosome (i.e. <chr_name>.fa), which are passed to polyester. |
outdir |
Character path to the folder where simulated reads and all annotations should be written, without a trailing slash. By default, reads are written to the current working directory. |
event_probs |
Named list/vector of numerics giving the probabilities
of creating each event (or event combination).
If |
preset |
To use preset parameters, set this to one of
'event_partition', 'experiment_bias', 'event_combination_2'.
Check |
ncores |
The number of cores to use for parallel splice variant creation and read simulation. This spawns one process per core, so running many processes may require a lot of memory. |
... |
any of several other arguments that can be used to add nuance to the simulation and splice variant creation. See details. |
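A minimal sketch of a basic call, based only on the signature above. The paths are hypothetical placeholders, and the event names `es` and `ir` are assumed abbreviations that this page does not list. The call itself is guarded so that the snippet does nothing when the package or the input directory is missing.

```r
# Hypothetical paths; replace them with real directories.
input_dir <- "path/to/input"   # should hold the gtf/gff file and <chr_name>.fa files
outdir <- "path/to/output"     # no trailing slash

# Assumed event names (not listed on this page); values are probabilities.
event_probs <- c(es = 0.1, ir = 0.1)
stopifnot(all(event_probs >= 0), all(event_probs <= 1))

# Collect the documented arguments in a named list.
args <- list(
  input_dir = input_dir,
  outdir = outdir,
  event_probs = event_probs,
  ncores = 1L   # one process per core; more cores need more memory
)

# Run only when the package and the input directory are available.
if (requireNamespace("ASimulatoR", quietly = TRUE) && dir.exists(input_dir)) {
  do.call(ASimulatoR::simulate_alternative_splicing, args)
}
```

Building the argument list first makes it easy to inspect or log the settings before starting a potentially long simulation.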
Reads are simulated from a GTF file, which is produced by create_splicing_variants_and_annotation, plus DNA sequences.
Several optional parameters can be passed to this function to adjust the simulation. For polyester parameters, refer to polyester::simulate_experiment:
novel_variants: Numeric value between 0 and 1 indicating the fraction of splicing variants that will not be written to an additional gtf file splicing_variants_novel.gtf.
write_gff: In addition to the gtf file containing the splice variants, a gff3 file with the same content is written to outdir. Default TRUE.
max_genes: The maximum number of genes/exon supersets to include in splice variant creation. Default NULL, which means that all available exon supersets are used. This default is computationally heavy, so you may want to adjust it.
exon_junction_coverage: Should the real coverage of exons, junctions and retained introns be written to an additional file? Default TRUE.
multi_events_per_exon: Should it be possible to have more than one AS event at the same exon if multiple variants are created for the same exon superset? Warning: if this option is set to TRUE, unforeseen AS events may occur that are not documented in the event_annotation file. Default FALSE.
probs_as_freq: Should event_probs be treated as relative frequencies instead of probabilities? Default FALSE.
save_exon_superset: Should the exon supersets be saved to an .rda file? Default TRUE.
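The optional parameters listed above are passed through `...`. The following is a sketch under assumptions: the paths are hypothetical, the event name `es` is an assumed abbreviation, and the chosen values are only illustrative. The call is guarded so the snippet is inert without the package and input data.

```r
# Optional arguments documented above, with illustrative values.
extra_args <- list(
  novel_variants = 0.1,           # value between 0 and 1
  write_gff = TRUE,               # also write a gff3 file
  max_genes = 50,                 # limit the number of exon supersets (default NULL = all)
  exon_junction_coverage = TRUE,
  multi_events_per_exon = FALSE,
  probs_as_freq = FALSE,
  save_exon_superset = TRUE
)
stopifnot(extra_args$novel_variants >= 0, extra_args$novel_variants <= 1)

# Hypothetical required arguments; 'es' is an assumed event name.
base_args <- list(
  input_dir = "path/to/input",
  outdir = "path/to/output",
  event_probs = c(es = 0.1)
)

# Merge required and optional arguments into one call.
all_args <- c(base_args, extra_args)

if (requireNamespace("ASimulatoR", quietly = TRUE) &&
    dir.exists(base_args$input_dir)) {
  do.call(ASimulatoR::simulate_alternative_splicing, all_args)
}
```

Setting `max_genes` to a small number is a reasonable way to do a quick trial run before using all available exon supersets.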
Parameters passed to polyester for which we set different defaults than simulate_experiment:
fold_changes: Currently, ASimulatoR introduces random isoform switches. These can be traced in the sim_tx_info.txt file written by polyester. We plan to improve this in the future.
strand_specific: Strand-specific simulation (1st read forward strand, 2nd read reverse strand with respect to transcript sequence). Default TRUE.
meanmodel: reads_per_transcripts as a function of transcript length. Always TRUE in ASimulatoR.
frag_GC_bias: A sample-specific GC content bias on the fragment level. Currently not supported in ASimulatoR: always 'none'.
verbose: Should progress messages be printed during the sequencing process? Default TRUE.
exon_junction_coverage: Should the coverage of exons, junctions and retained introns be determined? Default TRUE.
exon_junction_table: If exon_junction_coverage=TRUE, a data.table produced by create_splicing_variants_and_annotation that is used to determine exon and intron coverage.
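As a sketch of overriding the polyester-related defaults above: only `strand_specific` and `verbose` are changed here, because `meanmodel` and `frag_GC_bias` are described as fixed in ASimulatoR. The paths and the event name `es` are hypothetical, and the small check against the fixed parameters is an illustration, not part of the package.

```r
# Overrides for polyester parameters documented above.
polyester_args <- list(
  strand_specific = FALSE,  # documented default is TRUE
  verbose = FALSE           # documented default is TRUE
)

# Parameters described as fixed in ASimulatoR; do not try to override them.
fixed_params <- c("meanmodel", "frag_GC_bias")
stopifnot(!any(names(polyester_args) %in% fixed_params))

# Hypothetical required arguments; 'es' is an assumed event name.
call_args <- c(
  list(
    input_dir = "path/to/input",
    outdir = "path/to/output",
    event_probs = c(es = 0.1)
  ),
  polyester_args
)

if (requireNamespace("ASimulatoR", quietly = TRUE) &&
    dir.exists(call_args$input_dir)) {
  do.call(ASimulatoR::simulate_alternative_splicing, call_args)
}
```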
No return value, but simulated reads, a simulation info file, an alternative splicing event annotation, and exon and junction coverages are written to outdir.
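Because the function returns nothing, results have to be inspected on disk. The helper below, `find_outputs`, is a hypothetical convenience function (not part of ASimulatoR) that reports which expected file names are present anywhere under the output directory. It only checks the two file names mentioned on this page; other output names are not assumed.

```r
# Hypothetical helper: report which expected files exist under outdir.
find_outputs <- function(outdir, expected) {
  # Search recursively, since the exact layout of outdir is not assumed here.
  present <- basename(list.files(outdir, recursive = TRUE))
  stats::setNames(expected %in% present, expected)
}

# File names mentioned in this documentation.
expected_files <- c("sim_tx_info.txt", "splicing_variants_novel.gtf")

# Hypothetical output directory; replace with the outdir used in the simulation.
outdir <- "path/to/output"
print(find_outputs(outdir, expected_files))
```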
Alyssa C. Frazee, Andrew E. Jaffe, Ben Langmead, Jeffrey T. Leek, Polyester: simulating RNA-seq datasets with differential transcript expression, Bioinformatics, Volume 31, Issue 17, 1 September 2015, Pages 2778–2784, https://doi.org/10.1093/bioinformatics/btv272