simulate_alternative_splicing: Simulate RNA-seq experiment with splicing variants

simulate_alternative_splicing    R Documentation

Simulate RNA-seq experiment with splicing variants

Description

First, exon supersets are created by joining all exons of a gene from a gtf/gff file. Next, splicing variants are created, together with documentation and event annotation, based on the user's input. Finally, fastq files containing RNA-seq reads from the splice variants, along with the real exon and junction coverage, are created using a modified version of the polyester R package, available at https://github.com/biomedbigdata/polyester.

Usage

simulate_alternative_splicing(
  input_dir,
  outdir,
  event_probs = NULL,
  preset = NULL,
  ncores = 1L,
  ...
)

Arguments

input_dir

Character, path to the directory containing the gtf/gff file from which splice variants are created and the genome fasta files that are passed to polyester, with one file per chromosome, i.e. <chr_name>.fa.

outdir

Character, path to the folder where simulated reads and all annotations should be written, with no slash at the end. By default, reads are written to the current working directory.

event_probs

Named list/vector of numerics giving the probability of creating each event (or event combination). If probs_as_freq is TRUE, event_probs gives the relative frequency of occurrence of each event (or event combination), and in this case the sum of all frequencies has to be <= 1. No default; must not be NULL unless preset is given.

preset

To use preset parameters, one of 'event_partition', 'experiment_bias', 'event_combination_2'. See ?presets for more information.

ncores

The number of cores to use for parallel splice variant creation and read simulation. This spawns one process per core, so be aware that many processes may require a lot of memory.

...

Any of several other arguments that can be used to add nuance to the simulation and splice variant creation. See Details.
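As an illustration of the event_probs argument described above, here is a minimal sketch of how such a named vector might be built and sanity-checked before starting a simulation. The event names ('es', 'ir', 'a3') and the comma-separated combination 'es,ir' are assumptions made for the example, and check_event_probs is a hypothetical helper, not part of ASimulatoR; it only mirrors the constraints stated in this help page.

```r
# Hypothetical event names and values, chosen only for illustration.
event_probs <- c("es" = 0.10, "ir" = 0.05, "a3" = 0.05, "es,ir" = 0.02)

# Hypothetical helper (not an ASimulatoR function): checks the constraints
# documented for event_probs.
check_event_probs <- function(event_probs, probs_as_freq = FALSE) {
  values <- unlist(event_probs)  # accepts a named list or a named vector
  stopifnot(
    is.numeric(values),
    !is.null(names(values)),
    all(nzchar(names(values))),
    all(values >= 0),
    all(values <= 1)
  )
  # Relative frequencies must not sum to more than 1.
  if (probs_as_freq && sum(values) > 1) {
    stop("with probs_as_freq = TRUE the frequencies must sum to <= 1")
  }
  invisible(TRUE)
}

check_event_probs(event_probs, probs_as_freq = TRUE)
```

Running such a check up front can surface a malformed input before the computationally expensive simulation starts.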

Details

Reads are simulated from DNA sequences and a GTF file produced by create_splicing_variants_and_annotation.

Several optional parameters can be passed to this function to adjust the simulation. For polyester parameters refer to polyester::simulate_experiment:

  • novel_variants: Numeric value between 0 and 1 indicating the fraction of splicing variants that will be omitted from an additional gtf file, splicing_variants_novel.gtf.

  • write_gff: In addition to the gtf file containing the splice variants, should a gff3 file with the same content be written to outdir? Default TRUE.

  • max_genes: The maximum number of genes/exon supersets to be included in splice variant creation. Default NULL, which means that all available exon supersets are used. This default is computationally heavy, so you might want to adjust it!

  • exon_junction_coverage: Should the real coverage of exons, junctions and retained introns be written to an additional file? Default TRUE.

  • multi_events_per_exon: Should it be possible to have more than one AS event at the same exon if multiple variants are created for the same exon superset? Warning: if this option is set to TRUE, unforeseen AS events may occur that are not documented in the event_annotation file. Default FALSE.

  • probs_as_freq: Should event_probs be treated as relative frequencies instead of probabilities? Default FALSE.

  • save_exon_superset: Should the exon supersets be saved to an .rda file? Default TRUE.

Parameters passed to polyester for which ASimulatoR sets different defaults than simulate_experiment:

  • fold_changes: Currently, ASimulatoR introduces random isoform switches. Those can be retraced in the sim_tx_info.txt file written by polyester. We plan on improving this in the future.

  • strand_specific: Strand-specific simulation (1st read forward strand, 2nd read reverse strand with respect to transcript sequence). Default TRUE.

  • meanmodel: reads_per_transcripts as a function of transcript length. Always TRUE in ASimulatoR.

  • frag_GC_bias: A sample-specific GC content bias on the fragment level. Currently not supported in ASimulatoR: always 'none'.

  • verbose: Should progress messages be printed during the sequencing process? Default TRUE.

  • exon_junction_coverage: Should the coverage of exons, junctions and retained introns be determined? Default TRUE.

  • exon_junction_table: If exon_junction_coverage = TRUE, a data.table produced by create_splicing_variants_and_annotation that is used to determine exon and intron coverage.
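The optional parameters listed above are supplied through the ... argument. The following is a minimal sketch of such a call, not a tested recipe: the paths, the event names in event_probs and all chosen values are hypothetical, and the call only runs if ASimulatoR is installed and the input directory exists.

```r
# Hypothetical paths: replace with a directory holding the gtf/gff file and
# one <chr_name>.fa file per chromosome, and with an output folder
# (no slash at the end).
input_dir <- "path/to/input"
outdir <- "path/to/output"

# Arguments collected in a list so they can be inspected before the call.
# Event names and all values are assumptions made for illustration.
args <- list(
  input_dir = input_dir,
  outdir = outdir,
  event_probs = c("es" = 0.1, "ir" = 0.1),
  ncores = 1L,
  # optional parameters passed through `...`
  max_genes = 50,                # limit the number of exon supersets
  probs_as_freq = FALSE,
  multi_events_per_exon = FALSE,
  write_gff = TRUE,
  exon_junction_coverage = TRUE,
  save_exon_superset = TRUE,
  # polyester parameters with ASimulatoR-specific defaults
  strand_specific = TRUE,
  verbose = TRUE
)

# Guarded so the sketch is harmless when the package or the data is missing.
if (requireNamespace("ASimulatoR", quietly = TRUE) && dir.exists(input_dir)) {
  do.call(ASimulatoR::simulate_alternative_splicing, args)
}
```

Setting max_genes to a small number is a convenient way to try out a configuration, since the default of NULL processes all available exon supersets.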

Value

No return value. Simulated reads, a simulation info file, an alternative splicing event annotation, and exon and junction coverages are written to outdir.

References

Alyssa C. Frazee, Andrew E. Jaffe, Ben Langmead, Jeffrey T. Leek, Polyester: simulating RNA-seq datasets with differential transcript expression, Bioinformatics, Volume 31, Issue 17, 1 September 2015, Pages 2778–2784, https://doi.org/10.1093/bioinformatics/btv272


biomedbigdata/ASimulatoR documentation built on Sept. 6, 2022, 7:55 p.m.