SpliceWiz-package        R Documentation

SpliceWiz: efficient and precise alternative splicing analysis in R

Description

SpliceWiz is a computationally efficient and user-friendly workflow that analyses aligned short-read RNA sequencing data for differential intron retention and alternative splicing.

Details

SpliceWiz uses isoform-specific alignments to quantify percent-spliced-in (PSI) ratios, i.e. the abundance of the "included" isoform as a proportion of the combined abundance of the "included" and "excluded" isoforms. For intron retention (IR), the abundance of the intron-retaining transcript (included isoform) is quantified using the trimmed mean of read coverage depth across the intron. The spliced transcript (excluded isoform) is measured from splicing of the intron itself as well as splicing of any overlapping intron, since splicing of any overlapping intron implies that the intron of interest is not retained. For other forms of alternative splicing, junction reads (reads aligned across splice junctions) are used to quantify included and excluded isoforms.
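The PSI calculation for a single retained intron can be sketched in base R as follows. This is only an illustration of the ratio described above, not SpliceWiz's implementation: the depth values, the junction counts, the 20% trim fraction, and the choice to add exact and overlapping splice counts together are all assumptions made for the example, and psi_ratio is a hypothetical helper.

```r
# Hypothetical helper: PSI as included / (included + excluded).
psi_ratio <- function(included, excluded) {
    included / (included + excluded)
}

# Hypothetical per-base read depth across one intron (illustrative numbers).
intron_depth <- c(0, 2, 3, 3, 4, 4, 4, 5, 5, 40)

# Included isoform (IR): trimmed mean of intron depth.
# ASSUMPTION: trimming 20% from each tail; SpliceWiz may trim differently.
included <- mean(intron_depth, trim = 0.2)

# Excluded isoform: reads splicing the intron of interest, plus reads
# splicing overlapping introns (hypothetical counts).
# ASSUMPTION: the two are combined by simple addition in this sketch.
splice_exact <- 12
splice_overlapping <- 4
excluded <- splice_exact + splice_overlapping

psi <- psi_ratio(included, excluded)
```

The trimmed mean discards the extreme depths at both ends (here the 0 and the 40), so a short spike of coverage inside the intron has little influence on the included-isoform estimate.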

SpliceWiz processes BAM files (aligned RNA sequencing reads) using ompBAM::ompBAM-package. ompBAM is a C++ library that allows R packages (via the Rcpp framework) to efficiently read BAM files using OpenMP-based multi-threading. BAM files are processed via the processBAM function, using a splicing and intron reference that the buildRef function builds from any given genome / gene annotation resource. processBAM generates two outputs per BAM file. The first is a txt.gz file: a gzip-compressed text file containing multiple tables, with information including junction read counts and intron retention metrics. This output is very similar to that of IRFinder, as the analysis steps of SpliceWiz's BAM processing were built on an improved version of IRFinder's source code (version 1.3.1). The second is a COV file: a binary bgzf-compressed file that contains strand-specific coverage data.

Once individual files have been analysed, SpliceWiz uses collateData to compile their outputs into a single dataset. This function unifies junctions detected across the dataset, and generates included / excluded counts of all putative IR events and annotated alternative splicing events (ASEs). The dataset is exported as a collection of files, including an H5 database. The data are later imported into the R session as a NxtSE object, using the makeSE function.
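The idea of unifying junctions across samples can be illustrated with a small base R sketch. It is not how collateData is implemented; the sample names, junction identifiers, and counts below are hypothetical, and the only point is that every sample ends up with a count (possibly zero) for every junction seen anywhere in the dataset.

```r
# Hypothetical per-sample junction counts, keyed by "chr:start-end/strand".
sample_junctions <- list(
    sampleA = c("chr1:100-200/+" = 10, "chr1:300-400/+" = 5),
    sampleB = c("chr1:100-200/+" = 7,  "chr2:50-90/-"   = 3)
)

# Union of all junctions observed in any sample.
all_junctions <- sort(unique(unlist(lapply(sample_junctions, names))))

# Build a junction-by-sample matrix; a junction that was not observed
# in a given sample receives a count of 0.
count_matrix <- vapply(
    sample_junctions,
    function(x) {
        counts <- x[all_junctions]
        counts[is.na(counts)] <- 0
        unname(counts)
    },
    numeric(length(all_junctions))
)
rownames(count_matrix) <- all_junctions
```

With a shared set of rows, included and excluded counts for each event can be compared directly across samples.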

The NxtSE object is a specialized SummarizedExperiment object tailored for use in SpliceWiz. Row annotations, accessed via rowData, provide information about ASEs, while column annotations, supplied by users via colData, describe the samples.

SpliceWiz offers several novel filters via the ASEFilter class. See ASEFilter for details.

Once the NxtSE is annotated and filtered, differential analysis is performed, using limma, DoubleExpSeq (DES), edgeR and DESeq2 wrappers. These wrappers model isoform counts as log-normal (limma), beta-binomial (DES) and negative-binomial (edgeR and DESeq2) distributions. See ASE-methods for details.

Finally, SpliceWiz provides visualisation tools to illustrate alternative splicing using coverage plots, including a novel method to normalise RNA-seq coverage grouped by experimental condition. This approach accounts for variations introduced by sequenced library size and gene expression. SpliceWiz efficiently computes and visualises means and variations in per-nucleotide coverage depth across alternate exons in genomic loci.

The main functions are:

  • Build-Reference-methods - Prepares genome and gene annotation references from FASTA and GTF files, and synthesizes the SpliceWiz reference used for processing BAM files and collating the NxtSE object.

  • STAR-methods - (Optional) Provides wrapper functions to build the STAR genome reference and to align short-read FASTQ raw sequencing files. This functionality is only available on systems with STAR installed.

  • processBAM - OpenMP/C++ based algorithm to analyse single or multiple BAM files.

  • collateData - Collates an experiment based on multiple processBAM (IRFinder-like) outputs for individual samples, into one unified H5-based data structure.

  • makeSE - Constructs a NxtSE (H5-based SummarizedExperiment) object, specialised to house measurements of retained introns and junction counts of alternative splice events.

  • applyFilters - Use default or custom filters to remove alternative splicing or IR events pertaining to low-abundance genes and transcripts.

  • ASE-methods - One-step methods to perform differential alternative splicing event (ASE) analysis on a NxtSE object using limma, DoubleExpSeq, edgeR or DESeq2.

  • make_plot_data - Functions that compile individual and group-mean percent spliced in (PSI) values of IR and alternative splice events; useful to produce scatter plots or heatmaps.

  • Coverage - Methods that retrieve coverage data from COV files.

  • getCoverageData / getPlotObject / plotView - Functions for plotting SpliceWiz's novel coverage plots.
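The functions listed above chain together roughly as in the sketch below, which uses the small example genome and BAM files bundled with SpliceWiz. It is a minimal outline under stated assumptions rather than a definitive recipe: run_splicewiz_sketch is a hypothetical wrapper, the condition labels are made up, and the argument names reflect the Quick-Start vignette as best recalled, so they should be checked against the installed version. The limma package is assumed to be installed for the final step.

```r
# Hypothetical wrapper; nothing runs until the function is called.
run_splicewiz_sketch <- function(work_dir = tempdir()) {
    stopifnot(requireNamespace("SpliceWiz", quietly = TRUE))

    ref_path   <- file.path(work_dir, "Reference")
    pb_path    <- file.path(work_dir, "pb_output")
    nxtse_path <- file.path(work_dir, "NxtSE_output")

    # 1. Build the SpliceWiz reference from the bundled example FASTA / GTF.
    SpliceWiz::buildRef(
        reference_path = ref_path,
        fasta = SpliceWiz::chrZ_genome(),
        gtf = SpliceWiz::chrZ_gtf()
    )

    # 2. Process the bundled example BAM files (txt.gz and COV outputs).
    bams <- SpliceWiz::SpliceWiz_example_bams()
    SpliceWiz::processBAM(
        bamfiles = bams$path,
        sample_names = bams$sample,
        reference_path = ref_path,
        output_path = pb_path
    )

    # 3. Collate the per-sample outputs into one H5-based dataset.
    expr <- SpliceWiz::findSpliceWizOutput(pb_path)
    SpliceWiz::collateData(
        Experiment = expr,
        reference_path = ref_path,
        output_path = nxtse_path
    )

    # 4. Import as a NxtSE object and annotate samples.
    #    ASSUMPTION: made-up condition labels for the six example samples.
    se <- SpliceWiz::makeSE(nxtse_path)
    SummarizedExperiment::colData(se)$condition <- rep(c("A", "B"), each = 3)

    # 5. Apply the default filters, then test condition B against A.
    se_filtered <- se[SpliceWiz::applyFilters(se), ]
    SpliceWiz::ASE_limma(
        se = se_filtered,
        test_factor = "condition",
        test_nom = "B",
        test_denom = "A"
    )
}
```

Calling run_splicewiz_sketch() writes all intermediate files under a temporary directory and returns the table of differential events; STAR alignment and the plotting functions are left out to keep the outline short.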

See the SpliceWiz Quick-Start for worked examples on how to use SpliceWiz, and the SpliceWiz Cookbook for real-life usage examples.

Author(s)

Alex Wong

References

Wong ACH, Wong JJ-L, Rasko JEJ, Schmitz U. SpliceWiz: interactive analysis and visualization of alternative splicing in R. Briefings in Bioinformatics, Volume 25, Issue 1, January 2024, bbad468. https://doi.org/10.1093/bib/bbad468


alexchwong/SpliceWiz documentation built on March 17, 2024, 3:16 a.m.