R/SpliceWiz-package.R

#' SpliceWiz: efficient and precise alternative splicing analysis in R
#'
#' SpliceWiz is a computationally efficient and user-friendly workflow that
#' analyses aligned short-read RNA sequencing data for differential
#' intron retention and alternative splicing.
#'
#' @details
#' SpliceWiz uses isoform-specific alignments to quantify percent-spliced-in
#' (PSI) ratios, i.e. the abundance of the "included" isoform as a proportion
#' of the combined abundance of the "included" and "excluded" isoforms. For
#' intron retention (IR), the abundance of the intron-retaining transcript
#' (included isoform) is quantified as the trimmed mean of read coverage depth
#' across the intron, whereas the abundance of the spliced transcript (excluded
#' isoform) is measured from splicing of the intron itself as well as splicing
#' of any overlapping intron (since splicing of any overlapping intron implies
#' that the intron of interest is not retained). For other forms of
#' alternative splicing, junction reads (reads aligned across splice junctions)
#' are used to quantify included and excluded isoforms.
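#'
#' As a minimal sketch of the ratio described above, the following base R
#' snippet computes a PSI value for a single hypothetical intron. The depth
#' values, the junction counts and the trim fraction of 0.2 are all
#' illustrative assumptions and are not taken from SpliceWiz's implementation.
#'
#' ```r
#' # Hypothetical per-nucleotide read depth across one intron (toy values)
#' intron_depth <- c(0, 2, 3, 3, 4, 4, 5, 5, 6, 40)
#'
#' # Included isoform (IR): trimmed mean of intron depth.
#' # trim = 0.2 is an assumed value chosen for illustration only;
#' # it drops the two lowest and two highest values, giving a mean of 4.
#' included <- mean(intron_depth, trim = 0.2)
#'
#' # Excluded isoform: hypothetical counts of reads spliced across the
#' # intron itself and across an overlapping intron
#' excluded <- sum(c(same_intron = 18, overlapping_intron = 4))
#'
#' # Percent-spliced-in ratio: included / (included + excluded)
#' psi <- included / (included + excluded)
#' ```
#'
#' The trimmed mean limits the influence of extreme positions, such as the
#' depth of 40 in the toy data, on the estimate for the included isoform.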
#' 
#' SpliceWiz processes BAM files (aligned RNA sequencing) using 
#' [ompBAM::ompBAM-package]. ompBAM
#' is a C++ library that allows R packages (via the Rcpp framework) to 
#' efficiently read BAM files using OpenMP-based multi-threading. SpliceWiz
#' processes BAM files via the [processBAM] function, using a splicing and
#' intron reference built from any given genome / gene annotation resource
#' using the [buildRef] function. [processBAM] generates two outputs per
#' BAM file. The first is a `txt.gz` file, a gzip-compressed text file
#' containing multiple tables with information such as junction read counts
#' and intron retention metrics. This output is very similar to that of
#' [IRFinder](https://github.com/williamritchie/IRFinder), as the analysis
#' steps of SpliceWiz's BAM processing were built on an improved version of
#' IRFinder's source code (version 1.3.1). The second is a COV file, a binary
#' bgzf-compressed file that contains strand-specific coverage data.
#'
#' Once individual files have been analysed, SpliceWiz compiles these
#' individual outputs into a dataset using [collateData]. This function unifies
#' junctions detected across the dataset, and generates included / excluded
#' counts of all putative IR events and annotated alternative splicing events
#' (ASEs). This dataset is exported as a collection of files including an
#' H5 database. The data is later imported into the R session using the
#' [makeSE] function, as a \linkS4class{NxtSE} object.
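#'
#' The steps above can be sketched as follows. This is a hedged outline, not
#' a definitive recipe: it assumes the bundled example helpers
#' (`chrZ_genome()`, `chrZ_gtf()`, `SpliceWiz_example_bams()`), the helper
#' `findSpliceWizOutput()` and the argument names shown are available in the
#' installed version of SpliceWiz. The output directories under `tempdir()`
#' are hypothetical.
#'
#' ```r
#' library(SpliceWiz)
#'
#' # Hypothetical output locations for this sketch
#' ref_path   <- file.path(tempdir(), "Reference")
#' pb_path    <- file.path(tempdir(), "pb_output")
#' nxtse_path <- file.path(tempdir(), "NxtSE_output")
#'
#' # 1. Build the splicing and intron reference from a genome and annotation
#' buildRef(
#'     reference_path = ref_path,
#'     fasta = chrZ_genome(),
#'     gtf = chrZ_gtf()
#' )
#'
#' # 2. Process BAM files (one txt.gz and one COV file per sample)
#' bams <- SpliceWiz_example_bams()
#' processBAM(
#'     bamfiles = bams$path,
#'     sample_names = bams$sample,
#'     reference_path = ref_path,
#'     output_path = pb_path
#' )
#'
#' # 3. Collate the per-sample outputs into a unified dataset
#' expr <- findSpliceWizOutput(pb_path)
#' collateData(
#'     Experiment = expr,
#'     reference_path = ref_path,
#'     output_path = nxtse_path
#' )
#'
#' # 4. Import the collated dataset as an NxtSE object
#' se <- makeSE(nxtse_path)
#' ```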
#' 
#' The \linkS4class{NxtSE} object is a specialized 
#' \linkS4class{SummarizedExperiment} object tailored for use in SpliceWiz.
#' Row annotations, accessed via [rowData], provide information about ASEs,
#' while column annotations, set via [colData], allow users to annotate samples.
#'
#' SpliceWiz offers several novel filters via the \linkS4class{ASEFilter}
#' class. See [ASEFilter] for details.
#'
#' Once the \linkS4class{NxtSE} is annotated and filtered, differential
#' analysis is performed, using limma, DoubleExpSeq (DES), edgeR and 
#' DESeq2 wrappers. These wrappers model isoform counts as log-normal (limma), 
#' beta-binomial (DES) and negative-binomial (edgeR and DESeq2) distributions. 
#' See [ASE-methods] for details.
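#'
#' A minimal sketch of annotation, filtering and differential analysis is
#' shown below. It assumes that the example helper
#' `SpliceWiz_example_NxtSE()`, the function `getDefaultFilters()` and the
#' `ASE_limma()` argument names shown match the installed version of
#' SpliceWiz, and that the limma package is installed. The `treatment`
#' labels are hypothetical.
#'
#' ```r
#' library(SpliceWiz)
#'
#' # Load a small example NxtSE object (assumed helper)
#' se <- SpliceWiz_example_NxtSE()
#'
#' # Annotate samples with a hypothetical experimental condition
#' colData(se)$treatment <- rep(c("A", "B"), each = 3)
#'
#' # Keep only events that pass the default filters
#' se_filtered <- se[applyFilters(se, getDefaultFilters()), ]
#'
#' # Compare treatment "A" against "B" using the limma wrapper
#' res_limma <- ASE_limma(
#'     se_filtered,
#'     test_factor = "treatment",
#'     test_nom = "A",
#'     test_denom = "B"
#' )
#' ```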
#'
#' Finally, SpliceWiz provides visualisation tools to illustrate alternative
#' splicing using coverage plots, including a novel method to normalise RNA-seq
#' coverage grouped by experimental condition. This approach accounts for
#' variations introduced by sequenced library size and gene expression. 
#' SpliceWiz efficiently computes and visualises means and variations in 
#' per-nucleotide coverage depth across alternate exons in genomic loci.
#'
#' The main functions are:
#'
#' * [Build-Reference-methods] - Prepares genome and gene annotation
#'   references from FASTA and GTF files and synthesizes the SpliceWiz reference
#'   used for processing BAM files and collating the \linkS4class{NxtSE} object.
#' * [STAR-methods] - (Optional) Provides wrapper functions to build the STAR
#'   genome reference and to align raw short-read FASTQ sequencing files.
#'   This functionality is only available on systems with STAR installed.
#' * [processBAM] - OpenMP/C++ based algorithm to analyse
#'   single or multiple BAM files.
#' * [collateData] - Collates an experiment based on multiple [processBAM]
#'   outputs for individual samples, into one unified H5-based data structure.
#' * [makeSE] - Constructs a \linkS4class{NxtSE} (H5-based
#'   SummarizedExperiment) object, specialised to house measurements of retained
#'   introns and junction counts of alternative splice events.
#' * [applyFilters] - Use default or custom filters to remove alternative
#'   splicing or IR events pertaining to low-abundance genes and transcripts.
#' * [ASE-methods] - One-step methods to perform differential alternative
#'   splicing event (ASE) analysis on a NxtSE object using limma, DoubleExpSeq,
#'   edgeR or DESeq2.
#' * [make_plot_data] - Functions that compile individual and group-mean
#'   percent-spliced-in (PSI) values of IR and alternative splicing events;
#'   useful for producing scatter plots or heatmaps.
#' * [Coverage] - Methods that retrieve coverage data from COV files.
#' * [getCoverageData] / [getPlotObject] / [plotView] -
#'   Functions for plotting SpliceWiz's novel coverage plots.
#'
#' See the
#' [SpliceWiz Quick-Start](../doc/SW_QuickStart.html)
#' for worked examples on how to use SpliceWiz, and the
#' [SpliceWiz Cookbook](../doc/SW_Cookbook.html)
#' for real-life usage examples.
#'
#' @author Alex Wong
#'
#' @docType package
#' @name SpliceWiz-package
#' @aliases SpliceWiz-package
#' @keywords package
#' @references
#' Wong ACH, Wong JJ-L, Rasko JEJ, Schmitz U.
#' SpliceWiz: interactive analysis and visualization of alternative splicing in R.
#' Briefings in Bioinformatics, Volume 25, Issue 1, January 2024, bbad468.
#' \url{https://doi.org/10.1093/bib/bbad468}
#' @md
NULL
alexchwong/SpliceWiz documentation built on March 17, 2024, 3:16 a.m.