SpliceWiz: interactive analysis and visualization of alternative splicing in R

Documented in ASE_DESeq ASE_DoubleExpSeq ASE_edgeR ASE_edgeR_timeseries ASE_limma ASE_limma_timeseries

#' Differential Alternative Splicing Event analysis
#'
#' Use Limma, DESeq2, DoubleExpSeq, and edgeR wrapper functions 
#' to test for differential Alternative Splice Events (ASEs)
#'
#' @details
#'
#' Using **limma**, SpliceWiz models included and excluded counts as log-normal
#' distributed, whereas
#' using **DESeq2**, SpliceWiz models included and excluded counts as negative
#' binomial distributed with dispersion shrinkage according to their mean count
#' expressions.
#' For **limma** and **DESeq2**, differential ASE are considered as the
#' "interaction" between included and excluded splice counts for each sample.
#'  See [this vignette](https://rpubs.com/mikelove/ase) for an explanation of
#' how this is done.
#'
#' SpliceWiz's **limma** wrapper implements an additional filter where ASEs with
#' an average cpm values of either Included or Excluded counts are less than 1.
#' **DESeq2** has its own method for handling outliers, which seems to work well
#' for handling situations where PSI ~= 0 or PSI ~= 1.
#'
#' Time series are supported by SpliceWiz to a limited extent. 
#' Time series analysis can be performed via limma or DESeq2.
#' For limma time-series analysis, use `ASE_limma_timeseries()`, specifying
#' the `test_factor` as the column of numeric values containing 
#' time series data. For DESeq, time series differential analysis can be 
#' activated using the `ASE_DESeq()` function, again specifying `test_factor`
#' as the column containing time series data (and leaving `test_nom`
#' and `test_denom` parameters blank). See examples below.
#'
#' **edgeR** models counts using a negative binomial model. It accounts 
#' appropriately for zero-counts which are often 
#' problematic as PSI approaches zero or one, leading to false positives. The
#' edgeR-based option produces differential ASEs that are less biased towards
#' low counts. Our preliminary analysis shows it to be more accurate than limma
#' or DoubleExpSeq based methods.
#'
#' For time series analysis using edgeR, `ASE_edgeR_timeseries()` can be
#' used interchangeably with its counterpart limma-based function. For
#' complex models, please see [ASE-GLM-edgeR] to build your own GLM models.
#'
#' Using **DoubleExpSeq**, included and excluded counts are modeled using
#' the generalized beta prime distribution, using empirical Bayes shrinkage
#' to estimate dispersion.
#'
#' **EventType** are as follow:
#' * `IR` = intron retention (IR-ratio) - all introns are considered
#' * `MXE` = mutually exclusive exons
#' * `SE` = skipped exons
#' * `AFE` = alternate first exon
#' * `ALE` = alternate last exon
#' * `A5SS` = alternate 5'-splice site
#' * `A3SS` = alternate 3'-splice site
#' * `RI` = (known / annotated) intron retention (PSI).
#'
#' NB: SpliceWiz measures intron retention events using two different 
#'   approaches, the choice of which is left to the user - see [ASE-methods]:
#' * **IR** (intron retention) events: considers all introns to be potentially
#' retained. Given in most scenarios there may be uncertainty as to which of the
#' many mutually-overlapping introns are spliced to produce the major isoform,
#' SpliceWiz adopts the IRFinder approach by using the IR-ratio. The "included"
#' isoform is the relative abundance of the IR-transcript, as approximated by
#' the trimmed-mean depth of coverage across the intron (excluding outliers
#' including exons of other transcripts, intronic elements such as snoRNAs, 
#' etc). The "excluded isoform" includes **all** spliced transcripts that 
#' contain an overlapping intron, as estimated via SpliceWiz's `SpliceOver` and 
#' IRFinder's `SpliceMax` methods - see [collateData].
#' * **RI** (annotated retained introns) considers only annotated retained 
#' introns, i.e., those annotated within the given reference. These are
#' quantified using PSI, considering the included (IR-transcript) and excluded
#' (splicing of the exact intron) as binary alternatives. 
#'
#' SpliceWiz considers "included" counts as those that represent abundance of 
#' the "included" isoform, whereas "excluded" counts represent the abundance of 
#' the "excluded" isoform.
#' To allow comparison between modalities, SpliceWiz applies a convention 
#' whereby the "included" transcript is one where its splice junctions
#' are by definition shorter than those of "excluded" transcripts.
#' Specifically, this means the included / excluded isoforms are as follows:
#'
#' | EventType | Included | Excluded |
#' | :---: | :---: | :---: |
#' | IR or RI | Intron Retention | Spliced Intron |
#' | MXE | Upstream exon inclusion | Downstream exon inclusion |
#' | SE | Exon inclusion | Exon skipping |
#' | AFE | Downstream exon usage | Upstream exon usage |
#' | ALE | Upstream exon usage | Downstream exon usage |
#' | A5SS | Downstream 5'-SS | Upstream 5'-SS |
#' | A3SS | Upstream 3'-SS | Downstream 3'-SS |
#'
#' @param se The \linkS4class{NxtSE} object created by `makeSE()`. To reduce
#'   runtime and avoid excessive multiple testing, consider filtering
#'   the object using [applyFilters]
#' @param test_factor The column name in the sample annotation `colData(se)` 
#'   that contains the desired variables to be contrasted. For 
#'   `ASE_limma_timeseries()` and `ASE_DESeq()` (when `test_nom` and 
#'   `test_denom` parameters are left blank), `test_factor` must contain 
#'   numerical values representing the time variable.
#' @param test_nom The nominator condition to test for differential ASE. Usually
#'   the "treatment" condition
#' @param test_denom The denominator condition to test against for differential
#'   ASE. Usually the "control" condition
#' @param batch1,batch2 (Optional, limma and DESeq2 only) One or two condition
#'   types containing batch information to account for.
#' @param IRmode (default `all`) Choose the approach to quantify IR events.
#'   Default `all` considers all introns as potentially retained, and calculates
#'   IR-ratio based on total splicing across the intron using the "SpliceOver"
#'   or "SpliceMax" approach (see [collateData]). Other options include 
#'   `annotated` which calculates IR-ratios for annotated introns only, and
#'   `annotated_binary` which calculates PSI considering the "included"
#'   isoform as the IR-transcript, and the "excluded" transcript is
#'   quantified from splice counts only across the exact intron 
#'   (but not that of overlapping introns). IR-ratio are denoted as "IR" events,
#'   whereas PSIs calculated using IR and intron-spliced binary alternatives are
#'   denoted as "RI" events.
#' @param filter_antiover,filter_antinear Whether to remove novel IR events that
#'   overlap over or near anti-sense genes. Default will exclude antiover but
#'   not antinear introns. These are ignored if strand-specific RNA-seq 
#'   protocols are used.
#' @param n_threads (DESeq2 only) How many threads to use for DESeq2
#'   based analysis.
#' @param degrees_of_freedom (default `1`) The complexity of time series trends
#'   modeled by `ASE_limma_timeseries` and `ASE_edgeR_timeseries`. 
#'   E.g., `1` will only model linear trends,
#'   `2` extends the capacity for quadratic trends, and `3` for cubic
#'   trends, etc.
#' @param useQL (default `TRUE`) Whether to use edgeR's quasi-likelihood method
#'   to help reduce false positives from near-zero junction / intron counts. NB:
#'   edgeR's quasi-likelihood method is run with legacy method 
#'   (Lun and Smyth (2017)).
#' @return For all methods, a data.table containing the following:
#'   * EventName: The name of the ASE event. This identifies each ASE
#'     in downstream functions including [makeMeanPSI], [makeMatrix],
#'     and [plotCoverage]
#'   * EventType: The type of event. See details section above.
#'   * EventRegion: The genomic coordinates the event occupies. This spans the
#'     most upstream and most downstream splice junction involved in the ASE,
#'     and is use to guide the [plotCoverage] function.
#'   * flags: Indicates which isoforms are NMD substrates and/or which are
#'     formed by novel splicing only.
#'   * AvgPSI_nom, Avg_PSI_denom: the average percent spliced in / percent
#'     IR levels for the two conditions being contrasted. `nom` and `denom` in
#'     column names are replaced with the condition names. Note this is a
#'     geometric mean, based on the arithmetic mean of logit PSI values.
#'   * deltaPSI: The difference in PSI between the mean values of the two
#'     conditions.
#'   * abs_deltaPSI: The absolute value of difference in PSI between 
#'     the mean values of the two conditions.
#'
#'   **limma specific output**
#'   * logFC, AveExpr, t, P.Value, adj.P.Val, B: limma topTable columns of
#'     differential ASE. See [limma::topTable] for details.
#'   * inc/exc_(logFC, AveExpr, t, P.Value, adj.P.Val, B): limma results
#'     for differential testing for raw included / excluded counts only
#'
#'   **edgeR specific output** equivalent to statistics returned by 
#'   [edgeR::topTags]:
#'   * logFC, logCPM, F, PValue, FDR: log fold change, log counts per million,
#'     F statistic, p value and (Benjamini Hochberg) adjusted p values.
#'   * inc/exc_(...): edgeR statistics corresponding to 
#'     differential expression testing for raw included / excluded counts
#'     in isolation
#'
#'   **DESeq2 specific output**
#'   * baseMean, log2FoldChange, lfcSE, stat, pvalue, padj:
#'     DESeq2 results columns for differential ASE; see [DESeq2::results] for
#'     details.
#'   * inc/exc_(baseMean, log2FoldChange, lfcSE, stat, pvalue, padj):
#'     DESeq2 results for differential testing for
#'     raw included / excluded counts only
#'
#'   **DoubleExp specific output**
#'   * MLE_nom, MLE_denom: Maximum likelihood expectation of PSI values for the 
#"     two groups. `nom` and
#'     `denom` in column names are replaced with the condition names
#'   * MLE_LFC: Log2-fold change of the MLE
#'   * P.Value, adj.P.Val: Nominal and BH-adjusted P values
#'   * n_eff: Number of effective samples (i.e. non-zero or non-unity PSI)
#'   * mDepth: Mean Depth of splice coverage in each of the two groups.
#'   * Dispersion_Reduced, Dispersion_Full: Dispersion values for reduced and
#'     full models. See [DoubleExpSeq::DBGLM1] for details.
#' @examples
#' # Load the NxtSE object and set up the annotations
#' # - see ?makeSE on example code of generating this NxtSE object
#' se <- SpliceWiz_example_NxtSE(novelSplicing = TRUE)
#'
#' colData(se)$treatment <- rep(c("A", "B"), each = 3)
#' colData(se)$replicate <- rep(c("P","Q","R"), 2)
#'
#' # Limma analysis (counts modeled using log-normal distribution)
#'
#' require("limma")
#' res_limma <- ASE_limma(se, "treatment", "A", "B")
#'
#' # edgeR analysis (counts modeled using negative binomial distribution)
#' # - QL: whether quasi-likelihood method was used
#'
#' require("edgeR")
#' res_edgeR <- ASE_edgeR(se, "treatment", "A", "B", useQL = FALSE)
#' res_edgeR_QL <- ASE_edgeR(se, "treatment", "A", "B", useQL = TRUE)
#'
#' # DoubleExpSeq analysis (counts modeled using beta binomial distribution)
#'
#' require("DoubleExpSeq")
#' res_DES <- ASE_DoubleExpSeq(se, "treatment", "A", "B")
#'
#' # DESeq2 analysis (counts modeled using negative binomial distribution)
#'
#' require("DESeq2")
#' res_DESeq <- ASE_DESeq(se, "treatment", "A", "B")
#' 
#' # Time series examples
#'
#' colData(se)$timepoint <- rep(c(1,2,3), each = 2)
#' colData(se)$batch <- rep(c("1", "2"), 3)
#'
#' res_limma_timeseries <- ASE_limma_timeseries(se, "timepoint")
#' res_edgeR_timeseries <- ASE_edgeR_timeseries(se, "timepoint")
#' res_DESeq_timeseries <- ASE_DESeq(se, "timepoint")
#' 
#' @name ASE-methods
#' @references
#' Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK (2015).
#' 'limma powers differential expression analyses for RNA-sequencing and
#' microarray studies.' Nucleic Acids Research, 43(7), e47.
#' \url{https://doi.org/10.1093/nar/gkv007}
#'
#' Love MI, Huber W, Anders S (2014). 'Moderated estimation of fold change and
#' dispersion for RNA-seq data with DESeq2.' Genome Biology, 15, 550.
#' \url{https://doi.org/10.1186/s13059-014-0550-8}
#'
#' Ruddy S, Johnson M, Purdom E (2016). 'Shrinkage of dispersion parameters in
#' the binomial family, with application to differential exon skipping.'
#' Ann. Appl. Stat. 10(2): 690-725.
#' \url{https://doi.org/10.1214/15-AOAS871}
#'
#' Gilis J, Vitting-Seerup K, Van den Berge K, Clement L (2021). 'Scalable
#' analysis of differential transcript usage for bulk and single-cell
#' RNA-sequencing applications.' F1000Research 2021, 10:374.
#' \url{https://doi.org/10.12688/f1000research.51749.1}
#'
#' Lun A, Smyth G (2017).
#' 'No counts, no variance: allowing for loss of degrees of freedom when
#' assessing biological variability from RNA-seq data' 
#' Stat Appl Genet Mol Biol, 017 Apr 25;16(2):83-93.
#' \url{https://doi.org/10.1515/sagmb-2017-0010}
#' @md
NULL

#' @describeIn ASE-methods Use limma to perform differential ASE analysis of
#'   a filtered NxtSE object
#' @export
ASE_limma <- function(se, test_factor, test_nom, test_denom,
        batch1 = "", batch2 = "",
        IRmode = c("all", "annotated", "annotated_binary"),
        filter_antiover = TRUE, filter_antinear = FALSE) {

    .check_package_installed("limma", "3.44.0")
    .ASE_check_args(colData(se), test_factor,
        test_nom, test_denom, batch1, batch2)
    IRmode <- match.arg(IRmode)
    se_use <- .ASE_filter(
        se, filter_antiover, filter_antinear, IRmode)

    if(nrow(se_use) == 0)
        .log("No events for ASE analysis after filtering")

    .log("Performing limma contrast for included / excluded counts separately",
        "message")
    res.limma2 <- .ASE_limma_contrast(se_use,
        test_factor, test_nom, test_denom,
        batch1, batch2)
    res.inc <- res.limma2[grepl(".Included", get("EventName"))]
    res.inc[, c("EventName") :=
        sub(".Included","",get("EventName"), fixed=TRUE)]
    res.inc <- res.inc[get("AveExpr") > 1]   # Filter as 0/5 is not diff to 0/10
    res.exc <- res.limma2[grepl(".Excluded", get("EventName"))]
    res.exc[, c("EventName") :=
        sub(".Excluded","",get("EventName"), fixed=TRUE)]
    res.exc <- res.exc[get("AveExpr") > 1]

    .log("Performing limma contrast for included / excluded counts together",
        "message")
    rowData <- as.data.frame(rowData(se_use))
    se_use <- se_use[rowData$EventName %in% res.inc$EventName &
        rowData$EventName %in% res.exc$EventName,]
    res.ASE <- .ASE_limma_contrast_ASE(se_use,
        test_factor, test_nom, test_denom,
        batch1, batch2)
    res.ASE[res.inc, on = "EventName",
        paste("Inc", colnames(res.inc)[seq_len(6)], sep=".") :=
        list(get("i.logFC"), get("i.AveExpr"), get("i.t"),
            get("i.P.Value"), get("i.adj.P.Val"), get("i.B"))]
    res.ASE[res.exc, on = "EventName",
        paste("Exc", colnames(res.exc)[seq_len(6)], sep=".") :=
        list(get("i.logFC"), get("i.AveExpr"), get("i.t"),
            get("i.P.Value"), get("i.adj.P.Val"), get("i.B"))]
    setorderv(res.ASE, "B", order = -1)
    res.ASE <- .ASE_add_diag(res.ASE, se_use, test_factor, 
        test_nom, test_denom)
    return(res.ASE)
}

#' @describeIn ASE-methods Use edgeR to perform differential ASE analysis of
#'   a filtered NxtSE object
#' @export
ASE_edgeR <- function(
        se, test_factor, test_nom, test_denom,
        batch1 = "", batch2 = "",
        useQL = TRUE,
        IRmode = c("all", "annotated", "annotated_binary"),
        filter_antiover = TRUE, filter_antinear = FALSE
) {

    .check_package_installed("edgeR", "3.32.0")
    .ASE_check_args(colData(se), test_factor,
        test_nom, test_denom, batch1, batch2)
    IRmode <- match.arg(IRmode)
    se_use <- .ASE_filter(
        se, filter_antiover, filter_antinear, IRmode)

    if(nrow(se_use) == 0)
        .log("No events for ASE analysis after filtering")

    .log("Performing edgeR contrast for included / excluded counts separately",
        "message")
    res.edgeR <- .ASE_edgeR_contrast(se_use,
        test_factor, test_nom, test_denom,
        batch1, batch2, useQL)
    res.inc <- res.edgeR[grepl(".Included", get("EventName"))]
    res.inc[, c("EventName") :=
        sub(".Included","",get("EventName"), fixed=TRUE)]
    # res.inc <- res.inc[get("AveExpr") > 1]   # Filter as 0/5 is not diff to 0/10
    res.exc <- res.edgeR[grepl(".Excluded", get("EventName"))]
    res.exc[, c("EventName") :=
        sub(".Excluded","",get("EventName"), fixed=TRUE)]
    # res.exc <- res.exc[get("AveExpr") > 1]

    .log("Performing edgeR contrast for included / excluded counts together",
        "message")
    rowData <- as.data.frame(rowData(se_use))
    se_use <- se_use[rowData$EventName %in% res.inc$EventName &
        rowData$EventName %in% res.exc$EventName,]
    res.ASE <- .ASE_edgeR_contrast_ASE(se_use,
        test_factor, test_nom, test_denom,
        batch1, batch2, useQL)
    
    colnames(res.inc)[-ncol(res.inc)] <- paste(
        "Inc", colnames(res.inc)[-ncol(res.inc)], sep = ".")
    colnames(res.exc)[-ncol(res.exc)] <- paste(
        "Exc", colnames(res.exc)[-ncol(res.exc)], sep = ".")

    res.ASE <- res.ASE[res.inc, on = "EventName"]
    res.ASE <- res.ASE[res.exc, on = "EventName"]

    orderCol <- ifelse(useQL, "F", "LR")
    setorderv(res.ASE, orderCol, order = -1)
    
    res.ASE <- .ASE_add_diag(res.ASE, se_use, test_factor, 
        test_nom, test_denom)
    return(res.ASE)
}


#' @describeIn ASE-methods Use limma to perform differential ASE analysis of
#'   a filtered NxtSE object (time series)
#' @export
ASE_limma_timeseries <- function(se, test_factor,
        batch1 = "", batch2 = "",
        degrees_of_freedom = 1,
        IRmode = c("all", "annotated", "annotated_binary"),
        filter_antiover = TRUE, filter_antinear = FALSE
) {

    .check_package_installed("limma", "3.44.0")
    .ASE_check_args(colData(se), test_factor, "", "",
        batch1, batch2, allowTimeSeries = TRUE)
    IRmode <- match.arg(IRmode)
    se_use <- .ASE_filter(
        se, filter_antiover, filter_antinear, IRmode)

    if(nrow(se_use) == 0)
        .log("No events for ASE analysis after filtering")

    .log("Performing limma contrast for included / excluded counts separately",
        "message")
    res.limma2 <- .ASE_limma_contrast_ts(se_use,
        test_factor,
        batch1, batch2, degrees_of_freedom)
    res.inc <- res.limma2[grepl(".Included", get("EventName"))]
    res.inc[, c("EventName") :=
        sub(".Included","",get("EventName"), fixed=TRUE)]
    res.inc <- res.inc[get("AveExpr") > 1]   # Filter as 0/5 is not diff to 0/10
    res.exc <- res.limma2[grepl(".Excluded", get("EventName"))]
    res.exc[, c("EventName") :=
        sub(".Excluded","",get("EventName"), fixed=TRUE)]
    res.exc <- res.exc[get("AveExpr") > 1]

    .log("Performing limma contrast for included / excluded counts together",
        "message")
    rowData <- as.data.frame(rowData(se_use))
    se_use <- se_use[rowData$EventName %in% res.inc$EventName &
        rowData$EventName %in% res.exc$EventName,]
    res.ASE <- .ASE_limma_contrast_ASE_ts(se_use,
        test_factor, # test_nom, test_denom,
        batch1, batch2, degrees_of_freedom)
    cols <- seq(
        ncol(res.inc) - 4,
        ncol(res.inc) - 1        
    )
    if(degrees_of_freedom == 1) {
        res.ASE[res.inc, on = "EventName",
            paste("Inc", colnames(res.inc)[seq_len(6)], sep=".") :=
            list(get("i.logFC"), get("i.AveExpr"), get("i.t"),
                get("i.P.Value"), get("i.adj.P.Val"), get("i.B"))]
        res.ASE[res.exc, on = "EventName",
            paste("Exc", colnames(res.exc)[seq_len(6)], sep=".") :=
            list(get("i.logFC"), get("i.AveExpr"), get("i.t"),
                get("i.P.Value"), get("i.adj.P.Val"), get("i.B"))]
    
    } else {
        res.ASE[res.inc, on = "EventName",
            paste("Inc", colnames(res.inc)[cols], sep=".") :=
            list(get("i.AveExpr"), get("i.F"),
                get("i.P.Value"), get("i.adj.P.Val"))]
        res.ASE[res.exc, on = "EventName",
            paste("Exc", colnames(res.exc)[cols], sep=".") :=
            list(get("i.AveExpr"), get("i.F"),
                get("i.P.Value"), get("i.adj.P.Val"))]    
    }
    setorderv(res.ASE, "B", order = -1)

    condlist <- as.list(sort(unique(
        unlist(colData(se)[, test_factor]
    ))))
    if(length(condlist) > 6) condlist <- condlist[seq_len(6)]
    res.ASE <- .ASE_add_diag_multi(res.ASE, se_use, test_factor, condlist)

    return(res.ASE)
}

#' @describeIn ASE-methods Use edgeR to perform differential time series
#'   of a filtered NxtSE object
#' @export
ASE_edgeR_timeseries <- function(se, test_factor,
        batch1 = "", batch2 = "",
        degrees_of_freedom = 1,
        useQL = TRUE,
        IRmode = c("all", "annotated", "annotated_binary"),
        filter_antiover = TRUE, filter_antinear = FALSE
) {

    .check_package_installed("edgeR", "3.32.0")
    .ASE_check_args(colData(se), test_factor, "", "",
        batch1, batch2, allowTimeSeries = TRUE)
    IRmode <- match.arg(IRmode)
    se_use <- .ASE_filter(
        se, filter_antiover, filter_antinear, IRmode)

    if(nrow(se_use) == 0)
        .log("No events for ASE analysis after filtering")

    .log("Performing edgeR contrast for included / excluded counts separately",
        "message")
    res.edgeR <- .ASE_edgeR_contrast_ts(se_use,
        test_factor,
        batch1, batch2, degrees_of_freedom, useQL)
    res.inc <- res.edgeR[grepl(".Included", get("EventName"))]
    res.inc[, c("EventName") :=
        sub(".Included","",get("EventName"), fixed=TRUE)]
    res.exc <- res.edgeR[grepl(".Excluded", get("EventName"))]
    res.exc[, c("EventName") :=
        sub(".Excluded","",get("EventName"), fixed=TRUE)]

    .log("Performing edgeR contrast for included / excluded counts together",
        "message")
    rowData <- as.data.frame(rowData(se_use))
    se_use <- se_use[rowData$EventName %in% res.inc$EventName &
        rowData$EventName %in% res.exc$EventName,]
    res.ASE <- .ASE_edgeR_contrast_ASE_ts(se_use,
        test_factor, # test_nom, test_denom,
        batch1, batch2, degrees_of_freedom, useQL)

    colnames(res.inc)[-ncol(res.inc)] <- paste(
        "Inc", colnames(res.inc)[-ncol(res.inc)], sep = ".")
    colnames(res.exc)[-ncol(res.exc)] <- paste(
        "Exc", colnames(res.exc)[-ncol(res.exc)], sep = ".")

    res.ASE <- res.ASE[res.inc, on = "EventName"]
    res.ASE <- res.ASE[res.exc, on = "EventName"]
    
    orderCol <- ifelse(useQL, "F", "LR")
    setorderv(res.ASE, orderCol, order = -1)
    
    condlist <- as.list(sort(unique(
        unlist(colData(se)[, test_factor]
    ))))
    if(length(condlist) > 6) condlist <- condlist[seq_len(6)]
    res.ASE <- .ASE_add_diag_multi(res.ASE, se_use, test_factor, condlist)

    return(res.ASE)
}


#' @describeIn ASE-methods Use DESeq2 to perform differential ASE analysis of
#'   a filtered NxtSE object
#' @export
ASE_DESeq <- function(se, test_factor, test_nom, test_denom,
        batch1 = "", batch2 = "",
        n_threads = 1,
        IRmode = c("all", "annotated", "annotated_binary"),
        filter_antiover = TRUE, filter_antinear = FALSE) {
    .check_package_installed("DESeq2", "1.30.0")
    .ASE_check_args(colData(se), test_factor,
        test_nom, test_denom, batch1, batch2,
        allowTimeSeries = TRUE)
    BPPARAM_mod <- .validate_threads(n_threads)
    
    IRmode <- match.arg(IRmode)
    se_use <- .ASE_filter(
        se, filter_antiover, filter_antinear, IRmode)
    if(nrow(se_use) == 0)
        .log("No events for ASE analysis after filtering")

    .log("Performing DESeq2 contrast for included / excluded counts separately",
        "message")
    res.IncExc <- .ASE_DESeq2_contrast(se_use,
        test_factor, test_nom, test_denom,
        batch1, batch2, BPPARAM_mod)
    res.inc <- res.IncExc[grepl(".Included", get("EventName"))]
    res.inc[, c("EventName") :=
        sub(".Included","",get("EventName"), fixed=TRUE)]
    res.exc <- res.IncExc[grepl(".Excluded", get("EventName"))]
    res.exc[, c("EventName") :=
        sub(".Excluded","",get("EventName"), fixed=TRUE)]

    .log("Performing DESeq2 contrast for included / excluded counts together",
        "message")
    rowData <- as.data.frame(rowData(se_use))
    se_use <- se_use[rowData$EventName %in% res.inc$EventName &
        rowData$EventName %in% res.exc$EventName,]
    res.ASE <- .ASE_DESeq2_contrast_ASE(se_use,
        test_factor, test_nom, test_denom,
        batch1, batch2, BPPARAM_mod)
    res.ASE[res.inc, on = "EventName",
        paste("Inc", colnames(res.inc)[seq_len(6)], sep=".") :=
        list(get("i.baseMean"), get("i.log2FoldChange"), get("i.lfcSE"),
            get("i.stat"), get("i.pvalue"), get("i.padj"))]
    res.ASE[res.exc, on = "EventName",
        paste("Exc", colnames(res.exc)[seq_len(6)], sep=".") :=
        list(get("i.baseMean"), get("i.log2FoldChange"), get("i.lfcSE"),
            get("i.stat"), get("i.pvalue"), get("i.padj"))]
    res.ASE <- res.ASE[!is.na(get("pvalue"))]
    setorder(res.ASE, "pvalue")
    if(is_valid(test_nom)) {
        res.ASE <- .ASE_add_diag(res.ASE, se_use, test_factor, 
            test_nom, test_denom)
    } else {
        condlist <- as.list(sort(unique(
            unlist(colData(se)[, test_factor]
        ))))
        if(length(condlist) > 6) condlist <- condlist[seq_len(6)]
        res.ASE <- .ASE_add_diag_multi(res.ASE, se_use, test_factor, condlist)
    }
    return(res.ASE)
}

#' @describeIn ASE-methods Use DoubleExpSeq to perform differential ASE analysis
#'   of a filtered NxtSE object (uses double exponential beta-binomial model)
#'   to estimate group dispersions, followed by LRT
#' @export
ASE_DoubleExpSeq <- function(se, test_factor, test_nom, test_denom,
        # batch1 = "", batch2 = "",
        IRmode = c("all", "annotated", "annotated_binary"),
        filter_antiover = TRUE, filter_antinear = FALSE) {

    .check_package_installed("DoubleExpSeq", "1.1")
    .ASE_check_args(colData(se), test_factor,
        test_nom, test_denom, "", "")
        
    IRmode <- match.arg(IRmode)
    se_use <- .ASE_filter(
        se, filter_antiover, filter_antinear, IRmode)
    if(nrow(se_use) == 0)
        .log("No events for ASE analysis after filtering")

    .log("Running DoubleExpSeq::DBGLM1() on given data", "message")
    res.ASE <- .ASE_DoubleExpSeq_contrast_ASE(se_use,
        test_factor, test_nom, test_denom)

    res.cols <- c(
        paste("MLE", test_nom, sep="_"), paste("MLE", test_denom, sep="_"),
        "P.Value", "adj.P.Val", "n_eff",
        paste("mDepth", test_nom, sep="_"),
        paste("mDepth", test_denom, sep="_"),
        "Dispersion_Reduced", "Dispersion_Full"
    )
    colnames(res.ASE)[-1] <- res.cols

    res.ASE[, c("MLE_LFC") := (
        qlogis(res.ASE[,get(paste("MLE", test_nom, sep="_"))]) -
        qlogis(res.ASE[,get(paste("MLE", test_denom, sep="_"))])
    ) / log(2)]

    res.ASE <- res.ASE[, c("EventName", res.cols[c(1,2)], "MLE_LFC",
        res.cols[seq(3,9)]), with = FALSE]

    res.ASE <- res.ASE[!is.na(get("P.Value"))]
    setorderv(res.ASE, "P.Value")
    res.ASE <- .ASE_add_diag(res.ASE, se_use, test_factor, 
        test_nom, test_denom)
    return(res.ASE)
}

############################## INTERNALS #######################################
# helper functions:

# Check arguments are valid
.ASE_check_args <- function(colData, test_factor,
        test_nom, test_denom, batch1, batch2, 
        allowTimeSeries = FALSE
) {
    if(!is_valid(test_factor)) {
        .log("test_factor must be defined")
    } else if (!allowTimeSeries & 
            (!is_valid(test_nom) | !is_valid(test_denom))) {
        .log("test_nom, test_denom must be defined")
    } else if(allowTimeSeries & !is_valid(test_nom)) {
        if(!is.numeric(colData[, test_factor])) {
            .log(paste(
                test_factor, "must be numeric for time series analysis"
            ))
        }
    }
    if(!(test_factor %in% colnames(colData))) {
        .log("test_factor is not a condition in colData")
    }
    if(is_valid(test_nom) && !any(colData[, test_factor] == test_nom)) {
        .log("test_nom is not found in any samples")
    }
    if(is_valid(test_denom) && !any(colData[, test_factor] == test_denom)) {
        .log("test_denom is not found in any samples")
    }
    if(batch1 != "") {
        if(!(batch1 %in% colnames(colData))) {
            .log("batch1 is not a condition in colData")
        }
        if(test_factor == batch1) {
            .log("batch1 and test_factor are the same")
        }
    }
    if(batch2 != "") {
        if(!(batch2 %in% colnames(colData))) {
            .log("batch2 is not a condition in colData")
        }
        if(test_factor == batch2) {
            .log("batch2 and test_factor are the same")
        }
    }
    if(batch1 != "" & batch2 != "") {
        if(batch1 == batch2) {
            .log("batch1 and batch2 are the same")
        }
    }
    return(TRUE)
}

# Filter antiover and antinear
.ASE_filter <- function(se, filter_antiover, filter_antinear, IRmode) {
    se_use <- se
    if(filter_antiover) {
        se_use <- se_use[
            !grepl("anti-over", rowData(se_use)$EventName),]
    }
    if(filter_antinear) {
        se_use <- se_use[
            !grepl("anti-near", rowData(se_use)$EventName),]
    }
    
    if(IRmode == "all") {
        se_use <- se_use[rowData(se_use)$EventType != "RI",]
    } else if(IRmode == "annotated") {
        se_use <- se_use[rowData(se_use)$EventType != "RI",]
        se_use <- se_use[
            rowData(se_use)$EventType != "IR" | 
            rowData(se_use)$is_annotated_IR,]
    } else {
        se_use <- se_use[rowData(se_use)$EventType != "IR",]    
    }

    return(se_use)
}

.ASE_limma_contrast <- function(se, test_factor, test_nom, test_denom,
        batch1, batch2) {
    in_data <- .ASE_contrast_expr(se, test_factor, 
        test_nom, test_denom,
        batch1, batch2)

    countData_use <- limma::voom(in_data$countData, in_data$design1)
    fit <- limma::lmFit(countData_use$E, design = in_data$design1)
    fit <- limma::contrasts.fit(fit, in_data$contrast)
    fit <- limma::eBayes(fit)

    res <- limma::topTable(fit, number = nrow(countData_use$E))
    res$EventName <- rownames(res)

    res$AveExpr <- res$AveExpr - min(res$AveExpr)
    res <- as.data.table(res)
    
    rm(fit, in_data, countData_use)
    gc()
    return(res)
}

.ASE_edgeR_contrast <- function(
        se, test_factor, test_nom, test_denom,
        batch1, batch2, useQL
) {
    in_data <- .ASE_contrast_expr(se, test_factor, 
        test_nom, test_denom,
        batch1, batch2)

    # countData_use <- limma::voom(in_data$countData, in_data$design1)
    y <- edgeR::DGEList(counts=in_data$countData, remove.zeros = FALSE)
    y <- edgeR::calcNormFactors(y)
    y <- edgeR::estimateDisp(y,in_data$design1)
    
    if(useQL) {
        fit <- edgeR::glmQLFit(y, in_data$design1, legacy = TRUE)
        qlf <- edgeR::glmQLFTest(fit, contrast = in_data$contrast)        
    } else {
        fit <- edgeR::glmFit(y, in_data$design1)
        qlf <- edgeR::glmLRT(fit, contrast = in_data$contrast)            
    }

    res <- edgeR::topTags(qlf, n = nrow(y))
    res$table$EventName <- rownames(res)
    
    rm(fit, qlf, in_data, y)
    gc()
    return(as.data.table(res$table))
}


.ASE_limma_contrast_ts <- function(se, test_factor, # test_nom, test_denom,
        batch1, batch2, degrees) {
    in_data <- .ASE_contrast_expr_ts(
        se, test_factor, 
        # test_nom, test_denom,
        batch1, batch2, degrees
    )

    countData_use <- limma::voom(in_data$countData, in_data$design1)
    fit <- limma::lmFit(countData_use$E, design = in_data$design1)
    # fit <- limma::contrasts.fit(fit, in_data$contrast)
    fit <- limma::eBayes(fit)

    res <- limma::topTable(fit, coef = in_data$coef,
        number = nrow(countData_use$E))
    res$EventName <- rownames(res)

    res$AveExpr <- res$AveExpr - min(res$AveExpr)
    res <- as.data.table(res)
    
    rm(fit, in_data, countData_use)
    gc()
    return(res)
}

.ASE_edgeR_contrast_ts <- function(se, test_factor, # test_nom, test_denom,
        batch1, batch2, degrees, useQL) {
    in_data <- .ASE_contrast_expr_ts(
        se, test_factor, 
        # test_nom, test_denom,
        batch1, batch2, degrees
    )

    y <- edgeR::DGEList(counts=in_data$countData, remove.zeros = FALSE)
    y <- edgeR::calcNormFactors(y)
    y <- edgeR::estimateDisp(y,in_data$design1)
    
    if(useQL) {
        fit <- edgeR::glmQLFit(y, in_data$design1, legacy = TRUE)
        qlf <- edgeR::glmQLFTest(fit, coef = in_data$coef)        
    } else {
        fit <- edgeR::glmFit(y, in_data$design1)
        qlf <- edgeR::glmLRT(fit, coef = in_data$coef)            
    }

    res <- edgeR::topTags(qlf, n = nrow(y))
    res$table$EventName <- rownames(res)
    
    rm(fit, qlf, in_data, y)
    gc()
    return(as.data.table(res$table))
}

.ASE_limma_contrast_ASE <- function(se, test_factor, test_nom, test_denom,
        batch1, batch2) {
    in_data <- .ASE_contrast_ASE(se, test_factor, 
        test_nom, test_denom,
        batch1, batch2)

    countData_use <- limma::voom(in_data$countData, in_data$design1, 
        lib.size = 1)

    fit <- limma::lmFit(countData_use$E, design = in_data$design1)

    fit <- limma::contrasts.fit(fit, in_data$contrast)
    fit <- limma::eBayes(fit)

    res <- limma::topTable(fit, number = nrow(countData_use$E))
    res$EventName <- rownames(res)
    res$AveExpr <- res$AveExpr - min(res$AveExpr)
    res <- as.data.table(res)
    
    rm(fit, in_data, countData_use)
    gc()
    return(res)
}

.ASE_edgeR_contrast_ASE <- function(se, test_factor, test_nom, test_denom,
        batch1, batch2, useQL) {
    in_data <- .ASE_contrast_ASE(se, test_factor, 
        test_nom, test_denom,
        batch1, batch2)

    y <- edgeR::DGEList(counts=in_data$countData)
    y <- edgeR::estimateDisp(y,in_data$design1)
    y$offset <- 1
    if(useQL) {
        fit <- edgeR::glmQLFit(y, in_data$design1, legacy = TRUE)
        qlf <- edgeR::glmQLFTest(fit, contrast = in_data$contrast)        
    } else {
        fit <- edgeR::glmFit(y, in_data$design1)
        qlf <- edgeR::glmLRT(fit, contrast = in_data$contrast)            
    }

    res <- edgeR::topTags(qlf, n = nrow(y))
    res$table$EventName <- rownames(res)
    
    rm(fit, qlf, in_data, y)
    gc()
    return(as.data.table(res$table))
}

.ASE_limma_contrast_ASE_ts <- function(se, test_factor, # test_nom, test_denom,
        batch1, batch2, degrees) {
    in_data <- .ASE_contrast_ASE_ts(se, test_factor, 
        # test_nom, test_denom,
        batch1, batch2, degrees)

    countData_use <- limma::voom(in_data$countData, in_data$design1, 
        lib.size = 1)

    fit <- limma::lmFit(countData_use$E, design = in_data$design1)

    fit <- limma::contrasts.fit(fit, in_data$contrast)
    fit <- limma::eBayes(fit)

    res <- limma::topTable(fit, number = nrow(countData_use$E))
    res$EventName <- rownames(res)
    res$AveExpr <- res$AveExpr - min(res$AveExpr)
    res <- as.data.table(res)
    
    rm(fit, in_data, countData_use)
    gc()
    return(res)
}

.ASE_edgeR_contrast_ASE_ts <- function(se, test_factor, # test_nom, test_denom,
        batch1, batch2, degrees, useQL) {
    in_data <- .ASE_contrast_ASE_ts(se, test_factor, 
        # test_nom, test_denom,
        batch1, batch2, degrees)

    y <- edgeR::DGEList(counts=in_data$countData)
    y <- edgeR::estimateDisp(y,in_data$design1)
    y$offset <- 1
    if(useQL) {
        fit <- edgeR::glmQLFit(y, in_data$design1, legacy = TRUE)
        qlf <- edgeR::glmQLFTest(fit, contrast = in_data$contrast)        
    } else {
        fit <- edgeR::glmFit(y, in_data$design1)
        qlf <- edgeR::glmLRT(fit, contrast = in_data$contrast)            
    }

    res <- edgeR::topTags(qlf, n = nrow(y))
    res$table$EventName <- rownames(res)
    
    rm(fit, qlf, in_data, y)
    gc()
    return(as.data.table(res$table))
}


.ASE_DESeq2_contrast <- function(se, test_factor, test_nom, test_denom,
        batch1, batch2, BPPARAM) {
    countData <- as.matrix(rbind(assay(se, "Included"),
        assay(se, "Excluded")))
    rowData <- as.data.frame(rowData(se))
    colData <- colData(se)
    rownames(colData) <- colnames(se)
    colnames(countData) <- rownames(colData)
    rownames(countData) <- c(
        paste(rowData$EventName, "Included", sep="."),
        paste(rowData$EventName, "Excluded", sep=".")
    )
    if(batch2 != "") {
        dds_formula <- paste0("~", paste(
            batch1, batch2, test_factor,
            sep="+"))
        dds_formula_reduced <- paste0("~", paste(
            batch1, batch2, sep="+"))
    } else if(batch1 != "") {
        dds_formula <- paste0("~", paste(
            batch1, test_factor,
            sep="+"))
        dds_formula_reduced <- paste0("~", paste(
            batch1, sep="+"))
    } else {
        dds_formula <- paste0("~", test_factor)
        dds_formula_reduced <- paste0("~1")
    }

    mode(countData) <- "integer"
    dds <- DESeq2::DESeqDataSetFromMatrix(
        countData = round(countData),
        colData = .DESeq_colData_sanitise(colData),
        design = as.formula(dds_formula)
    )
    message("ASE_DESeq: Profiling expression of Included and Excluded counts")

    if(!is_valid(test_nom) | !is_valid(test_denom)) {
        dds <- DESeq2::DESeq(dds, test = "LRT", 
            reduced = as.formula(dds_formula_reduced),
            parallel = TRUE, BPPARAM = BPPARAM)
        res <- as.data.frame(DESeq2::results(dds,
            parallel = TRUE, BPPARAM = BPPARAM)
        )        
    } else {
        dds <- DESeq2::DESeq(dds, parallel = TRUE, BPPARAM = BPPARAM)
        res <- as.data.frame(DESeq2::results(dds,
            contrast = c(test_factor, test_nom, test_denom),
            parallel = TRUE, BPPARAM = BPPARAM)
        )    
    }
    res$EventName <- rownames(res)

    rm(dds, countData)
    gc()
    return(as.data.table(res))
}

.ASE_DESeq2_contrast_ASE <- function(se, test_factor, test_nom, test_denom,
        batch1, batch2, BPPARAM) {
    countData <- as.matrix(cbind(assay(se, "Included"),
        assay(se, "Excluded")))
    rowData <- as.data.frame(rowData(se))
    colData <- as.data.frame(colData(se))
    colData <- rbind(colData, colData)
    rownames(colData) <- c(
        paste(colnames(se), "Included", sep="."),
        paste(colnames(se), "Excluded", sep=".")
    )
    colData$ASE <- rep(c("Included", "Excluded"), each = ncol(se))
    colnames(countData) <- rownames(colData)
    rownames(countData) <- rowData$EventName

    if(batch2 != "") {
        dds_formula <- paste0("~", paste(
            batch1, batch2, test_factor,
                paste0(test_factor, ":ASE"),
            sep="+"))
        dds_formula_reduced <- paste0("~", paste(
            batch1, batch2, test_factor,
                # paste0(test_factor, ":ASE"),
            sep="+"))
    } else if(batch1 != "") {
        dds_formula <- paste0("~", paste(
            batch1, test_factor,
            paste0(test_factor, ":ASE"),
            sep="+"))
        dds_formula_reduced <- paste0("~", paste(
            batch1, test_factor,
            # paste0(test_factor, ":ASE"),
            sep="+"))
    } else {
        dds_formula <- paste0("~", paste(
            test_factor,
            paste0(test_factor, ":ASE"),
            sep="+"))
        dds_formula_reduced <- paste0("~", paste(
            test_factor,
            # paste0(test_factor, ":ASE"),
            sep="+"))
    }

    mode(countData) <- "integer"
    dds <- DESeq2::DESeqDataSetFromMatrix(
        countData = countData,
        colData = .DESeq_colData_sanitise(colData),
        design = as.formula(dds_formula)
    )
    DESeq2::sizeFactors(dds) <- 1
    message("ASE_DESeq: Profiling differential ASE")
    if(!is_valid(test_nom) | !is_valid(test_denom)) {
        dds <- DESeq2::DESeq(dds, test = "LRT",
            reduced = as.formula(dds_formula_reduced),
            parallel = TRUE, BPPARAM = BPPARAM)
        res <- as.data.frame(DESeq2::results(dds,
            parallel = TRUE, BPPARAM = BPPARAM)
        )
    } else {
        dds <- DESeq2::DESeq(dds, parallel = TRUE, BPPARAM = BPPARAM)
        res <- as.data.frame(DESeq2::results(dds,
            list(
                paste0(test_factor, test_nom, ".ASEIncluded"),
                paste0(test_factor, test_denom, ".ASEIncluded")
            ),
            parallel = TRUE, BPPARAM = BPPARAM)
        )
    }
    res$EventName <- rownames(res)
    
    rm(dds, countData)
    gc()
    return(as.data.table(res))
}

.ASE_DoubleExpSeq_contrast_ASE <- function(se, test_factor,
    test_nom, test_denom) {

    # NB add pseudocounts
    pseudocount <- 1
    y <- as.matrix(assay(se, "Included") + pseudocount)
    m <- as.matrix(
        assay(se, "Included") + assay(se, "Excluded") + 2 * pseudocount
    )
    colData <- as.data.frame(colData(se))
    groups <- factor(colData[, test_factor])
    shrink.method <- "WEB"

    # carry over error from DoubleExpSeq version 1.1
    contrast.first <- which(unique(groups) == test_nom)
    contrast.second <- which(unique(groups) == test_denom)

    res <- DoubleExpSeq::DBGLM1(
        as.matrix(y), as.matrix(m), groups, shrink.method,
        contrast=c(contrast.first,contrast.second),
        fdr.level=0.05, use.all.groups=TRUE)

    rm(y, m)
    gc()
    return(cbind(data.table(EventName = rownames(res$All)),
        as.data.table(res$All)))
}

# Adds human-readable labels
.ASE_add_flags <- function(res) {
    res_IR <- res[get("EventType") == "IR"]
    res_nonIR <- res[get("EventType") != "IR"]

    flags_IR <- rep("", nrow(res_IR))
    flags_nonIR <- rep("", nrow(res_nonIR))
    
    if(length(flags_IR) > 0) {
        flags_IR[res_IR$NMD_direction == 1] <- paste0(
            flags_IR[res_IR$NMD_direction == 1], ";Inc-NMD")
        flags_IR[res_IR$NMD_direction == -1] <- paste0(
            flags_IR[res_IR$NMD_direction == -1], ";Exc-NMD")
    }    
    if(length(flags_nonIR) > 0) {
        flags_nonIR[res_nonIR$NMD_direction == 1] <- paste0(
            flags_nonIR[res_nonIR$NMD_direction == 1], ";Inc-NMD")
        flags_nonIR[res_nonIR$NMD_direction == -1] <- paste0(
            flags_nonIR[res_nonIR$NMD_direction == -1], ";Exc-NMD")
            
        isIncNovel <- grepl("novel", tstrsplit(res_nonIR$EventName, 
            split = ";", fixed = TRUE)[[1]])
        isExcNovel <- grepl("novel", tstrsplit(res_nonIR$EventName, 
            split = ";", fixed = TRUE)[[2]])
        flags_nonIR[isIncNovel] <- paste0(
            flags_nonIR[isIncNovel], ";Inc-novel")
        flags_nonIR[isExcNovel] <- paste0(
            flags_nonIR[isExcNovel], ";Exc-novel")
    }
    if(length(flags_IR) == 0) {
        flagsDT <- data.table(
            EventName = res_nonIR$EventName,
            flags = flags_nonIR
        )
    } else if(length(flags_nonIR) == 0) {
        flagsDT <- data.table(
            EventName = res_IR$EventName,
            flags = flags_IR
        )
    } else {
        flagsDT <- data.table(
            EventName = c(res_IR$EventName, res_nonIR$EventName),
            flags = c(flags_IR, flags_nonIR)
        )
    }
    
    res <- res[flagsDT, on = "EventName"]
    res[, c("flags") := substr(get("flags"), 2, nchar(get("flags")))]
    return(res[, c("EventName","EventType","EventRegion", "flags")])
}

.ASE_add_diag <- function(res, se, test_factor, test_nom, test_denom) {
    rowData <- as.data.frame(rowData(se))
    rowData.DT <- .ASE_add_flags(as.data.table(rowData[,
        c("EventName","EventType","EventRegion", "NMD_direction")]))
        
    diag <- makeMeanPSI(se, res$EventName,
        test_factor, list(test_nom, test_denom))
    colnames(diag)[2:3] <- c(paste0("AvgPSI_", test_nom),
        paste0("AvgPSI_", test_denom))
    diag$deltaPSI <- diag[, 2] - diag[, 3]
    diag$abs_deltaPSI <- abs(diag$deltaPSI)
    if(all(c("EventName","EventType","EventRegion", "flags") %in%
        colnames(res))) {
        res <- cbind(
            res[,c("EventName")],
            as.data.table(round(diag[,-1], 4)),
            res[,-c("EventName","EventType","EventRegion", "flags")]
        )            
    } else {
        res <- cbind(
            res[,c("EventName")],
            as.data.table(round(diag[,-1], 4)),
            res[,-c("EventName")]
        )
    }
    res <- rowData.DT[res, on = "EventName"]
    return(res)
}

.ASE_add_diag_multi <- function(
        res, se, test_factor, 
        conditionList
) {
    rowData <- as.data.frame(rowData(se))
    rowData.DT <- .ASE_add_flags(as.data.table(rowData[,
        c("EventName","EventType","EventRegion", "NMD_direction")]))
    diag <- makeMeanPSI(se, res$EventName,
        test_factor, conditionList)
    for(i in seq_len(length(conditionList))) {
        colnames(diag)[i+1] <- paste0("AvgPSI_", conditionList[i])
    }
    if(ncol(diag) == 3) {
        diag$deltaPSI <- diag[, 2] - diag[, 3]
        diag$abs_deltaPSI <- abs(diag$deltaPSI)
    }
    if(all(c("EventName","EventType","EventRegion", "flags") %in%
        colnames(res))) {
        res <- cbind(
            res[,c("EventName")],
            as.data.table(round(diag[,-1], 4)),
            res[,-c("EventName","EventType","EventRegion", "flags")]
        )            
    } else {
        res <- cbind(
            res[,c("EventName")],
            as.data.table(round(diag[,-1], 4)),
            res[,-c("EventName")]
        )
    }
    res <- rowData.DT[res, on = "EventName"]
    return(res)
}

.DESeq_colData_sanitise <- function(colData) {
    for(i in seq_len(ncol(colData))) {
        if(is(colData[,i], "character")) {
            colData[, i] <- factor(colData[, i])
        }
    }
    colData
}

################################################################################

.ASE_contrast_expr <- function(
        se, test_factor, 
        test_nom, test_denom,
        batch1, batch2
) {
    countData <- as.matrix(rbind(assay(se, "Included"),
        assay(se, "Excluded")))
    rowData <- as.data.frame(rowData(se))
    colData <- colData(se)
    rownames(colData) <- colnames(se)
    colnames(countData) <- rownames(colData)
    rownames(countData) <- c(
        paste(rowData$EventName, "Included", sep="."),
        paste(rowData$EventName, "Excluded", sep=".")
    )

    condition_factor <- factor(colData[, test_factor])
    if(batch2 != "") {
        batch2_factor <- colData[, batch2]
        batch1_factor <- colData[, batch1]
        design1 <- model.matrix(~1 + batch1_factor + batch2_factor +
            condition_factor)
    } else if(batch1 != "") {
        batch1_factor <- colData[, batch1]
        design1 <- model.matrix(~1 + batch1_factor + condition_factor)
    } else {
        design1 <- model.matrix(~1 + condition_factor)
    }
    contrast <- rep(0, ncol(design1))
    contrast_a <- paste0("condition_factor", test_nom)
    contrast_b <- paste0("condition_factor", test_denom)
    contrast[which(colnames(design1) == contrast_b)] <- -1
    contrast[which(colnames(design1) == contrast_a)] <- 1
    
    return(list(
        design1 = design1,
        countData = countData,
        contrast = contrast
    ))
}

.ASE_contrast_expr_ts <- function(
        se, test_factor, 
        # test_nom, test_denom,
        batch1, batch2,
        degrees = 3
) {
    countData <- as.matrix(rbind(assay(se, "Included"),
        assay(se, "Excluded")))
    rowData <- as.data.frame(rowData(se))
    colData <- colData(se)
    rownames(colData) <- colnames(se)
    colnames(countData) <- rownames(colData)
    rownames(countData) <- c(
        paste(rowData$EventName, "Included", sep="."),
        paste(rowData$EventName, "Excluded", sep=".")
    )
    if(degrees > length(unique(colData[, test_factor])) - 1)
        degrees <- length(unique(colData[, test_factor])) - 1
    if(degrees < 1) {
        .log(paste(
            "Not enough degrees of freedom in", test_factor, "variable.",
            "Perhaps just use a simple contrast between conditions?"
        ))
    }

    condition_factor <- splines::ns(colData[, test_factor], 
        df = degrees)
    if(batch2 != "") {
        batch2_factor <- colData[, batch2]
        batch1_factor <- colData[, batch1]
        design1 <- model.matrix(~1 + batch1_factor + batch2_factor +
            condition_factor)
    } else if(batch1 != "") {
        batch1_factor <- colData[, batch1]
        design1 <- model.matrix(~1 + batch1_factor + condition_factor)
    } else {
        design1 <- model.matrix(~1 + condition_factor)
    }
    coef <- seq(ncol(design1) - degrees + 1, ncol(design1))

    return(list(
        design1 = design1,
        countData = countData,
        coef = coef
    ))
}

.ASE_contrast_ASE <- function(
        se, test_factor, 
        test_nom, test_denom,
        batch1, batch2
) {
    countData <- as.matrix(cbind(assay(se, "Included"),
        assay(se, "Excluded")))

    rowData <- as.data.frame(rowData(se))
    colData <- as.data.frame(colData(se))
    colData <- rbind(colData, colData)
    rownames(colData) <- c(
        paste(colnames(se), "Included", sep="."),
        paste(colnames(se), "Excluded", sep=".")
    )
    colData$ASE <- rep(c("Included", "Excluded"), each = ncol(se))
    colnames(countData) <- rownames(colData)
    rownames(countData) <- rowData$EventName

    condition_factor <- factor(colData[, test_factor])
    ASE <- colData[, "ASE"]
    if(batch2 != "") {
        batch2_factor <- colData[, batch2]
        batch1_factor <- colData[, batch1]
        design1 <- model.matrix(~1 + batch1_factor + batch2_factor +
            condition_factor + condition_factor:ASE)
    } else if(batch1 != "") {
        batch1_factor <- colData[, batch1]
        design1 <- model.matrix(~1 + batch1_factor + condition_factor +
            condition_factor:ASE)
    } else {
        design1 <- model.matrix(~1 + condition_factor + condition_factor:ASE)
    }
    colnames(design1) <- sub(":",".",colnames(design1))
    contrast <- rep(0, ncol(design1))
    contrast_a <- paste0("condition_factor", test_nom, ".ASEIncluded")
    contrast_b <- paste0("condition_factor", test_denom, ".ASEIncluded")
    contrast[which(colnames(design1) == contrast_b)] <- -1
    contrast[which(colnames(design1) == contrast_a)] <- 1
    
    return(list(
        design1 = design1,
        countData = countData,
        contrast = contrast
    ))
}

.ASE_contrast_ASE_ts <- function(
        se, test_factor, 
        batch1, batch2,
        degrees = 3
) {
    countData <- as.matrix(cbind(assay(se, "Included"),
        assay(se, "Excluded")))

    rowData <- as.data.frame(rowData(se))
    colData <- as.data.frame(colData(se))
    colData <- rbind(colData, colData)
    rownames(colData) <- c(
        paste(colnames(se), "Included", sep="."),
        paste(colnames(se), "Excluded", sep=".")
    )
    colData$ASE <- rep(c("Included", "Excluded"), each = ncol(se))
    colnames(countData) <- rownames(colData)
    rownames(countData) <- rowData$EventName

    if(degrees > length(unique(colData[, test_factor])) - 1)
        degrees <- length(unique(colData[, test_factor])) - 1
    if(degrees < 1) {
        .log(paste(
            "Not enough degrees of freedom in", test_factor, "variable.",
            "Perhaps just use a simple contrast between conditions?"
        ))
    }
    condition_factor <- splines::ns(colData[, test_factor], 
        df = degrees)
    ASE <- colData[, "ASE"]
    if(batch2 != "") {
        batch2_factor <- colData[, batch2]
        batch1_factor <- colData[, batch1]
        design1 <- model.matrix(~1 + batch1_factor + batch2_factor + 
            ASE + ASE:condition_factor)
    } else if(batch1 != "") {
        batch1_factor <- colData[, batch1]
        design1 <- model.matrix(~1 + batch1_factor + 
            ASE + ASE:condition_factor)
    } else {
        design1 <- model.matrix(~1 + ASE + ASE:condition_factor)
    }
    colnames(design1) <- sub(":",".",colnames(design1))
    contrast <- rep(0, ncol(design1))
    contrast[which(grepl("ASEExcluded.", colnames(design1)))] <- -1
    contrast[which(grepl("ASEIncluded.", colnames(design1)))] <- 1
    
    return(list(
        design1 = design1,
        countData = countData,
        contrast = contrast
    ))
}
alexchwong/SpliceWiz documentation built on April 17, 2025, 5:15 p.m.
rdrr.io home R language documentation Run R code online
CRAN packages Bioconductor packages R-Forge packages GitHub packages
Note that we can't provide technical support on individual packages. You should contact the package authors for that.
alexchwong/SpliceWiz
interactive analysis and visualization of alternative splicing in R

R/ASE-methods.R
In alexchwong/SpliceWiz: interactive analysis and visualization of alternative splicing in R

Documented in ASE_DESeq ASE_DoubleExpSeq ASE_edgeR ASE_edgeR_timeseries ASE_limma ASE_limma_timeseries

R Package Documentation

Browse R Packages

We want your feedback!

alexchwong/SpliceWiz interactive analysis and visualization of alternative splicing in R

R/ASE-methods.R In alexchwong/SpliceWiz: interactive analysis and visualization of alternative splicing in R

Documented in ASE_DESeq ASE_DoubleExpSeq ASE_edgeR ASE_edgeR_timeseries ASE_limma ASE_limma_timeseries

R Package Documentation

Browse R Packages

We want your feedback!

alexchwong/SpliceWiz
interactive analysis and visualization of alternative splicing in R

R/ASE-methods.R
In alexchwong/SpliceWiz: interactive analysis and visualization of alternative splicing in R