ASE-methods: Use Limma, DESeq2 or DoubleExpSeq to test for differential...

ASE-methodsR Documentation

Use Limma, DESeq2 or DoubleExpSeq to test for differential Alternative Splice Events

Description

Use Limma, DESeq2 or DoubleExpSeq to test for differential Alternative Splice Events

Usage

limma_ASE(
  se,
  test_factor,
  test_nom,
  test_denom,
  batch1 = "",
  batch2 = "",
  filter_antiover = TRUE,
  filter_antinear = FALSE
)

DESeq_ASE(
  se,
  test_factor,
  test_nom,
  test_denom,
  batch1 = "",
  batch2 = "",
  n_threads = 1,
  filter_antiover = TRUE,
  filter_antinear = FALSE
)

DoubleExpSeq_ASE(
  se,
  test_factor,
  test_nom,
  test_denom,
  filter_antiover = TRUE,
  filter_antinear = FALSE
)

Arguments

se

The NxtSE object created by MakeSE(). To reduce runtime and avoid excessive multiple testing, consider filtering the object using apply_filters

test_factor

The condition type which contains the contrasting variable

test_nom

The nominator condition to test for differential ASE. Usually the "treatment" condition

test_denom

The denominator condition to test against for differential ASE. Usually the "control" condition

batch1, batch2

(Optional, limma and DESeq2 only) One or two condition types containing batch information to account for.

filter_antiover, filter_antinear

Whether to remove novel IR events that overlap over or near anti-sense genes. Default will exclude antiover but not antinear introns. These are ignored if stranded RNA-seq protocols are used.

n_threads

(DESeq2 only) How many threads to use for DESeq2 based analysis.

Details

Using limma, NxtIRF models included and excluded counts as log-normal distributed, whereas using DESeq2, NxtIRF models included and excluded counts as negative binomial distributed with dispersion shrinkage according to their mean count expressions. For limma and DESeq2, differential ASE are considered as the "interaction" between included and excluded splice counts for each sample. See this vignette for an explanation of how this is done.

Using DoubleExpSeq, included and excluded counts are modelled using the generalized beta prime distribution, using empirical Bayes shrinkage to estimate dispersion.

EventType are as follow:

  • IR = (novel) intron retention

  • MXE = mutually exclusive exons

  • SE = skipped exons

  • AFE = alternate first exon

  • ALE = alternate last exon

  • A5SS = alternate 5'-splice site

  • A3SS = alternate 3'-splice site

  • RI = (known / annotated) intron retention.

NB: NxtIRF separately considers known "RI" and novel "IR" events separately:

  • IR novel events are calculated using the IRFinder method, whereby spliced transcripts are all isoforms that do not retain the intron, as estimated via the SpliceMax and SpliceOverMax methods

  • see CollateData.

  • RI known retained introns are those that lie completely within a single exon of another transcript. (NB: in NxtIRFcore v1.1.1 and later, this encompasses exons from any transcript, including retained_intron and sense_intronic transcripts). RI's are calculated by considering the specific spliced intron as a binary event paired with its retention. The spliced abundance is calculated exclusively by splice reads mapped to the specific intron boundaries. Known retained introns are those where the intron retaining transcript is an annotated transcript. In NxtIRFcore version < 1.1.1, the IR-transcript's transcript_biotype must not be an retained_intron or sense_intronic.

NxtIRF considers "included" counts as those that represent abundance of the "included" isoform, whereas "excluded" counts represent the abundance of the "excluded" isoform. For consistency, it applies a convention whereby the "included" transcript is one where its splice junctions are by definition shorter than those of "excluded" transcripts. Specifically, this means the included / excluded isoforms are as follows:

EventType Included Excluded
IR or RI Intron Retention Spliced Intron
MXE Upstream exon inclusion Downstream exon inclusion
SE Exon inclusion Exon skipping
AFE Downstream exon usage Upstream exon usage
ALE Upstream exon usage Downstream exon usage
A5SS Downstream 5'-SS Upstream 5'-SS
A3SS Upstream 3'-SS Downstream 3'-SS

Value

A data table containing the following:

  • EventName: The name of the ASE event. This identifies each ASE in downstream functions including make_diagonal, make_matrix, and Plot_Coverage

  • EventType: The type of event. See details section above.

  • EventRegion: The genomic coordinates the event occupies. This spans the most upstream and most downstream splice junction involved in the ASE, and is use to guide the Plot_Coverage function.

  • NMD_direction: Indicates whether one isoform is a NMD substrate. +1 means included isoform is NMD, -1 means the excluded isoform is NMD, and 0 means there is no change in NMD status (i.e. both / neither are NMD)

  • AvgPSI_nom, Avg_PSI_denom: the average percent spliced in / percent IR levels for the two conditions being contrasted. nom and denom in column names are replaced with the condition names

limma specific output

  • logFC, AveExpr, t, P.Value, adj.P.Val, B: limma topTable columns of differential ASE. See limma::topTable for details.

  • inc/exc_(logFC, AveExpr, t, P.Value, adj.P.Val, B): limma results for differential testing for raw included / excluded counts only

DESeq2 specific output

  • baseMean, log2FoldChange, lfcSE, stat, pvalue, padj: DESeq2 results columns for differential ASE; see DESeq2::results for details.

  • inc/exc_(baseMean, log2FoldChange, lfcSE, stat, pvalue, padj): DESeq2 results for differential testing for raw included / excluded counts only

DoubleExp specific output

  • MLE_nom, MLE_denom: Expectation values for the two groups. nom and denom in column names are replaced with the condition names

  • MLE_LFC: Log2-fold change of the MLE

  • P.Value, adj.P.Val: Nominal and BH-adjusted P values

  • n_eff: Number of effective samples (i.e. non-zero or non-unity PSI)

  • mDepth: Mean Depth of splice coverage in each of the two groups.

  • Dispersion_Reduced, Dispersion_Full: Dispersion values for reduced and full models. See DoubleExpSeq::DBGLM1 for details.

Functions

  • limma_ASE: Use limma to perform differential ASE analysis of a filtered NxtSE object

  • DESeq_ASE: Use DESeq2 to perform differential ASE analysis of a filtered NxtSE object

  • DoubleExpSeq_ASE: Use DoubleExpSeq to perform differential ASE analysis of a filtered NxtSE object (uses double exponential beta-binomial model) to estimate group dispersions, followed by LRT

References

Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK (2015). 'limma powers differential expression analyses for RNA-sequencing and microarray studies.' Nucleic Acids Research, 43(7), e47. https://doi.org/10.1093/nar/gkv007

Love MI, Huber W, Anders S (2014). 'Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.' Genome Biology, 15, 550. https://doi.org/10.1186/s13059-014-0550-8

Ruddy S, Johnson M, Purdom E (2016). 'Shrinkage of dispersion parameters in the binomial family, with application to differential exon skipping.' Ann. Appl. Stat. 10(2): 690-725. https://doi.org/10.1214/15-AOAS871

Examples

# see ?MakeSE on example code of generating this NxtSE object
se <- NxtIRF_example_NxtSE()

colData(se)$treatment <- rep(c("A", "B"), each = 3)

require("limma")
res_limma <- limma_ASE(se, "treatment", "A", "B")

require("DoubleExpSeq")
res_DES <- DoubleExpSeq_ASE(se, "treatment", "A", "B")
## Not run: 

require("DESeq2")
res_DESeq <- DESeq_ASE(se, "treatment", "A", "B")

## End(Not run)

alexchwong/NxtIRFcore documentation built on Oct. 31, 2022, 9:14 a.m.