runDiffSplice: Perform differential isoform analysis using diffSplice
In jmw86069/splicejam: Analysis and Visualization of Gene Splice Variants and Transcriptome Data

runDiffSplice

R Documentation

Perform differential isoform analysis using diffSplice

Description

Perform differential isoform analysis using diffSplice

Usage

runDiffSplice(
  iMatrixTx,
  detectedTx = rownames(iMatrixTx),
  tx2geneDF,
  txColname = "transcript_id",
  geneColname = "gene_name",
  iDesign,
  iContrasts,
  cutoffFDR = 0.05,
  cutoffFold = 1.5,
  collapseByGene = TRUE,
  spliceTest = c("t", "simes", "F"),
  sep = " ",
  useVoom = TRUE,
  verbose = FALSE,
  ...
)

Arguments

`iMatrixTx`	numeric matrix of expression, with transcripts as rows and samples as columns. The data is assumed to be log2-transformed using the format `log2(1 + x)`. This data should be normalized using appropriate methods, outside the scope of this function.
`detectedTx`	character vector containing all or a subset of `rownames(iMatrix)` used for statistical testing.
`tx2geneDF`	data.frame with colnames `c(txColname, geneColname)`, where all entries of `rownames(iMatrix)` are represented in `tx2geneDF[,txColname]`.
`txColname`, `geneColname`	the `colnames(tx2geneDF)` representing the `rownames(iMatrixTx)` matched by `tx2geneDF[,txColname]`, and the associated genes given by `tx2geneDF[,geneColname]`. Note that `detectedTx` must also contain values in `rownames(iMatrixTx)` and `tx2geneDF[,txColname]`.
`iDesign`	numeric matrix representing the design matrix for the experiment design. For example, `limma::model.matrix(~0+group)` will represent each group. Typically, `rownames(iDesign)` should be defined to match the `colnames(iMatrix)` even if it requires extra processing. The `colnames(iDesign)` should represent group names used in `rownames(iContrasts)`.
`iContrasts`	numeric matrix representing the contrasts used in statistical comparisons. This matrix can be generated by running `limma::makeContrasts()` using a format similar to the following: `limma::makeContrasts(contrasts="group1-group2", levels=iDesign)`, see "Examples" for more info.
`cutoffFDR`	numeric value indicating a statistical threshold on the FDR (adjusted P-value). Values should be between 0 and 1, where `cutoffFDR=1` would impose no threshold on the adjusted P-value.
`cutoffFold`	numeric value indicating the minimum normal space fold change allowed for statistical hits. For example `cutoffFold=2` would require a 2-fold change, equivalent to log2 fold change >= 1.
`collapseByGene`	logical indicating whether results should be summarized at the gene level after filtering statistical hits.
`spliceTest`	character value described in `limma::topSplice()` which defines the statistical test to return. The default `"t"` returns the t-test result for each isoform, which is mainly beneficial because it also includes fold change that can be filtered. The `"F"` returns F-test per gene, and `"simes"` returns the per-gene t-test P-value after Simes adjustment per gene.
`sep`	character value used as a delimiter in output data.frame colnames, such that each stats is followed by the contrast name, separated by this delimiter.
`useVoom`	logical indicating whether to apply the `limma::voom()` adjustment prior to running `limma::diffSplice()`. This value should be `TRUE` when analyzing count or pseudocount data.
`verbose`	logical indicating whether to print verbose output.
`...`	additional arguments are ignored.

Details

This function is intended to be a convenient method to call limma::diffSplice() and return the output of limma::topSplice() in helpful formats for downstream use.

The basic input required:

iMatrixTx a numeric matrix of transcript expression
detectedTx (optional) subset of rownames(iMatrixTx), sometimes determined by defineDetectedTx().
tx2geneDF data.frame with "transcript_id" and "gene_name" columns. This data.frame is often the same one used with tximport::tximport() when importing transcriptome data to the gene level.
iDesign,iContrasts design and contrast matrix, as created by groups2contrasts().

These steps are run in order:

limma::voom() (Optional.) This step is enabled with the argument useVoom=TRUE and should only be used when iMatrixTx contains count or pseudocount data. When using TPM or FPKM values, set useVoom=FALSE.
limma::lmFit()
limma::contrasts.fit()
limma::diffSplice()
limma::topSplice() This function is called on each contrast in order to return a list of data.frames.

Each contrast is tested for differential transcript expression. No other contrasts are tested.

The output is a list with an element "statsDFs" that is itself list of data.frames for each contrast. By default when collapseByGene=TRUE each row is collapsed to gene level, using the best statistical hit per gene as an exemplar.

The rows of iMatrixTx are expected to contain expression values per transcript isoform, but may contain alternative measurements such as: junction counts per gene; exon expression per gene.

When useVoom=TRUE, the iMatrixTx data is exponentiated prior to running limma::voom(), for the purpose of calculating a weights matrix. The original iMatrixTx data is used in limma::lmFit() as-is, alongside the voom weights. The voom-normalized data is not used. Therefore, the input data is assumed to be normalized.

Statistical results can be summarized at the gene level, after applying thresholds for statistical hits in the form of required adjusted P-value and/or fold change. It may be helpful to review results per gene, alongside the specific transcript isoforms which are called statistical hits.

Value

list containing:

detectedTx, detectedTxUse the detectedTx values representing genes with multiple transcripts;
fit the initial model fit;
fit2 the contrast model fit;
splice the output from limma::diffSplice();
statsDFs list of data.frame output from limma::topSplice() either at the transcript level or the gene level.

Note that when collapseByGene=TRUE the results will return the first transcript entry per gene that has the best P-value. Often one gene will have two transcripts with identical P-value, and the order that the transcripts appear is inconsistent. Therefore, the direction of fold change is not meaningful by itself, except with respect to the specific isoform returned. See limma::topSplice() argument test for more information about transcript- and gene-level summaries.

References

Law, CW, Chen, Y, Shi, W, and Smyth, GK (2014). Voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology 15, R29.

Examples

# example for defining iDesign
# first define a vector of sample groups
iGroups <- jamba::nameVector(paste(rep(c("WT", "KO"), each=6),
   rep(c("Control", "Treated"), each=3),
   sep="_"));
iGroups <- factor(iGroups, levels=unique(iGroups));
iGroups;
# next define sample identifiers
iSamples <- names(iGroups);

# given a vector of groups, make iDesign
iDesign <- stats::model.matrix(~0+iGroups);

# It is good practice to rename colnames(iDesign) and rownames(iDesign),
# that is, for the love of all that is good, use colnames and rownames
# that help confirm that these matrices are consistent.
colnames(iDesign) <- levels(iGroups);
rownames(iDesign) <- names(iGroups);
iDesign;

# define contrasts
# the example below includes a two-way contrast, which is a test
# of the pairwise fold changes
iContrasts <- limma::makeContrasts(contrasts=c(
   "WT_Treated-WT_Control", "KO_Treated-KO_Control",
   "(KO_Treated-KO_Control)-(WT_Treated-WT_Control)"),
   levels=iDesign);
iContrasts;

# for validation, verify these constraints:
# - all(rownames(iDesign) == iSamples)
# - colnames(iDesign) == the actual group names
# - all(colnames(iDesign) == rownames(iContrasts))
# you can see which samples are included in each test with crossproduct:
iDesign %*% iContrasts;

## Another efficient way to define iDesign and iContrasts:
iDC <- splicejam::groups2contrasts(iGroups, returnDesign=TRUE);
iDesign <- iDC$iDesign;
iContrasts <- iDC$iContrasts;

jmw86069/splicejam documentation built on April 14, 2025, 3:12 a.m.