runDiffSplice: Perform differential isoform analysis using diffSplice

runDiffSpliceR Documentation

Perform differential isoform analysis using diffSplice

Description

Perform differential isoform analysis using diffSplice

Usage

runDiffSplice(
  iMatrixTx,
  detectedTx = rownames(iMatrixTx),
  tx2geneDF,
  txColname = "transcript_id",
  geneColname = "gene_name",
  iDesign,
  iContrasts,
  cutoffFDR = 0.05,
  cutoffFold = 1.5,
  collapseByGene = TRUE,
  spliceTest = c("t", "simes", "F"),
  sep = " ",
  useVoom = TRUE,
  verbose = FALSE,
  ...
)

Arguments

iMatrixTx

numeric matrix of expression, with transcripts as rows and samples as columns. The data is assumed to be log2-transformed using the format log2(1 + x). This data should be normalized using appropriate methods, outside the scope of this function.

detectedTx

character vector containing all or a subset of rownames(iMatrix) used for statistical testing.

tx2geneDF

data.frame with colnames c(txColname, geneColname), where all entries of rownames(iMatrix) are represented in tx2geneDF[,txColname].

txColname, geneColname

the colnames(tx2geneDF) representing the rownames(iMatrixTx) matched by tx2geneDF[,txColname], and the associated genes given by tx2geneDF[,geneColname]. Note that detectedTx must also contain values in rownames(iMatrixTx) and tx2geneDF[,txColname].

iDesign

numeric matrix representing the design matrix for the experiment design. For example, limma::model.matrix(~0+group) will represent each group. Typically, rownames(iDesign) should be defined to match the colnames(iMatrix) even if it requires extra processing. The colnames(iDesign) should represent group names used in rownames(iContrasts).

iContrasts

numeric matrix representing the contrasts used in statistical comparisons. This matrix can be generated by running limma::makeContrasts() using a format similar to the following: limma::makeContrasts(contrasts="group1-group2", levels=iDesign), see "Examples" for more info.

cutoffFDR

numeric value indicating a statistical threshold on the FDR (adjusted P-value). Values should be between 0 and 1, where cutoffFDR=1 would impose no threshold on the adjusted P-value.

cutoffFold

numeric value indicating the minimum normal space fold change allowed for statistical hits. For example cutoffFold=2 would require a 2-fold change, equivalent to log2 fold change >= 1.

collapseByGene

logical indicating whether results should be summarized at the gene level after filtering statistical hits.

spliceTest

character value described in limma::topSplice() which defines the statistical test to return. The default "t" returns the t-test result for each isoform, which is mainly beneficial because it also includes fold change that can be filtered. The "F" returns F-test per gene, and "simes" returns the per-gene t-test P-value after Simes adjustment per gene.

sep

character value used as a delimiter in output data.frame colnames, such that each stats is followed by the contrast name, separated by this delimiter.

useVoom

logical indicating whether to apply the limma::voom() adjustment prior to running limma::diffSplice(). This value should be TRUE when analyzing count or pseudocount data.

verbose

logical indicating whether to print verbose output.

...

additional arguments are ignored.

Details

This function is intended to be a convenient method to call limma::diffSplice() and return the output of limma::topSplice() in helpful formats for downstream use.

The basic input required:

  • iMatrixTx a numeric matrix of transcript expression

  • detectedTx (optional) subset of rownames(iMatrixTx), sometimes determined by defineDetectedTx().

  • tx2geneDF data.frame with "transcript_id" and "gene_name" columns. This data.frame is often the same one used with tximport::tximport() when importing transcriptome data to the gene level.

  • iDesign,iContrasts design and contrast matrix, as created by groups2contrasts().

These steps are run in order:

  • limma:voom() (Optional.) This step is enabled with the argument useVoom=TRUE and should only be used when iMatrixTx contains count or pseudocount data. When using TPM or FPKM values, set useVoom=FALSE.

  • limma::lmFit()

  • limma::contrasts.fit()

  • limma::diffSplice()

  • limma::topSplice() This function is called on each contrast in order to return a list of data.frames.

Each contrast is tested for differential transcript expression. No other contrasts are tested.

The output is a list with an element "statsDFs" that is itself list of data.frames for each contrast. By default when collapseByGene=TRUE each row is collapsed to gene level, using the best statistical hit per gene as an exemplar.

The rows of iMatrixTx are expected to contain expression values per transcript isoform, but may contain alternative measurements such as: junction counts per gene; exon expression per gene.

When useVoom=TRUE, the iMatrixTx data is exponentiated prior to running limma::voom(), for the purpose of calculating a weights matrix. The original iMatrixTx data is used in limma::lmFit() as-is, alongside the voom weights. The voom-normalized data is not used. Therefore, the input data is assumed to be normalized.

Statistical results can be summarized at the gene level, after applying thresholds for statistical hits in the form of required adjusted P-value and/or fold change. It may be helpful to review results per gene, alongside the specific transcript isoforms which are called statistical hits.

Value

list containing:

  • detectedTx, detectedTxUse the detectedTx values representing genes with multiple transcripts;

  • fit the initial model fit;

  • fit2 the contrast model fit;

  • splice the output from limma::diffSplice();

  • statsDFs list of data.frame output from limma::topSplice() either at the transcript level or the gene level.

Note that when collapseByGene=TRUE the results will return the first transcript entry per gene that has the best P-value. Often one gene will have two transcripts with identical P-value, and the order that the transcripts appear is inconsistent. Therefore, the direction of fold change is not meaningful by itself, except with respect to the specific isoform returned. See limma::topSplice() argument test for more information about transcript- and gene-level summaries.

References

Law, CW, Chen, Y, Shi, W, and Smyth, GK (2014). Voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology 15, R29.

See Also

Other jam RNA-seq functions: assignGRLexonNames(), closestExonToJunctions(), combineGRcoverage(), defineDetectedTx(), detectedTxInfo(), exoncov2polygon(), flattenExonsBy(), getGRcoverageFromBw(), groups2contrasts(), internal_junc_score(), makeTx2geneFromGtf(), make_ref2compressed(), prepareSashimi(), sortSamples(), spliceGR2junctionDF()

Examples

# example for defining iDesign
# first define a vector of sample groups
iGroups <- jamba::nameVector(paste(rep(c("WT", "KO"), each=6),
   rep(c("Control", "Treated"), each=3),
   sep="_"));
iGroups <- factor(iGroups, levels=unique(iGroups));
iGroups;
# next define sample identifiers
iSamples <- names(iGroups);

# given a vector of groups, make iDesign
iDesign <- stats::model.matrix(~0+iGroups);

# It is good practice to rename colnames(iDesign) and rownames(iDesign),
# that is, for the love of all that is good, use colnames and rownames
# that help confirm that these matrices are consistent.
colnames(iDesign) <- levels(iGroups);
rownames(iDesign) <- names(iGroups);
iDesign;

# define contrasts
# the example below includes a two-way contrast, which is a test
# of the pairwise fold changes
iContrasts <- limma::makeContrasts(contrasts=c(
   "WT_Treated-WT_Control", "KO_Treated-KO_Control",
   "(KO_Treated-KO_Control)-(WT_Treated-WT_Control)"),
   levels=iDesign);
iContrasts;

# for validation, verify these constraints:
# - all(rownames(iDesign) == iSamples)
# - colnames(iDesign) == the actual group names
# - all(colnames(iDesign) == rownames(iContrasts))
# you can see which samples are included in each test with crossproduct:
iDesign %*% iContrasts;

## Another efficient way to define iDesign and iContrasts:
iDC <- splicejam::groups2contrasts(iGroups, returnDesign=TRUE);
iDesign <- iDC$iDesign;
iContrasts <- iDC$iContrasts;


jmw86069/jambio documentation built on April 21, 2024, 2:48 p.m.