runDiffSplice | R Documentation |
Perform differential isoform analysis using diffSplice
runDiffSplice(
iMatrixTx,
detectedTx = rownames(iMatrixTx),
tx2geneDF,
txColname = "transcript_id",
geneColname = "gene_name",
iDesign,
iContrasts,
cutoffFDR = 0.05,
cutoffFold = 1.5,
collapseByGene = TRUE,
spliceTest = c("t", "simes", "F"),
sep = " ",
useVoom = TRUE,
verbose = FALSE,
...
)
iMatrixTx |
numeric matrix of expression, with transcripts as
rows and samples as columns. The data is assumed to be log2-transformed
using the format |
detectedTx |
character vector containing all or a subset of
|
tx2geneDF |
data.frame with colnames |
txColname , geneColname |
the |
iDesign |
numeric matrix representing the design matrix for
the experiment design. For example, |
iContrasts |
numeric matrix representing the contrasts used
in statistical comparisons. This matrix can be generated by
running |
cutoffFDR |
numeric value indicating a statistical threshold
on the FDR (adjusted P-value). Values should be between 0 and 1, where
|
cutoffFold |
numeric value indicating the minimum normal space
fold change allowed for statistical hits. For example |
collapseByGene |
logical indicating whether results should be summarized at the gene level after filtering statistical hits. |
spliceTest |
character value described in |
sep |
character value used as a delimiter in output data.frame colnames, such that each stats is followed by the contrast name, separated by this delimiter. |
useVoom |
logical indicating whether to apply the |
verbose |
logical indicating whether to print verbose output. |
... |
additional arguments are ignored. |
This function is intended to be a convenient method
to call limma::diffSplice()
and return the output of
limma::topSplice()
in helpful formats for downstream
use.
The basic input required:
iMatrixTx
a numeric matrix of transcript expression
detectedTx
(optional) subset of rownames(iMatrixTx)
, sometimes
determined by defineDetectedTx()
.
tx2geneDF
data.frame with "transcript_id" and "gene_name" columns.
This data.frame is often the same one used with tximport::tximport()
when importing transcriptome data to the gene level.
iDesign
,iContrasts
design and contrast matrix, as created by
groups2contrasts()
.
These steps are run in order:
limma::voom()
(Optional.) This step is enabled with
the argument useVoom=TRUE
and should only be used when iMatrixTx
contains count or pseudocount data. When using TPM or FPKM values,
set useVoom=FALSE
.
limma::lmFit()
limma::contrasts.fit()
limma::diffSplice()
limma::topSplice()
This function is called on each contrast
in order to return a list of data.frames.
Each contrast is tested for differential transcript expression. No other contrasts are tested.
The output is a list with an element "statsDFs"
that is itself
list of data.frames for each contrast. By default when
collapseByGene=TRUE
each row is collapsed to gene level,
using the best statistical hit per gene as an exemplar.
The rows of iMatrixTx
are expected to contain expression values
per transcript isoform, but may contain alternative measurements
such as: junction counts per gene; exon expression per gene.
When useVoom=TRUE
, the iMatrixTx
data is exponentiated
prior to running limma::voom()
, for the purpose of
calculating a weights matrix.
The original iMatrixTx
data is used in limma::lmFit()
as-is,
alongside the voom weights. The voom-normalized data is not
used. Therefore, the input data is assumed to be normalized.
Statistical results can be summarized at the gene level, after applying thresholds for statistical hits in the form of required adjusted P-value and/or fold change. It may be helpful to review results per gene, alongside the specific transcript isoforms which are called statistical hits.
list containing:
detectedTx
, detectedTxUse
the detectedTx
values representing genes with multiple transcripts;
fit
the initial model fit;
fit2
the contrast model fit;
splice
the output from limma::diffSplice()
;
statsDFs
list of data.frame output from limma::topSplice()
either at the transcript level or the gene level.
Note that when collapseByGene=TRUE
the results will return the first
transcript entry per gene that has the best P-value. Often one gene
will have two transcripts with identical P-value, and the order
that the transcripts appear is inconsistent. Therefore, the direction
of fold change is not meaningful by itself, except with respect to
the specific isoform returned. See limma::topSplice()
argument
test
for more information about transcript- and gene-level
summaries.
Law, CW, Chen, Y, Shi, W, and Smyth, GK (2014). Voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology 15, R29.
Other jam RNA-seq functions:
assignGRLexonNames()
,
closestExonToJunctions()
,
combineGRcoverage()
,
defineDetectedTx()
,
detectedTxInfo()
,
exoncov2polygon()
,
flattenExonsBy()
,
getGRcoverageFromBw()
,
groups2contrasts()
,
internal_junc_score()
,
makeTx2geneFromGtf()
,
make_ref2compressed()
,
prepareSashimi()
,
sortSamples()
,
spliceGR2junctionDF()
# example for defining iDesign
# first define a vector of sample groups
iGroups <- jamba::nameVector(paste(rep(c("WT", "KO"), each=6),
rep(c("Control", "Treated"), each=3),
sep="_"));
iGroups <- factor(iGroups, levels=unique(iGroups));
iGroups;
# next define sample identifiers
iSamples <- names(iGroups);
# given a vector of groups, make iDesign
iDesign <- stats::model.matrix(~0+iGroups);
# It is good practice to rename colnames(iDesign) and rownames(iDesign),
# that is, for the love of all that is good, use colnames and rownames
# that help confirm that these matrices are consistent.
colnames(iDesign) <- levels(iGroups);
rownames(iDesign) <- names(iGroups);
iDesign;
# define contrasts
# the example below includes a two-way contrast, which is a test
# of the pairwise fold changes
iContrasts <- limma::makeContrasts(contrasts=c(
"WT_Treated-WT_Control", "KO_Treated-KO_Control",
"(KO_Treated-KO_Control)-(WT_Treated-WT_Control)"),
levels=iDesign);
iContrasts;
# for validation, verify these constraints:
# - all(rownames(iDesign) == iSamples)
# - colnames(iDesign) == the actual group names
# - all(colnames(iDesign) == rownames(iContrasts))
# you can see which samples are included in each test with crossproduct:
iDesign %*% iContrasts;
## Another efficient way to define iDesign and iContrasts:
iDC <- splicejam::groups2contrasts(iGroups, returnDesign=TRUE);
iDesign <- iDC$iDesign;
iContrasts <- iDC$iContrasts;
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.