diffSpliceDGE: Test for Differential Exon Usage
In edgeR: Empirical Analysis of Digital Gene Expression Data in R


Given a negative binomial generalized log-linear model fit at the exon level, test for differential exon usage between experimental conditions.

diffSpliceDGE(glmfit, coef=ncol(glmfit$design), contrast=NULL, geneid, exonid=NULL, prior.count=0.125, verbose=TRUE)

`glmfit`	a `DGEGLM` fitted model object produced by `glmFit` or `glmQLFit`. Rows should correspond to exons.
`coef`	integer indicating which coefficient of the generalized linear model is to be tested for differential exon usage. Defaults to the last coefficient.
`contrast`	numeric vector specifying the contrast of the linear model coefficients to be tested for differential exon usage. Its length must equal the number of columns of `design`. If specified, it takes precedence over `coef`.
`geneid`	gene identifiers. Either a vector of length `nrow(glmfit)` or the name of the column of `glmfit$genes` containing the gene identifiers. Rows with the same ID are assumed to belong to the same gene.
`exonid`	exon identifiers. Either a vector of length `nrow(glmfit)` or the name of the column of `glmfit$genes` containing the exon identifiers.
`prior.count`	average prior count to be added to each observation to shrink the estimated log-fold-changes towards zero.
`verbose`	logical. If `TRUE`, diagnostic information about the number of genes and exons is printed.
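
As an illustration of the `contrast` argument, the sketch below builds a design matrix for a hypothetical three-group experiment and a contrast vector comparing group B with group A. The group labels and object names are assumptions made for this example. Only base R is used, and the final call is left as a comment because it needs a fitted `glmfit` object.

```r
# Hypothetical three-group layout with two replicates per group
group <- factor(rep(c("A", "B", "C"), each = 2))

# Group-means parameterization: one coefficient per group
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Contrast B - A: one entry per column of the design matrix
contrast <- c(-1, 1, 0)
stopifnot(length(contrast) == ncol(design))

# With a fit from glmFit(y, design, ...) or glmQLFit(y, design, ...):
# ds <- diffSpliceDGE(glmfit, contrast = contrast, geneid = "GeneID")
```

With the default intercept parameterization (`~group`), the same kind of comparison against the reference group could instead be selected through `coef`.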

This function tests for differential exon usage for each gene for a given coefficient of the generalized linear model.

Testing for differential exon usage is equivalent to testing whether the exons in each gene have the same log-fold-changes as the other exons in the same gene. At the exon level, the log-fold-change of each exon is compared to the log-fold-change of the entire gene that contains it. At the gene level, two tests are provided: one converts the exon-level p-values into a gene-level p-value by the Simes method, and the other uses the exon-level test statistics to conduct a gene-level test.
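
To make the Simes step concrete, here is a minimal base R sketch of the textbook Simes combination applied per gene. The function name `simes`, the gene labels and the p-values are all made up for illustration, and the package's internal implementation may differ in detail.

```r
# Simes combined p-value for one set of p-values:
# sort them, scale the i-th smallest by n / i, and take the minimum.
simes <- function(p) {
  n <- length(p)
  min(n * sort(p) / seq_len(n))
}

# Toy exon-level p-values for two hypothetical genes
exon.p <- c(0.001, 0.20, 0.50, 0.80,   # GeneA: one exon stands out
            0.30, 0.45, 0.60)          # GeneB: nothing stands out
gene   <- c(rep("GeneA", 4), rep("GeneB", 3))

# One Simes p-value per gene
gene.simes.p <- tapply(exon.p, gene, simes)
gene.simes.p
```

In this toy input, GeneA gets 0.004 (driven by its smallest exon p-value, 4 * 0.001 / 1) and GeneB gets 0.6, which shows why the Simes approach is sensitive to genes in which only one or a few exons change.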

diffSpliceDGE produces an object of class DGELRT containing the component design from glmfit plus the following new components:

`comparison`	character string describing the coefficient being tested.
`coefficients`	numeric vector of coefficients on the natural log scale. Each coefficient is the difference between the log-fold-change for that exon and the log-fold-change for the entire gene which contains that exon.
`genes`	data.frame of exon annotation.
`genecolname`	character string giving the name of the column of `genes` containing gene IDs.
`exoncolname`	character string giving the name of the column of `genes` containing exon IDs.
`exon.df.test`	numeric vector of testing degrees of freedom for exons.
`exon.p.value`	numeric vector of p-values for exons.
`gene.df.test`	numeric vector of testing degrees of freedom for genes.
`gene.p.value`	numeric vector of gene-level testing p-values.
`gene.Simes.p.value`	numeric vector of Simes' p-values for genes.
`gene.genes`	data.frame of gene annotation.

Some components of the output depend on whether glmfit is produced by glmFit or glmQLFit. If glmfit is produced by glmFit, then the following components are returned in the output object:

`exon.LR`	numeric vector of LR-statistics for exons.
`gene.LR`	numeric vector of LR-statistics for gene-level test.

If glmfit is produced by glmQLFit, then the following components are returned in the output object:

`exon.F`	numeric vector of F-statistics for exons.
`gene.df.prior`	numeric vector of prior degrees of freedom for genes.
`gene.df.residual`	numeric vector of residual degrees of freedom for genes.
`gene.F`	numeric vector of F-statistics for gene-level test.
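
Because the available statistics depend on how `glmfit` was produced, downstream code may want to check which components are present before using them. The helper below is a hypothetical sketch, not part of edgeR. It assumes only that the result can be accessed with `$` like a list, and it is exercised here on plain lists with made-up values standing in for real results.

```r
# Hypothetical helper: report which exon-level statistic a result carries.
# `ds` is assumed to be list-like, as returned by diffSpliceDGE().
exonStatName <- function(ds) {
  if (!is.null(ds$exon.F)) {
    "exon.F"      # fit came from glmQLFit
  } else if (!is.null(ds$exon.LR)) {
    "exon.LR"     # fit came from glmFit
  } else {
    stop("no exon-level statistic found")
  }
}

# Plain lists with made-up values standing in for real results
mock.lrt <- list(exon.LR = c(12.3, 0.4), exon.p.value = c(0.0005, 0.53))
mock.qlf <- list(exon.F  = c(10.1, 0.3), exon.p.value = c(0.002, 0.58))

exonStatName(mock.lrt)
exonStatName(mock.qlf)
```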

The information and testing results for both exons and genes are sorted by geneid and by exonid within gene.

Yunshun Chen and Gordon Smyth

library(edgeR)

# Gene exon annotation
Gene <- paste("Gene", 1:100, sep="")
Gene <- rep(Gene, each=10)
Exon <- paste("Ex", 1:10, sep="")
Gene.Exon <- paste(Gene, Exon, sep=".")
genes <- data.frame(GeneID=Gene, Gene.Exon=Gene.Exon)

group <- factor(rep(1:2, each=3))
design <- model.matrix(~group)
mu <- matrix(100, nrow=1000, ncol=6)
# knock-out the first exon of Gene1 by 90%
mu[1,4:6] <- 10
# generate exon counts
counts <- matrix(rnbinom(6000,mu=mu,size=20),1000,6)

y <- DGEList(counts=counts, lib.size=rep(1e6,6), genes=genes)
gfit <- glmFit(y, design, dispersion=0.05)

ds <- diffSpliceDGE(gfit, geneid="GeneID")
topSpliceDGE(ds)
plotSpliceDGE(ds)

Loading required package: limma
Total number of exons:  1000 
Total number of genes:  100 
Number of genes with 1 exon:  0 
Mean number of exons in a gene:  10 
Max number of exons in a gene:  10 
    GeneID NExons      P.Value          FDR
10   Gene1     10 8.197684e-20 8.197684e-18
610 Gene61     10 9.092281e-03 3.162445e-01
140 Gene14     10 1.020144e-02 3.162445e-01
870 Gene87     10 1.264978e-02 3.162445e-01
200 Gene20     10 3.751885e-02 7.503770e-01
100 Gene10     10 5.840405e-02 9.722447e-01
410 Gene41     10 6.805713e-02 9.722447e-01
860 Gene86     10 9.179080e-02 9.932655e-01
220 Gene22     10 9.644389e-02 9.932655e-01
710 Gene71     10 1.030246e-01 9.932655e-01