Methods for the Gene Set Regulation Index (GSRI)
Description
Estimate the number of differentially expressed genes in gene sets.
Usage
Arguments
exprs 
Matrix or object of class 
groups 
Factor with the assignments of the microarray samples to
the groups, along which the differential effect should be
estimated. Must have as many elements as 
geneSet 
Optional object of class 
names 
Optional character vector with the names of the gene
set(s). If missing the names are taken from the 
weight 
Optional numerical vector of weights specifying the
certainty a gene is part of the gene set. If 
nBoot 
Integer with the number of bootstrap samples to be drawn in the calculation of the GSRI (default: 100). 
test 
A function defining the statistical test used to assess
the differential effect between the groups which are given by the

testArgs 
List of optional arguments used by the 
alpha 
Single numeric specifying the confidence level for the GSRI. The estimated GSRI is the lower bound of the (1-‘alpha’)*100% confidence interval obtained from bootstrapping. 
grenander 
Logical about whether the modified Grenander estimator for the cumulative density should be used instead of a centered ECDF. By default the modified Grenander estimator is used. For more information, please see the ‘details’ section. 
verbose 
Logical indicating whether the progress of the
computation should be printed to the screen (default: FALSE). Most
useful if 
... 
Additional arguments, including:

Details
The gsri method estimates the degree of differential expression in
gene sets. By assessing the part of the distribution of p-values
consistent with the null hypothesis, the number of differentially
expressed genes is calculated.
Through non-parametric fitting of the uniform component of the p-value distribution, the fraction of regulated genes ‘r’ in a gene set is estimated. The GSRI ‘eta’ is then defined as the ‘alpha*100’%-quantile of the distribution of ‘r’, obtained from bootstrapping the samples within the groups. The index indicates that with a probability of (1-‘alpha’)*100% more than a fraction of ‘eta’ genes in the gene set is differentially expressed. It can also be employed to test the hypothesis of whether at least one gene in a gene set is regulated. Further, different gene sets can be compared or ranked according to the estimated amount of regulation.
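The idea described above can be illustrated with a minimal base R sketch. It is not the implementation used by gsri: the non-parametric fit is replaced here by a much cruder estimate that treats all p-values above a threshold as coming from the uniform null component. The helper names genePvalues and regulatedFraction and the threshold lambda are assumptions made for this illustration only.

```r
## Simulated data: 100 genes, 20 samples, 30 regulated genes
set.seed(1)
nGenes <- 100
exprs <- matrix(rnorm(nGenes * 20), nGenes)
exprs[1:30, 1:10] <- rnorm(30 * 10, mean = 2)
groups <- factor(rep(1:2, each = 10))

## Hypothetical helper: one t-test p-value per gene (row)
genePvalues <- function(exprs, groups) {
  lev <- levels(groups)
  apply(exprs, 1, function(x)
    t.test(x[groups == lev[1]], x[groups == lev[2]])$p.value)
}

## Hypothetical helper: crude estimate of the regulated fraction 'r'.
## ASSUMPTION: p-values above 'lambda' are taken as purely null (uniform),
## which stands in for the non-parametric fit described in the text.
regulatedFraction <- function(p, lambda = 0.5) {
  max(0, 1 - mean(p > lambda) / (1 - lambda))
}

r <- regulatedFraction(genePvalues(exprs, groups))

## Bootstrap the samples within each group and take the
## alpha*100%-quantile of the resulting distribution of 'r'
nBoot <- 50
alpha <- 0.05
rBoot <- replicate(nBoot, {
  idx <- unlist(lapply(split(seq_along(groups), groups),
                       function(i) sample(i, replace = TRUE)))
  regulatedFraction(genePvalues(exprs[, idx], groups[idx]))
})
eta <- quantile(rBoot, probs = alpha, names = FALSE)
```

Here r plays the role of the estimated fraction of regulated genes and eta that of the index; the values returned by gsri will generally differ because the package uses a different estimator.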
Assessing the differential effect is based on p-values obtained from
statistical testing at the level of individual genes between the
groups. The GSRI approach is independent of the underlying test, which
can be chosen according to the experimental design. With the t-test
(rowt) and F-test (rowF), two widely used statistical tests are
already part of the package. Additional tests can easily be used by
passing them with the test argument to the gsri method. For details
on how to implement custom test functions, please refer to the help of
rowt and rowF or the vignette of this package.
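As a rough sketch of what a custom test function could look like, the following defines a per-gene Wilcoxon rank-sum test. The function name rowWilcox is hypothetical, and the argument layout (exprs, groups, id, index, testArgs) is an assumption about the interface; the help of rowt and rowF is the authoritative reference for the arguments a test function receives and the value it must return.

```r
## Hypothetical custom test returning one p-value per gene.
## ASSUMPTION: 'id' selects the genes (rows) of the gene set and 'index'
## the samples (columns) of the current bootstrap draw; check the help
## of rowt and rowF for the actual interface before using this with gsri.
rowWilcox <- function(exprs, groups, id, index, testArgs = NULL) {
  sub <- exprs[id, index, drop = FALSE]
  g <- groups[index]
  lev <- levels(g)
  apply(sub, 1, function(x)
    wilcox.test(x[g == lev[1]], x[g == lev[2]], exact = FALSE)$p.value)
}

## Direct call on simulated data, without bootstrapping
set.seed(1)
exprs <- matrix(rnorm(100 * 20), 100)
exprs[1:30, 1:10] <- rnorm(30 * 10, mean = 2)
groups <- factor(rep(1:2, each = 10))
p <- rowWilcox(exprs, groups, id = 1:nrow(exprs), index = 1:ncol(exprs))
```

If the assumed interface matches, such a function would be passed as gsri(exprs, groups, test=rowWilcox).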
The GSRI approach further allows weighting the influence of individual genes in the estimation. This can be beneficial, for example, for including the certainty that genes are part of a certain gene set, derived from experimental findings or annotations.
Defining gene sets is available through the GSEABase package, which
provides the GeneSet and GeneSetCollection classes for a single or
multiple gene sets, respectively. This ensures a powerful approach for
obtaining gene sets from data objects, databases, and other
Bioconductor packages. For details on how to define or retrieve gene
sets, please refer to the documentation of the GSEABase package,
with a special focus on the GeneSet and GeneSetCollection classes.
The distribution of the p-values of a gene set is assessed in the cumulative density. In addition to a symmetrical empirical cumulative density function (ECDF), the modified Grenander estimator, based on the assumption of a concave shape of the cumulative density, is implemented and used by default. While the modified Grenander estimator reduces the variance and makes the approach more stable, especially for small gene sets, it underestimates the number of regulated genes and thus leads to conservative estimates.
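To make the concavity assumption concrete, the following base R sketch computes the plain (unmodified) Grenander-type estimate, that is, the least concave majorant of the ECDF of the p-values. It is meant as an illustration only and does not reproduce the modification used in this package; the function name lcmCdf is hypothetical.

```r
## Hypothetical helper: least concave majorant of the ECDF on [0, 1].
## ASSUMPTION: p-values are strictly between 0 and 1.
lcmCdf <- function(p) {
  p <- sort(p)
  x <- c(0, p, 1)
  y <- c(0, seq_along(p) / length(p), 1)
  keep <- 1L                        # indices of points on the upper hull
  for (i in 2:length(x)) {
    while (length(keep) >= 2) {
      a <- keep[length(keep) - 1]
      b <- keep[length(keep)]
      ## drop b if it lies on or below the segment from a to i
      cross <- (x[b] - x[a]) * (y[i] - y[a]) -
               (y[b] - y[a]) * (x[i] - x[a])
      if (cross >= 0) keep <- keep[-length(keep)] else break
    }
    keep <- c(keep, i)
  }
  ## linear interpolation between the hull points
  approxfun(x[keep], y[keep])
}

## p-values with an excess of small values plus a uniform component
set.seed(1)
p <- c(rbeta(30, 0.2, 1), runif(70))
Fhat <- lcmCdf(p)
```

The resulting function Fhat is concave and lies on or above the ECDF at every observed p-value, which smooths out the local fluctuations of the ECDF.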
In the case that the computation is performed for several gene sets in
the form of a GeneSetCollection object, it can be parallelized with the
multicore package. Please note that this package is not available
on all platforms. Using its capabilities requires attaching
multicore prior to the calculation and specification of the nCores
argument. For further details, please refer to the documentation of
the multicore package. This may be especially relevant in the case
that specific seed values for the bootstrapping are of interest.
Value
An object of class Gsri with the slots:
result: Data frame containing the results of the GSRI estimation, with one row for each gene set.
cdf: List of data frames containing the ECDF of the p-values. Each data frame covers one gene set.
parms: List containing the parameter values used in the analysis, with the elements.
For details, please see the help for the Gsri class.
Methods
Analysis with all genes of exprs treated as part of the gene set:

signature(exprs="matrix", groups="factor", geneSet="missing")

signature(exprs="ExpressionSet", groups="factor", geneSet="missing")
Analysis for one gene set, defined as an object of class GeneSet:

signature(exprs="matrix", groups="factor", geneSet="GeneSet")

signature(exprs="ExpressionSet", groups="factor", geneSet="GeneSet")
Analysis for several gene sets, defined as an object of class
GeneSetCollection:

signature(exprs="matrix", groups="factor", geneSet="GeneSetCollection")

signature(exprs="ExpressionSet", groups="factor", geneSet="GeneSetCollection")
In this case parallel computing capabilities provided by the multicore package may be available, depending on the platform.
Note
The standard deviation of the estimated number of regulated genes as
well as the GSRI are obtained through bootstrapping. Thus, the results
for these two parameters may differ slightly for several realizations,
especially for small numbers of bootstraps (nBoot). Setting the seed
of the random number generator avoids this problem and yields exactly
the same results for several realizations.
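The effect of the seed can be shown with a small base R sketch that does not involve the package itself. The helper bootSd is hypothetical and merely stands in for any quantity obtained from bootstrapping.

```r
## Hypothetical helper: bootstrap standard deviation of the mean
bootSd <- function(x, nBoot = 100) {
  sd(replicate(nBoot, mean(sample(x, replace = TRUE))))
}

set.seed(42)
x <- rnorm(20)

## Same seed before each run: identical bootstrap results
set.seed(1)
a <- bootSd(x)
set.seed(1)
b <- bootSd(x)
identical(a, b)

## Without resetting the seed the result will generally differ
d <- bootSd(x)
```

In the same way, calling set.seed with a fixed value directly before gsri should make the bootstrap-based results repeatable.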
Author(s)
Julian Gehring
Maintainer: Julian Gehring <julian.gehring@fdm.uni-freiburg.de>
See Also
Package:
GSRI-package
Class:
Gsri
Methods:
gsri
getGsri
getCdf
getParms
export
sortGsri
plot
show
summary
readCls
readGct
Examples
## Simulate expression data for a gene set of
## 100 genes, 20 samples (10 treatment, 10 control)
## and 30 regulated genes
set.seed(1)
exprs <- matrix(rnorm(100*20), 100)
exprs[1:30,1:10] <- rnorm(30*10, mean=2)
rownames(exprs) <- paste("g", 1:nrow(exprs), sep="")
groups <- factor(rep(1:2, each=10))
## Estimate the number of differentially expressed genes
res <- gsri(exprs, groups)
res
## Perform the analysis for different gene sets
library(GSEABase)
gs1 <- GeneSet(paste("g", 25:40, sep=""), setName="set1")
gs2 <- GeneSet(paste("g", seq(1, nrow(exprs), by=5), sep=""), setName="set2")
gsc <- GeneSetCollection(gs1, gs2)
res2 <- gsri(exprs, groups, gs1)
res3 <- gsri(exprs, groups, gsc, verbose=TRUE)
summary(res2)
