gage: GAGE (Generally Applicable Gene-set Enrichment) analysis

Description

Run GAGE analysis to infer gene sets (or pathways, functional groups, etc.) that are significantly perturbed relative to all genes considered. GAGE is generally applicable to essentially all microarray data, independent of data attributes including sample size, experimental layout, study design, and all types of heterogeneity in data generation.

gage is the main function; gagePrep is the function for the initial data preparation, especially sample pairing; gageSum carries out the final meta-test summarization.

Usage

gage(exprs, gsets, ref = NULL, samp = NULL, set.size = c(10, 500),
same.dir = TRUE, compare = "paired", rank.test = FALSE, use.fold = TRUE,
FDR.adj = TRUE, weights = NULL, full.table = FALSE, saaPrep = gagePrep,
saaTest = gs.tTest, saaSum = gageSum, use.stouffer=TRUE, ...)

gagePrep(exprs, ref = NULL, samp = NULL, same.dir = TRUE, compare =
"paired", rank.test = FALSE, use.fold = TRUE, weights = NULL, full.table =
FALSE, ...)

gageSum(rawRes, ref = NULL, test4up = TRUE, same.dir =
TRUE, compare = "paired", use.fold = TRUE, weights = NULL, full.table =
FALSE, use.stouffer=TRUE, ...)

Arguments

exprs

an expression matrix or matrix-like data structure, with genes as rows and samples as columns.

gsets

a named list in which each element is a gene set, given as a character vector of gene IDs or symbols. For an example, type head(kegg.gs). A gene set can also be an "smc" object defined in the PGSEA package. Make sure that the same gene ID system is used for both gsets and exprs.

ref

a numeric vector of column numbers for the reference condition or phenotype (i.e. the control group) in the exprs data matrix. Default ref = NULL, all columns are considered as target experiments.

samp

a numeric vector of column numbers for the target condition or phenotype (i.e. the experiment group) in the exprs data matrix. Default samp = NULL, all columns other than ref are considered as target experiments.

set.size

range of gene set sizes (number of genes) to be considered for the enrichment test. Tests on gene sets that are too small or too large are neither statistically robust nor biologically informative. Default is set.size = c(10, 500).
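The size filter can be sketched in base R. This is a minimal illustration, not the gage implementation: the toy gene IDs and gene sets are hypothetical, and counting only genes present in the data with inclusive bounds is an assumption made for the example.

```r
# Hypothetical gene IDs measured in the expression data.
gene.ids <- paste0("g", 1:100)

# Toy gene sets; "partly.missing" contains many IDs absent from the data.
gsets <- list(
  small          = gene.ids[1:5],
  ok             = gene.ids[1:30],
  partly.missing = c(gene.ids[1:12], paste0("x", 1:50))
)

set.size <- c(10, 500)

# Effective size: number of distinct set members actually found in the data.
eff.size <- sapply(gsets, function(gs) sum(unique(gs) %in% gene.ids))

# Keep only sets whose effective size falls inside the range (bounds assumed inclusive).
keep <- eff.size >= set.size[1] & eff.size <= set.size[2]
kept.sets <- gsets[keep]
print(eff.size)
print(names(kept.sets))
```

Here the effective size, rather than the nominal length of the set, decides whether a set is tested, which matches the set.size column described in the Value section.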

same.dir

Boolean, whether to test for changes in a gene set toward a single direction (all genes up- or down-regulated) or toward both directions simultaneously. For experimentally derived gene sets, GO term groups, etc., coregulation is commonly the case, hence same.dir = TRUE (default). In KEGG and BioCarta pathways, genes frequently are not coregulated, hence it can be informative to set same.dir = FALSE, although same.dir = TRUE can also be interesting for pathways.

compare

character, which comparison scheme to use: 'paired', 'unpaired', '1ongroup' or 'as.group'. 'paired' (default): ref and samp are of equal length and paired one-on-one by the original experimental design. 'as.group': group-on-group comparison between ref and samp. 'unpaired' (formerly '1on1'): one-on-one comparison between all possible ref and samp combinations, even though the original experimental design may not be one-on-one paired. '1ongroup': comparison of one samp column at a time against the average of all ref columns.

For PAGE-like analysis, the default is compare = 'as.group', which is the only option provided in the original PAGE method. All other comparison schemes are provided here for direct comparison to gage.
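The column pairings behind the four schemes can be sketched in base R. This is an illustrative reading of the descriptions above, assuming log-scale data so that per gene changes are simple differences; the toy matrix and variable names are hypothetical, and the code does not call gage.

```r
# Toy log-scale expression matrix: 3 genes x 4 samples (hypothetical values).
exprs <- matrix(1:12, nrow = 3,
                dimnames = list(paste0("g", 1:3),
                                c("ref1", "ref2", "samp1", "samp2")))
ref  <- 1:2
samp <- 3:4

# 'paired': ref[i] is matched with samp[i], one column per pair.
paired <- exprs[, samp] - exprs[, ref]

# 'unpaired': every ref-samp combination, one column per combination.
combos   <- expand.grid(ref = ref, samp = samp)
unpaired <- exprs[, combos$samp] - exprs[, combos$ref]

# '1ongroup': each samp column against the average of all ref columns.
one.on.group <- exprs[, samp] - rowMeans(exprs[, ref])

# 'as.group': average of samp against average of ref, one value per gene.
as.group <- rowMeans(exprs[, samp]) - rowMeans(exprs[, ref])

print(dim(paired))
print(dim(unpaired))
print(dim(one.on.group))
print(length(as.group))
```

The sketch only shows how many comparisons each scheme generates; how gage weights or summarizes dependent comparisons is described in the Details section.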

rank.test

Boolean, whether to do the optional rank-based two-sample t-test (equivalent to the non-parametric Wilcoxon Mann-Whitney test) instead of the parametric two-sample t-test. Default rank.test = FALSE. This argument should be set with regard to the argument saaTest.

use.fold

Boolean, whether to use fold changes or t-test statistics as per gene statistics. Default use.fold = TRUE.

FDR.adj

Boolean, whether to adjust for multiple testing so as to control the FDR (false discovery rate). Default is TRUE.

weights

a numeric vector specifying the weights assigned to ref-samp pairs. This is needed for data with both technical replicates and biological replicates, to account for the different contributions of the two types of replicates. This argument is also useful for manually pairing ref-samp for unpaired data, as in the pairData function. Default is NULL.

full.table

This option is obsolete. Boolean, whether to output the full table of all individual p-values from the pairwise comparisons of ref and samp. Default to be FALSE.

saaPrep

function used for data preparation for single array based analysis, including sanity check, sample pairing, per gene statistics calculation etc. Default to be gagePrep, i.e. the default data preparation routine for gage analysis.

saaTest

function used for gene set tests for single array based analysis. Default is gs.tTest, which features a two-sample t-test for differential expression of gene sets. Other options include gs.zTest, which uses a one-sample z-test as in PAGE, and gs.KSTest, which uses the non-parametric Kolmogorov-Smirnov test as in GSEA. The two non-default options should only be used when rank.test = FALSE.

saaSum

function used for summarization of the results from single array analysis (i.e. pairwise comparison between ref and samp). This function should include a meta-test for a global p-value or summary statistic and an FDR adjustment for multiple testing. Default is gageSum, i.e. the default data summarization routine for gage analysis.

rawRes

a named list, the raw results of gene set tests. Check the help information of gene set test functions like gs.tTest for details.

test4up

Boolean, whether to summarize the p-value results for the up-regulation test (p.results) or not (ps.results for down-regulation). This argument is only needed when same.dir = TRUE in the main gage function, i.e. when testing for one-directional changes.

use.stouffer

Boolean, whether to use Stouffer's method when summarizing individual p-values into a global p-value. Default is TRUE. This default setting is recommended to avoid "dual significance", i.e. a gene set being significant for both the up-regulation and down-regulation tests simultaneously. Dual significance sometimes occurs for data with large sample size, due to extremely small p-values in a few pairwise comparisons. More robust p-value summarization using Stouffer's method is an important feature added to GAGE in version 2.2.0 (Bioconductor 2.8). Setting this argument to FALSE selects the original summarization based on the Gamma distribution.
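The difference between the two summarization methods can be illustrated in base R. This is a minimal sketch under stated assumptions, not the code used inside gageSum: the individual p-values are toy numbers, the helper names gamma.summary and stouffer.summary are hypothetical, and the unweighted form of Stouffer's method is assumed.

```r
# Toy one-sided p-values for up-regulation from four pairwise comparisons.
# One comparison is extremely significant up, another extremely significant down.
p.up   <- c(1e-10, 1 - 1e-10, 0.5, 0.5)
p.down <- 1 - p.up  # matching p-values for the down-regulation test

# Negative log sum: -sum(log(p)) follows Gamma(shape = k, rate = 1) under the
# null hypothesis of k independent uniform p-values.
gamma.summary <- function(p) {
  pgamma(-sum(log(p)), shape = length(p), rate = 1, lower.tail = FALSE)
}

# Stouffer's method (unweighted): sum the z-scores and rescale by sqrt(k).
stouffer.summary <- function(p) {
  z <- qnorm(p, lower.tail = FALSE)
  pnorm(sum(z) / sqrt(length(p)), lower.tail = FALSE)
}

gamma.up      <- gamma.summary(p.up)
gamma.down    <- gamma.summary(p.down)
stouffer.up   <- stouffer.summary(p.up)
stouffer.down <- stouffer.summary(p.down)
print(c(gamma.up = gamma.up, gamma.down = gamma.down))
print(c(stouffer.up = stouffer.up, stouffer.down = stouffer.down))
```

With these toy inputs the Gamma-based summary is very small in both directions, because a single extreme p-value dominates the sum of logs, whereas in Stouffer's method the opposing z-scores cancel and neither direction is called significant.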

...

other arguments to be passed into the optional functions for saaPrep, saaTest and saaSum.

Details

We proposed a single array analysis (i.e. the one-on-one comparison) approach with GAGE. Here we make single array analysis a general workflow for gene set analysis. Single array analysis has 4 major steps: Step 1, sample pairing; Step 2, per gene tests; Step 3, gene set tests; and Step 4, meta-test summarization. Correspondingly, the main function, gage, is divided into 3 relatively independent modules. Module 1, input preparation, covers steps 1 and 2 of single array analysis. Module 2 corresponds to step 3, the gene set test, and module 3 to step 4, the meta-test summarization. These 3 modules are passed to gage as 3 function arguments: saaPrep, saaTest and saaSum. This modularization opens gage to customization or plug-in routines at each step and fully realizes the general applicability of single array analysis. More examples will be included in a second vignette to demonstrate customization with these modules.
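The four steps above can be sketched in base R on simulated data. This is a minimal illustration under stated assumptions, not the gage implementation: the simulated matrix, the gene set gset, the one-sided Welch t-test of set genes against all genes, and the unweighted Stouffer summary are all choices made for the example.

```r
set.seed(1)

# Toy log2 expression matrix: 200 genes x 6 samples (3 ref, 3 samp).
exprs <- matrix(rnorm(200 * 6), nrow = 200,
                dimnames = list(paste0("g", 1:200),
                                c(paste0("ref", 1:3), paste0("samp", 1:3))))
ref  <- 1:3
samp <- 4:6

# Hypothetical gene set, shifted upward in the samp columns.
gset <- paste0("g", 1:20)
exprs[gset, samp] <- exprs[gset, samp] + 2

# Steps 1-2: pair ref[i] with samp[i] and use log fold changes as per gene statistics.
fold <- exprs[, samp] - exprs[, ref]

# Step 3: for each pair, test whether set genes are shifted up relative to all genes.
in.set <- rownames(fold) %in% gset
p.up <- apply(fold, 2, function(x) {
  t.test(x[in.set], x, alternative = "greater")$p.value
})

# Step 4: summarize the individual p-values into one global p-value (Stouffer).
z <- qnorm(p.up, lower.tail = FALSE)
p.glob <- pnorm(sum(z) / sqrt(length(z)), lower.tail = FALSE)

print(p.up)
print(p.glob)
```

Each commented stage corresponds to one of the modules, so replacing the body of a stage is analogous to supplying a custom saaPrep, saaTest or saaSum function.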

Some important updates have been made to the gage package since version 2.2.0 (Bioconductor 2.8).

First, more robust p-value summarization using Stouffer's method is available through the argument use.stouffer = TRUE. The original p-value summarization, i.e. the negative log sum following a Gamma distribution under the null hypothesis, may produce less stable global p-values for large or heterogeneous datasets. In other words, the global p-value can be heavily affected by a small subset of extremely small individual p-values from pairwise comparisons. Such a sensitive global p-value leads to the "dual significance" phenomenon, in which a gene set is called significant simultaneously in both one-directional tests (up- and down-regulated). Dual significance can be informative in revealing sub-types or sub-classes in big clinical or disease studies, but may not be desirable in other cases.

Second, the output of the gage function now includes the gene set test statistics from pairwise comparisons for all proper gene sets. The output is now always a named list, with either 3 elements ("greater", "less", "stats") for a one-directional test or 2 elements ("greater", "stats") for a two-directional test.

Third, the individual p-values (and test statistics) from dependent pairwise comparisons, i.e. comparisons between the same experiment and different controls, are now summarized into a single value. In other words, the number of columns of individual p-values or statistics is always the same as the number of samples in the experiment (or disease) group. This change makes the argument value compare = "1ongroup" and the argument full.table less useful. It also makes it easier to check the perturbations at the gene set level for individual samples.

Fourth, whole gene set level changes (either p-values or statistics) can now be visualized using heatmaps, thanks to the third change above. Correspondingly, the functions sigGeneSet and gagePipe have been revised to plot heatmaps for whole gene sets.

Value

The result returned by the gage function is a named list, with either 3 elements ("greater", "less", "stats") for a one-directional test (same.dir = TRUE) or 2 elements ("greater", "stats") for a two-directional test (same.dir = FALSE). Elements "greater" and "less" are two data matrices of the same structure, mainly holding the p-values; element "stats" contains the test statistics. Each data matrix has gene sets as rows, sorted by global p- or q-values. Test significance or statistics columns include:

p.geomean

geometric mean of the individual p-values from multiple single array based gene set tests

stat.mean

mean of the individual statistics from multiple single array based gene set tests. Normally, its absolute value measures the magnitude of gene set level changes, and its sign indicates the direction of the changes. When saaTest = gs.KSTest, stat.mean is always positive.

p.val

global p-value or summary of the individual p-values from multiple single array based gene set tests. This is the default p-value being used.

q.val

FDR q-value adjustment of the global p-value using the Benjamini & Hochberg procedure implemented in multtest package. This is the default q-value being used.

set.size

the effective gene set size, i.e. the number of genes included in the gene set test

other columns

columns of the individual p-values or statistics, each of which measures the gene set perturbation in a single experiment (vs its control or all controls, depending on the value of the compare argument)
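How these columns relate to each other can be sketched on a toy matrix in base R. The numbers below are hypothetical and the global p-values are placeholders rather than the result of a real meta-test; p.adjust from the stats package is used here for the Benjamini & Hochberg adjustment, which is an assumption made for the sketch and not necessarily the routine gage calls.

```r
# Toy individual p-values: 3 gene sets (rows) x 4 experiments (columns).
p.ind <- matrix(c(0.001, 0.002, 0.010, 0.004,
                  0.200, 0.050, 0.300, 0.100,
                  0.600, 0.900, 0.400, 0.700),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("setA", "setB", "setC"), paste0("samp", 1:4)))

# p.geomean: geometric mean of the individual p-values for each gene set.
p.geomean <- exp(rowMeans(log(p.ind)))

# p.val: placeholder global p-values (in gage these come from the meta-test).
p.val <- c(setA = 1e-6, setB = 0.03, setC = 0.6)

# q.val: Benjamini & Hochberg adjustment of the global p-values.
q.val <- p.adjust(p.val, method = "BH")

# Assemble a table shaped like the "greater" element and sort it by q.val.
res <- cbind(p.geomean, p.val, q.val, p.ind)
res <- res[order(res[, "q.val"], res[, "p.val"]), , drop = FALSE]

# Select gene sets below a q-value cutoff of 0.1.
sig <- res[!is.na(res[, "q.val"]) & res[, "q.val"] < 0.1, , drop = FALSE]
print(sig)
```

The same subsetting pattern applies to the "greater" and "less" matrices returned by gage, since they carry a q.val column.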

The result returned by gagePrep is a data matrix derived from exprs, but ready for column-wise gene set tests. In the matrix, genes are rows, and columns are the per gene test statistics from the ref-samp pairwise comparisons.

The result returned by gageSum is almost identical to that of the gage function: it is also a named list, but has only 2 elements, "p.glob" and "results", holding one round of test results.

Author(s)

Weijun Luo <luo_weijun@yahoo.com>

References

Luo, W., Friedman, M., Shedden, K., Hankenson, K. and Woolf, P. GAGE: Generally Applicable Gene Set Enrichment for Pathway Analysis. BMC Bioinformatics 2009, 10:161

See Also

gs.tTest, gs.zTest, and gs.KSTest, functions used for gene set tests; gagePipe and heter.gage, functions used for multiple GAGE analyses in a batch or combined GAGE analysis on heterogeneous data

Examples

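library(gage)

#load the example data and locate the sample columns by name
data(gse16873)
cn <- colnames(gse16873)
#reference (control) columns
hn <- grep('HN', cn, ignore.case = TRUE)
#target (experiment) columns
dcis <- grep('DCIS', cn, ignore.case = TRUE)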
data(kegg.gs)
data(go.gs)

#kegg test for 1-directional changes
gse16873.kegg.p <- gage(gse16873, gsets = kegg.gs, 
    ref = hn, samp = dcis)
#go.gs with the first 1000 entries as a fast example.
gse16873.go.p <- gage(gse16873, gsets = go.gs, 
    ref = hn, samp = dcis)
str(gse16873.kegg.p)
head(gse16873.kegg.p$greater)
head(gse16873.kegg.p$less)
head(gse16873.kegg.p$stats)
#kegg test for 2-directional changes
gse16873.kegg.2d.p <- gage(gse16873, gsets = kegg.gs, 
    ref = hn, samp = dcis, same.dir = FALSE)
head(gse16873.kegg.2d.p$greater)
head(gse16873.kegg.2d.p$stats)

###alternative ways to do GAGE analysis###
#with unpaired samples
gse16873.kegg.unpaired.p <- gage(gse16873, gsets = kegg.gs, 
    ref = hn, samp = dcis, compare = "unpaired")

#other options to tweak includes:
#saaTest, use.fold, rank.test, etc. Check arguments section above for
#details and the vignette for more examples.

Example output

List of 3
 $ greater: num [1:177, 1:11] 0.000216 0.001497 0.004771 0.003718 0.01862 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:177] "hsa04141 Protein processing in endoplasmic reticulum" "hsa00190 Oxidative phosphorylation" "hsa03050 Proteasome" "hsa04142 Lysosome" ...
  .. ..$ : chr [1:11] "p.geomean" "stat.mean" "p.val" "q.val" ...
 $ less   : num [1:177, 1:11] 0.000798 0.005628 0.02764 0.069383 0.089991 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:177] "hsa03010 Ribosome" "hsa04510 Focal adhesion" "hsa04270 Vascular smooth muscle contraction" "hsa04020 Calcium signaling pathway" ...
  .. ..$ : chr [1:11] "p.geomean" "stat.mean" "p.val" "q.val" ...
 $ stats  : num [1:177, 1:7] 3.52 2.85 2.63 2.54 2.12 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:177] "hsa04141 Protein processing in endoplasmic reticulum" "hsa00190 Oxidative phosphorylation" "hsa03050 Proteasome" "hsa04142 Lysosome" ...
  .. ..$ : chr [1:7] "stat.mean" "DCIS_1" "DCIS_2" "DCIS_3" ...
                                                        p.geomean stat.mean
hsa04141 Protein processing in endoplasmic reticulum 0.0002164597  3.517131
hsa00190 Oxidative phosphorylation                   0.0014970021  2.848815
hsa03050 Proteasome                                  0.0047708296  2.631099
hsa04142 Lysosome                                    0.0037176143  2.541611
hsa03060 Protein export                              0.0186201404  2.118436
hsa04145 Phagosome                                   0.0145567304  1.996678
                                                            p.val        q.val
hsa04141 Protein processing in endoplasmic reticulum 9.236559e-18 1.477850e-15
hsa00190 Oxidative phosphorylation                   3.279350e-12 2.623480e-10
hsa03050 Proteasome                                  2.108534e-10 1.124551e-08
hsa04142 Lysosome                                    4.027638e-10 1.611055e-08
hsa03060 Protein export                              4.404710e-07 1.409507e-05
hsa04145 Phagosome                                   6.258738e-07 1.668997e-05
                                                     set.size       DCIS_1
hsa04141 Protein processing in endoplasmic reticulum      144 1.909626e-05
hsa00190 Oxidative phosphorylation                         97 3.475070e-05
hsa03050 Proteasome                                        39 8.947511e-04
hsa04142 Lysosome                                         108 9.028336e-03
hsa03060 Protein export                                    18 9.442471e-04
hsa04145 Phagosome                                        132 7.176237e-02
                                                           DCIS_2       DCIS_3
hsa04141 Protein processing in endoplasmic reticulum 4.500862e-06 4.338235e-03
hsa00190 Oxidative phosphorylation                   2.149291e-04 2.665382e-01
hsa03050 Proteasome                                  3.504204e-03 2.374528e-02
hsa04142 Lysosome                                    6.278064e-02 4.882085e-06
hsa03060 Protein export                              1.365472e-02 5.193202e-02
hsa04145 Phagosome                                   2.369194e-01 4.056498e-05
                                                           DCIS_4       DCIS_5
hsa04141 Protein processing in endoplasmic reticulum 0.0006625566 0.0008103988
hsa00190 Oxidative phosphorylation                   0.0060485163 0.0019291992
hsa03050 Proteasome                                  0.0109516651 0.0005634787
hsa04142 Lysosome                                    0.0776453319 0.0007493152
hsa03060 Protein export                              0.0462144533 0.0095604241
hsa04145 Phagosome                                   0.0666306128 0.0033630990
                                                           DCIS_6
hsa04141 Protein processing in endoplasmic reticulum 0.0005137881
hsa00190 Oxidative phosphorylation                   0.0004844961
hsa03050 Proteasome                                  0.0256647373
hsa04142 Lysosome                                    0.0163971080
hsa03060 Protein export                              0.1408766864
hsa04145 Phagosome                                   0.0615631576
                                               p.geomean stat.mean        p.val
hsa03010 Ribosome                           0.0007984718 -2.832681 2.849399e-11
hsa04510 Focal adhesion                     0.0056279715 -2.086653 2.071403e-07
hsa04270 Vascular smooth muscle contraction 0.0276403091 -1.817748 5.044327e-06
hsa04020 Calcium signaling pathway          0.0693830829 -1.463336 1.755081e-04
hsa04540 Gap junction                       0.0899907123 -1.267066 1.015100e-03
hsa04360 Axon guidance                      0.1022064867 -1.208843 1.590880e-03
                                                   q.val set.size       DCIS_1
hsa03010 Ribosome                           4.559039e-09       38 1.305399e-05
hsa04510 Focal adhesion                     1.657123e-05      190 8.024170e-01
hsa04270 Vascular smooth muscle contraction 2.690308e-04      102 4.271257e-01
hsa04020 Calcium signaling pathway          7.020325e-03      159 3.446381e-02
hsa04540 Gap junction                       3.248319e-02       84 2.932968e-01
hsa04360 Axon guidance                      4.242347e-02      110 2.618370e-01
                                                  DCIS_2       DCIS_3
hsa03010 Ribosome                           0.9059473487 5.777162e-03
hsa04510 Focal adhesion                     0.0008410044 1.772765e-05
hsa04270 Vascular smooth muscle contraction 0.0036540856 4.410797e-02
hsa04020 Calcium signaling pathway          0.0340037498 1.119666e-01
hsa04540 Gap junction                       0.0130329409 1.213758e-01
hsa04360 Axon guidance                      0.0414326309 8.646858e-02
                                                  DCIS_4      DCIS_5
hsa03010 Ribosome                           8.680551e-05 0.004680625
hsa04510 Focal adhesion                     1.731370e-01 0.059392858
hsa04270 Vascular smooth muscle contraction 2.284282e-02 0.020244192
hsa04020 Calcium signaling pathway          1.527591e-01 0.053572889
hsa04540 Gap junction                       9.030708e-02 0.046207836
hsa04360 Axon guidance                      2.369809e-01 0.024235880
                                                  DCIS_6
hsa03010 Ribosome                           9.335673e-06
hsa04510 Focal adhesion                     2.583079e-04
hsa04270 Vascular smooth muscle contraction 1.400734e-02
hsa04020 Calcium signaling pathway          1.038941e-01
hsa04540 Gap junction                       2.743258e-01
hsa04360 Axon guidance                      2.115761e-01
                                                     stat.mean   DCIS_1
hsa04141 Protein processing in endoplasmic reticulum  3.517131 4.184493
hsa00190 Oxidative phosphorylation                    2.848815 4.066860
hsa03050 Proteasome                                   2.631099 3.239365
hsa04142 Lysosome                                     2.541611 2.382849
hsa03060 Protein export                               2.118436 3.371258
hsa04145 Phagosome                                    1.996678 1.467306
                                                        DCIS_2    DCIS_3
hsa04141 Protein processing in endoplasmic reticulum 4.5267758 2.6444345
hsa00190 Oxidative phosphorylation                   3.5861171 0.6245743
hsa03050 Proteasome                                  2.7925697 2.0223035
hsa04142 Lysosome                                    1.5379431 4.5333081
hsa03060 Protein export                              2.3386121 1.6746526
hsa04145 Phagosome                                   0.7172982 4.0101750
                                                       DCIS_4   DCIS_5   DCIS_6
hsa04141 Protein processing in endoplasmic reticulum 3.242629 3.183509 3.320944
hsa00190 Oxidative phosphorylation                   2.533425 2.925367 3.356548
hsa03050 Proteasome                                  2.344753 3.397929 1.989674
hsa04142 Lysosome                                    1.426453 3.219610 2.149500
hsa03060 Protein export                              1.732545 2.492772 1.100775
hsa04145 Phagosome                                   1.506097 2.732349 1.546842
                                              p.geomean stat.mean        p.val
hsa04510 Focal adhesion                     0.001759747  2.916255 6.710926e-13
hsa04512 ECM-receptor interaction           0.001789555  2.949737 6.859265e-13
hsa04974 Protein digestion and absorption   0.030772743  1.806869 6.276812e-06
hsa04514 Cell adhesion molecules (CAMs)     0.042715144  1.618376 4.119645e-05
hsa04145 Phagosome                          0.044485242  1.524643 1.031301e-04
hsa04270 Vascular smooth muscle contraction 0.063978404  1.441140 2.263397e-04
                                                   q.val set.size      DCIS_1
hsa04510 Focal adhesion                     5.487412e-11      190 0.006457751
hsa04512 ECM-receptor interaction           5.487412e-11       77 0.002658355
hsa04974 Protein digestion and absorption   3.347633e-04       69 0.023791026
hsa04514 Cell adhesion molecules (CAMs)     1.647858e-03      114 0.205535591
hsa04145 Phagosome                          3.300164e-03      132 0.059953606
hsa04270 Vascular smooth muscle contraction 6.035725e-03      102 0.060206160
                                                  DCIS_2       DCIS_3
hsa04510 Focal adhesion                     0.0003838236 0.0015689159
hsa04512 ECM-receptor interaction           0.0006839255 0.0005322015
hsa04974 Protein digestion and absorption   0.0422893683 0.0083314407
hsa04514 Cell adhesion molecules (CAMs)     0.0795633984 0.0010784270
hsa04145 Phagosome                          0.3198672383 0.0007942023
hsa04270 Vascular smooth muscle contraction 0.0938835103 0.0292009834
                                                 DCIS_4       DCIS_5
hsa04510 Focal adhesion                     0.009876287 0.0003157179
hsa04512 ECM-receptor interaction           0.010654034 0.0006138785
hsa04974 Protein digestion and absorption   0.342841545 0.0377945488
hsa04514 Cell adhesion molecules (CAMs)     0.142507194 0.0287101814
hsa04145 Phagosome                          0.074230631 0.0202186887
hsa04270 Vascular smooth muscle contraction 0.476455683 0.0220696693
                                                 DCIS_6
hsa04510 Focal adhesion                     0.002449038
hsa04512 ECM-receptor interaction           0.005190105
hsa04974 Protein digestion and absorption   0.007818261
hsa04514 Cell adhesion molecules (CAMs)     0.084183585
hsa04145 Phagosome                          0.339034075
hsa04270 Vascular smooth muscle contraction 0.039514306
                                            stat.mean    DCIS_1    DCIS_2
hsa04510 Focal adhesion                      2.916255 2.4989854 3.3967710
hsa04512 ECM-receptor interaction            2.949737 2.8390006 3.2772263
hsa04974 Protein digestion and absorption    1.806869 2.0038044 1.7399543
hsa04514 Cell adhesion molecules (CAMs)      1.618376 0.8235421 1.4128082
hsa04145 Phagosome                           1.524643 1.5603843 0.4686453
hsa04270 Vascular smooth muscle contraction  1.441140 1.5599951 1.3221288
                                              DCIS_3     DCIS_4   DCIS_5
hsa04510 Focal adhesion                     2.975650 2.34129792 3.452591
hsa04512 ECM-receptor interaction           3.349328 2.32875195 3.304206
hsa04974 Protein digestion and absorption   2.436997 0.40558753 1.795465
hsa04514 Cell adhesion molecules (CAMs)     3.108789 1.07176073 1.911206
hsa04145 Phagosome                          3.194384 1.44926906 2.059597
hsa04270 Vascular smooth muscle contraction 1.905183 0.05912534 2.031527
                                               DCIS_6
hsa04510 Focal adhesion                     2.8322324
hsa04512 ECM-receptor interaction           2.5999099
hsa04974 Protein digestion and absorption   2.4594032
hsa04514 Cell adhesion molecules (CAMs)     1.3821473
hsa04145 Phagosome                          0.4155804
hsa04270 Vascular smooth muscle contraction 1.7688787

gage documentation built on Dec. 13, 2020, 2:01 a.m.