sigGeneSet: Significant gene set from GAGE analysis


View source: R/sigGeneSet.R

Description

This function sorts and counts significant gene sets based on a q- or p-value cutoff.

Usage

sigGeneSet(setp, cutoff = 0.1, dualSig = (0:2)[2],
    qpval = c("q.val", "p.val")[1], heatmap = TRUE, outname = "array",
    pdf.size = c(7, 7), p.limit = c(0.5, 5.5), stat.limit = 5, ...)

Arguments

setp

the result object returned by the gage function: either a numeric matrix or a list of two such matrices. See the gage help page for details.

cutoff

numeric, the q- or p-value cutoff, between 0 and 1. Default 0.1 (for q-values). When p-values are used, the recommended cutoff is 0.001 for data with more than 2 replicates per condition, or 0.01 for smaller sample sizes.
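As an illustration of the selection step, the sketch below applies a cutoff to a toy matrix whose columns mimic the p.val and q.val columns of a gage result. The helper select_sig and all toy values are hypothetical; it assumes a strict `<` comparison followed by an ascending sort on the chosen column, and it is not the package's own implementation.

```r
# Hypothetical helper, NOT gage's internal code: keep rows whose chosen
# q- or p-value column is below the cutoff, then sort them ascending.
select_sig <- function(res, cutoff = 0.1, qpval = c("q.val", "p.val")[1]) {
  keep <- !is.na(res[, qpval]) & res[, qpval] < cutoff
  sig <- res[keep, , drop = FALSE]
  sig[order(sig[, qpval]), , drop = FALSE]
}

# Toy stand-in for one direction of a gage result (made-up values).
toy <- cbind(p.val = c(0.0004, 0.02, 0.3, 0.0009),
             q.val = c(0.004, 0.08, 0.6, 0.005))
rownames(toy) <- c("setA", "setB", "setC", "setD")

select_sig(toy)                                  # q-value < 0.1: setA, setD, setB
select_sig(toy, cutoff = 0.001, qpval = "p.val") # p-value < 0.001: setA, setD
```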

dualSig

integer, switch controlling how dual-significant gene sets are treated. This argument matters only when Stouffer's method is not used in the gage function (use.stouffer = FALSE); with the default settings it makes no difference. 0: discard such gene sets from the final significant gene set list; 1: keep such gene sets in the more significant direction and remove them from the less significant direction; 2: keep such gene sets in the lists for both directions. Default 1. A gene set is dual-significant when it is called significant simultaneously in both 1-directional tests (up- and down-regulated). See Details for more information.
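To make the three dualSig options concrete, the following sketch resolves overlaps between two lists of gene sets that already pass the cutoff. The function resolve_dual, the q-value vectors, and the tie-breaking rule (ties go to the up-regulated list) are assumptions made for illustration; gage's internal logic may differ.

```r
# Hypothetical illustration of the three dualSig options, NOT gage's own code.
# sig_up / sig_down: named q-values of gene sets already significant in the
# up- and down-regulated direction, respectively.
resolve_dual <- function(sig_up, sig_down, dualSig = 1) {
  dual <- intersect(names(sig_up), names(sig_down))
  if (dualSig == 0) {
    # 0: drop dual-significant sets from both lists
    sig_up <- sig_up[!names(sig_up) %in% dual]
    sig_down <- sig_down[!names(sig_down) %in% dual]
  } else if (dualSig == 1) {
    # 1: keep each dual set only where its q-value is smaller (ties -> up)
    up_wins <- dual[sig_up[dual] <= sig_down[dual]]
    sig_up <- sig_up[!names(sig_up) %in% setdiff(dual, up_wins)]
    sig_down <- sig_down[!names(sig_down) %in% up_wins]
  }
  # 2: both lists are returned unchanged
  list(greater = names(sig_up), less = names(sig_down))
}

up   <- c(setA = 0.001, setB = 0.04, setC = 0.02)
down <- c(setB = 0.01, setC = 0.08, setD = 0.03)

resolve_dual(up, down, dualSig = 0)  # greater: setA;        less: setD
resolve_dual(up, down, dualSig = 1)  # greater: setA, setC;  less: setB, setD
resolve_dual(up, down, dualSig = 2)  # setB and setC appear in both lists
```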

qpval

character, the column name used for gene set selection, i.e. which type of q- or p-value to use. Default "q.val" (q-value from the BH procedure). "p.val" is the unadjusted global p-value and may sometimes be used as the selection criterion instead.
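The difference between the two columns can be seen with base R's p.adjust, which implements the Benjamini-Hochberg (BH) adjustment. This is a sketch on made-up p-values; that gage computes q.val exactly this way is an assumption here, not something stated on this page.

```r
# Toy unadjusted global p-values for four gene sets (made-up numbers).
p <- c(setA = 0.01, setB = 0.02, setC = 0.03, setD = 0.5)

# Benjamini-Hochberg adjustment from base R's stats package.
q <- p.adjust(p, method = "BH")
q                    # setA, setB, setC become 0.04; setD stays 0.5

# The same gene sets pass or fail depending on the column and cutoff used.
names(p)[p < 0.001]  # character(0): nothing passes a strict p-value cutoff
names(q)[q < 0.1]    # "setA" "setB" "setC"
```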

heatmap

boolean, whether to plot heatmaps of the selected gene sets to a PDF file. Default TRUE.

outname

a character string used as the prefix of the output file names. Default "array".

pdf.size

a numeric vector specifying the width and height of the PDF graphics region in inches. Default c(7, 7).

stat.limit

numeric vector of length 1 or 2 specifying the value range of gene set statistics to visualize in the heatmap. Statistics beyond the range are reset to the nearest limit. Default 5, i.e. gene set statistics are plotted within the range (-5, 5). May also be NULL, in which case all statistics are plotted without a limit. Capping the range keeps a few extremely positive or negative values from stretching the color scale and squeezing the region where most statistic values fall, so those values remain distinguishable.
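The range reset described for stat.limit amounts to clamping. A minimal sketch, assuming a length-1 limit means the symmetric range (-limit, limit) as the default suggests; clamp_stats is a hypothetical helper, not a gage function, and the statistics are made up.

```r
# Hypothetical helper mirroring the documented stat.limit behavior:
# values outside the range are reset to the nearest limit.
clamp_stats <- function(x, limit = 5) {
  if (is.null(limit)) return(x)                  # NULL: no limit applied
  if (length(limit) == 1) {
    limit <- c(-abs(limit), abs(limit))          # assumed symmetric range
  }
  pmin(pmax(x, limit[1]), limit[2])
}

gs_stats <- c(setA = 8.2, setB = 2.1, setC = -0.4, setD = -11.7)
clamp_stats(gs_stats)           # setA -> 5, setD -> -5
clamp_stats(gs_stats, c(-3, 6)) # setA -> 6, setD -> -3
clamp_stats(gs_stats, NULL)     # unchanged
```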

p.limit

numeric vector of length 1 or 2 specifying the value range of gene set -log10(p-values) to visualize in the heatmap. Values beyond the range are reset to the nearest limit. Default c(0.5, 5.5), i.e. -log10(p-values) are plotted within this range. This argument works like stat.limit.
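Similarly, a sketch of how p-values map onto the plotted -log10 scale and are clamped to the default p.limit. The p-values are made up and the code is an illustration, not taken from gage; how a length-1 p.limit is interpreted is not covered here.

```r
# Made-up global p-values for three gene sets.
p <- c(setA = 1e-8, setB = 0.002, setC = 0.6)

lp <- -log10(p)                 # 8, about 2.70, about 0.22
p.limit <- c(0.5, 5.5)          # default range documented above

# Reset values outside the range to the nearest limit.
lp_plot <- pmin(pmax(lp, p.limit[1]), p.limit[2])
lp_plot                         # setA -> 5.5, setB unchanged, setC -> 0.5
```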

...

other arguments passed to the internal gs.heatmap function, a wrapper around the heatmap.2 function from the gplots package.

Details

By default, heatmaps are produced to show the gene set perturbations using either -log10(p-value) or statistics.

Since gage version 2.2.0, Stouffer's method has been the default procedure for summarizing p-values, because it is more robust. With the original p-value summarization, i.e. the negative log sum of p-values, which follows a Gamma distribution under the null hypothesis, the global p-value can be heavily influenced by a small subset of extremely small individual p-values from pair-wise comparisons. Such a sensitive global p-value leads to the "dual significance" phenomenon: a gene set is significantly up-regulated in a subset of experiments but down-regulated in another subset. Note that dual-significant gene sets are not the same as gene sets called significant in 2-directional tests, although the two are related.
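The sketch below contrasts the two summarization rules on one hypothetical gene set measured in six pair-wise comparisons: one comparison gives an extremely small up-regulation p-value, while the other five mildly favor down-regulation. It assumes one-sided p-values from a continuous statistic, so the opposite-direction p-value is 1 - p. At a 0.05 cutoff the Gamma rule (equivalent to Fisher's method) calls the set significant in both directions, i.e. dual-significant, while Stouffer's method calls it significant in neither. The helpers gamma_sum_p and stouffer_p are illustrations, not gage's implementation.

```r
# Negative log sum: under H0 each -log(p_i) ~ Exp(1), so the sum of k terms
# follows Gamma(shape = k, rate = 1); the global p-value is its upper tail.
gamma_sum_p <- function(p) {
  pgamma(sum(-log(p)), shape = length(p), rate = 1, lower.tail = FALSE)
}

# Stouffer: turn each one-sided p into a z-score, combine on the z scale.
stouffer_p <- function(p) {
  z <- qnorm(p, lower.tail = FALSE)
  pnorm(sum(z) / sqrt(length(p)), lower.tail = FALSE)
}

# Hypothetical p-values from six pair-wise comparisons.
p_up   <- c(1e-10, rep(0.9, 5))  # one extreme up-regulation
p_down <- 1 - p_up               # same comparisons, opposite direction

g <- c(up = gamma_sum_p(p_up), down = gamma_sum_p(p_down))
s <- c(up = stouffer_p(p_up), down = stouffer_p(p_down))
g  # both below 0.05: dual-significant
s  # both close to 0.5: significant in neither direction
```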

Value

sigGeneSet returns a named list with the same structure as the gage result. See the gage help page for details.

Author(s)

Weijun Luo <luo_weijun@yahoo.com>

References

Luo, W., Friedman, M., Shedden, K., Hankenson, K. and Woolf, P. GAGE: Generally Applicable Gene Set Enrichment for Pathway Analysis. BMC Bioinformatics 2009, 10:161.

See Also

gage: the main function for GAGE analysis; esset.grp: non-redundant significant gene set list; essGene: essential member genes in a gene set.

Examples

library(gage)
data(gse16873)
cn=colnames(gse16873)
hn=grep('HN',cn, ignore.case =TRUE)
dcis=grep('DCIS',cn, ignore.case =TRUE)
data(kegg.gs)

#kegg test for 1-directional changes
gse16873.kegg.p <- gage(gse16873, gsets = kegg.gs, 
    ref = hn, samp = dcis)
#kegg test for 2-directional changes
gse16873.kegg.2d.p <- gage(gse16873, gsets = kegg.gs,
    ref = hn, samp = dcis, same.dir = FALSE)
gse16873.kegg.sig<-sigGeneSet(gse16873.kegg.p, outname="gse16873.kegg")
str(gse16873.kegg.sig)
gse16873.kegg.2d.sig<-sigGeneSet(gse16873.kegg.2d.p, outname="gse16873.kegg")
str(gse16873.kegg.2d.sig)
#also check the heatmaps in pdf files named "*.heatmap.pdf".

Example output

[1] "there are 22 signficantly up-regulated gene sets"
[1] "there are 14 signficantly down-regulated gene sets"
List of 3
 $ greater: num [1:22, 1:11] 0.000216 0.001497 0.004771 0.003718 0.01862 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:22] "hsa04141 Protein processing in endoplasmic reticulum" "hsa00190 Oxidative phosphorylation" "hsa03050 Proteasome" "hsa04142 Lysosome" ...
  .. ..$ : chr [1:11] "p.geomean" "stat.mean" "p.val" "q.val" ...
 $ less   : num [1:14, 1:11] 0.000798 0.005628 0.02764 0.069383 0.089991 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:14] "hsa03010 Ribosome" "hsa04510 Focal adhesion" "hsa04270 Vascular smooth muscle contraction" "hsa04020 Calcium signaling pathway" ...
  .. ..$ : chr [1:11] "p.geomean" "stat.mean" "p.val" "q.val" ...
 $ stats  : num [1:36, 1:7] 3.52 2.85 2.63 2.54 2.12 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:36] "hsa04141 Protein processing in endoplasmic reticulum" "hsa00190 Oxidative phosphorylation" "hsa03050 Proteasome" "hsa04142 Lysosome" ...
  .. ..$ : chr [1:7] "stat.mean" "DCIS_1" "DCIS_2" "DCIS_3" ...
[1] "there are 27 signficantly two-direction perturbed gene sets"
List of 2
 $ greater: num [1:27, 1:11] 0.00176 0.00179 0.03077 0.04272 0.04449 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:27] "hsa04510 Focal adhesion" "hsa04512 ECM-receptor interaction" "hsa04974 Protein digestion and absorption" "hsa04514 Cell adhesion molecules (CAMs)" ...
  .. ..$ : chr [1:11] "p.geomean" "stat.mean" "p.val" "q.val" ...
 $ stats  : num [1:27, 1:7] 2.92 2.95 1.81 1.62 1.52 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:27] "hsa04510 Focal adhesion" "hsa04512 ECM-receptor interaction" "hsa04974 Protein digestion and absorption" "hsa04514 Cell adhesion molecules (CAMs)" ...
  .. ..$ : chr [1:7] "stat.mean" "DCIS_1" "DCIS_2" "DCIS_3" ...

gage documentation built on Dec. 13, 2020, 2:01 a.m.