esset.grp: The non-redundant significant gene set list



Description

This function extracts a non-redundant significant gene set list, groups of redundant gene sets, and related data from gage results. Redundant gene sets are those that overlap heavily in their effective member gene lists or core genes.

Usage

esset.grp(setp, exprs, gsets, ref = NULL, samp = NULL, test4up = TRUE,
    same.dir = TRUE, compare = "paired", use.fold = TRUE, cutoff = 0.01,
    use.q = FALSE, pc = 10^-10, output = TRUE, outname = "esset.grp",
    make.plot = FALSE, pdf.size = c(7, 7), core.counts = FALSE,
    get.essets = TRUE, bins = 10, bsize = 1, cex = 0.5,
    layoutType = "circo", name.str = c(10, 100), ...)

Arguments

setp

a numeric matrix, the result p-value matrix returned by the gage function. See the gage help page for details.

exprs

an expression matrix or matrix-like data structure, with genes as rows and samples as columns.

gsets

a named list; each element is a gene set, given as a character vector of gene IDs or symbols. For an example, type head(kegg.gs). A gene set can also be a "smc" object defined in the PGSEA package. Make sure that the same gene ID system is used for both gsets and exprs.

ref

a numeric vector of column numbers for the reference condition or phenotype (i.e. the control group) in the exprs data matrix. With the default, ref = NULL, all columns are considered as target experiments.

samp

a numeric vector of column numbers for the target condition or phenotype (i.e. the experiment group) in the exprs data matrix. With the default, samp = NULL, all columns other than ref are considered as target experiments.

test4up

boolean, whether the input gage results or significant gene sets are test results for up-regulated gene sets or not. This information is needed for selecting the core member genes that contribute to the overall significance of a gene set.

same.dir

boolean, whether the input gage results test for changes in a gene set toward a single direction (all genes up- or down-regulated) or changes toward both directions simultaneously.

compare

character, which comparison scheme to use: 'paired', 'unpaired', '1ongroup' or 'as.group'. 'paired' (the default): ref and samp are of equal length and paired one-on-one by the original experimental design; 'as.group': group-on-group comparison between ref and samp; 'unpaired' (formerly '1on1'): one-on-one comparison between all possible ref and samp combinations, even though the original experimental design may not be one-on-one paired; '1ongroup': comparison of one samp column at a time against the average of all ref columns.

use.fold

boolean, whether the input gage results used fold changes or t-test statistics as per-gene statistics. Default use.fold = TRUE.

cutoff

numeric, p- or q-value cutoff, between 0 and 1. Default 0.01 (for p-value). When q-value is used, the recommended cutoff value is 0.1.

use.q

boolean, whether to use q-value rather than p-value for the pre-selection of the significant gene set list. Default is FALSE, i.e. the p-value is used.

pc

numeric, cutoff p-value for the overlap between gene sets to be called 'redundant'. Default is 10^-10 (i.e. 1e-10, as shown in Usage); some trial and error may be needed to find the best value.

output

boolean, whether to output the non-redundant gene set list as a tab-delimited text file. Default is TRUE.

outname

character, the prefix used to label the output file names when output = TRUE.

make.plot

boolean, whether to generate the network graph to visualize the redundancy (overlap in core genes) between significant gene sets. Currently the only feasible option is FALSE.

pdf.size

numeric vector of length 2, specifies the PDF file size for the network graph output. Currently unsupported.

core.counts

Currently unsupported.

get.essets

Currently unsupported.

bins

Currently unsupported.

bsize

Currently unsupported.

cex

Currently unsupported.

layoutType

Currently unsupported.

name.str

numeric vector of length 2, specifies the substring range of the gene set name to show in the network graph. Currently unsupported.

...

extra arguments to be passed to the internal function make.graph. Currently unsupported.
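
The four comparison schemes described under compare can be illustrated with a small base R sketch. This is not the gage implementation; it only shows which columns each scheme contrasts, assuming log-scale expression values so that a fold change is a simple difference. The toy matrix and all variable names are made up for illustration.

```r
# Toy log-expression matrix: 3 genes, 2 reference and 2 target columns (made-up values).
exprs <- matrix(c(1, 2, 3,  2, 2, 4,  3, 5, 3,  4, 2, 6), nrow = 3,
                dimnames = list(paste0("g", 1:3),
                                c("ref1", "ref2", "samp1", "samp2")))
ref <- 1:2
samp <- 3:4

# 'paired': samp[i] is compared with ref[i], one column per pair.
fc_paired <- exprs[, samp] - exprs[, ref]

# 'as.group': mean of samp versus mean of ref, a single comparison.
fc_group <- rowMeans(exprs[, samp]) - rowMeans(exprs[, ref])

# 'unpaired': every samp column against every ref column.
pairs <- expand.grid(ref = ref, samp = samp)
fc_unpaired <- exprs[, pairs$samp] - exprs[, pairs$ref]

# '1ongroup': each samp column against the average of all ref columns.
fc_1ongroup <- exprs[, samp] - rowMeans(exprs[, ref])

print(dim(fc_paired))
print(length(fc_group))
print(dim(fc_unpaired))
print(dim(fc_1ongroup))
```

With 2 reference and 2 target columns, 'paired' and '1ongroup' each yield 2 comparisons, 'unpaired' yields 4, and 'as.group' yields 1.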

Details

Redundant gene sets are defined as those that overlap heavily in their effective member gene lists or core genes. Core genes are the member genes that actually contribute to the significance of the gene set in GAGE analysis in the direction(s) of interest. Argument pc sets the cutoff for an overlap to be called "redundant". The redundancy between gene sets is then represented by an undirected graph/network. Groups of redundant gene sets are derived as the connected components of this network graph.
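
The grouping step can be sketched in base R as follows. This is a minimal illustration of the idea only (threshold the pairwise overlap p-values at pc, then take the connected components of the resulting undirected graph), not the package's own code. The helper group_redundant, the toy matrix overlap_p and the set names A to D are hypothetical, and the sketch assumes the matrix contains no missing values.

```r
# Hypothetical helper: group gene sets whose pairwise overlap p-value is below pc.
group_redundant <- function(overlap_p, pc = 10^-10) {
  sets <- rownames(overlap_p)
  # Adjacency matrix of the undirected redundancy graph, without self loops.
  adj <- overlap_p < pc
  adj <- adj | t(adj)
  diag(adj) <- FALSE
  group <- setNames(rep(NA_integer_, length(sets)), sets)
  next_id <- 0L
  for (s in sets) {
    if (!is.na(group[[s]])) next
    next_id <- next_id + 1L
    queue <- s
    # Breadth-first search collects one connected component.
    while (length(queue) > 0) {
      cur <- queue[1]
      queue <- queue[-1]
      if (!is.na(group[[cur]])) next
      group[[cur]] <- next_id
      nb <- sets[adj[cur, ]]
      queue <- c(queue, nb[is.na(group[nb])])
    }
  }
  unname(split(names(group), group))
}

# Toy symmetric matrix of overlap p-values (made-up numbers).
ids <- c("A", "B", "C", "D")
overlap_p <- matrix(1, 4, 4, dimnames = list(ids, ids))
overlap_p["A", "B"] <- overlap_p["B", "A"] <- 1e-15
overlap_p["B", "C"] <- overlap_p["C", "B"] <- 1e-12
groups <- group_redundant(overlap_p)
print(groups)
```

In this toy input, A and C do not overlap directly but both overlap with B, so all three land in the same group, while D forms a group of its own. This is why a group can contain sets that are only indirectly redundant.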

The selection criterion for gene sets here is p-value, instead of the commonly used q-value. This is because, for extracting a non-redundant list of significant gene sets, p-value is relatively stable, whereas q-value changes when the total number of gene sets being considered changes. Of course, q-value is also a sensible selection criterion when one takes this step as a further refinement of the list of significant gene sets.
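
The dependence of q-values on the number of gene sets tested can be seen in a short base R sketch. The p-values are made up, and the sketch assumes that q-values are Benjamini-Hochberg adjusted p-values as computed by p.adjust; it does not call gage.

```r
# Made-up p-values for five gene sets of interest.
p_core <- c(0.0005, 0.002, 0.008, 0.02, 0.04)

# Case 1: only these five sets are tested.
q_small <- p.adjust(p_core, method = "BH")

# Case 2: the same five sets are tested alongside 95 unremarkable sets.
p_large <- c(p_core, rep(0.5, 95))
q_large <- p.adjust(p_large, method = "BH")[seq_along(p_core)]

# The p-values are identical in both cases; the q-values are not.
print(rbind(p = p_core, q_small = q_small, q_large = q_large))
```

A fixed p-value cutoff selects the same five sets in both cases, whereas a fixed q-value cutoff selects fewer of them once the 95 extra sets are added.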

Value

The value returned by esset.grp is a list of 7 elements:

essentialSets

character vector, the non-redundant significant gene set list.

setGroups

list, each element is a character vector of a group of redundant gene sets.

allSets

character vector, the full list of significant gene sets.

connectedComponent

list, each element is a character vector of a connected component in the redundancy graph representation of the gene set.

overlapCounts

numeric matrix, the overlap core gene counts between the significant gene sets.

overlapPvals

numeric matrix, the significance (in p-values) of the overlap core gene counts between the significant gene sets.

coreGeneSets

list, each element is a character vector of the core genes in a significant gene set.

Author(s)

Weijun Luo <luo_weijun@yahoo.com>

References

Luo, W., Friedman, M., Shedden, K., Hankenson, K. and Woolf, P. GAGE: Generally Applicable Gene Set Enrichment for Pathways Analysis. BMC Bioinformatics 2009, 10:161

See Also

gage, the main function for GAGE analysis; sigGeneSet, significant gene sets from GAGE analysis; essGene, essential member genes in a gene set.

Examples

library(gage)

# load the example data; HN columns are the reference, DCIS columns the target
data(gse16873)
cn <- colnames(gse16873)
hn <- grep('HN', cn, ignore.case = TRUE)
dcis <- grep('DCIS', cn, ignore.case = TRUE)
data(kegg.gs)

# kegg test for 1-directional changes
gse16873.kegg.p <- gage(gse16873, gsets = kegg.gs,
    ref = hn, samp = dcis)
# kegg test for 2-directional changes
gse16873.kegg.2d.p <- gage(gse16873, gsets = kegg.gs,
    ref = hn, samp = dcis, same.dir = FALSE)

# non-redundant lists for up-regulated, down-regulated and 2-directional results
gse16873.kegg.esg.up <- esset.grp(gse16873.kegg.p$greater,
    gse16873, gsets = kegg.gs, ref = hn, samp = dcis,
    test4up = TRUE, output = TRUE, outname = "gse16873.kegg.up", make.plot = FALSE)
gse16873.kegg.esg.dn <- esset.grp(gse16873.kegg.p$less,
    gse16873, gsets = kegg.gs, ref = hn, samp = dcis,
    test4up = FALSE, output = TRUE, outname = "gse16873.kegg.dn", make.plot = FALSE)
gse16873.kegg.esg.2d <- esset.grp(gse16873.kegg.2d.p$greater,
    gse16873, gsets = kegg.gs, ref = hn, samp = dcis,
    test4up = TRUE, output = TRUE, outname = "gse16873.kegg.2d", make.plot = FALSE)

# inspect the returned list
names(gse16873.kegg.esg.up)
head(gse16873.kegg.esg.up$essentialSets, 4)
head(gse16873.kegg.esg.up$setGroups, 4)
head(gse16873.kegg.esg.up$coreGeneSets, 4)

Example output

[1] "essentialSets"      "setGroups"          "allSets"           
[4] "connectedComponent" "overlapCounts"      "overlapPvals"      
[7] "coreGeneSets"      
[1] "hsa04141 Protein processing in endoplasmic reticulum"
[2] "hsa00190 Oxidative phosphorylation"                  
[3] "hsa03050 Proteasome"                                 
[4] "hsa04142 Lysosome"                                   
[[1]]
[1] "hsa04141 Protein processing in endoplasmic reticulum"

[[2]]
[1] "hsa00190 Oxidative phosphorylation"

[[3]]
[1] "hsa03050 Proteasome"

[[4]]
[1] "hsa04142 Lysosome"                 "hsa00511 Other glycan degradation"

$`hsa04141 Protein processing in endoplasmic reticulum`
 [1] "51128" "2923"  "10130" "3312"  "10970" "3301"  "9451"  "23480" "9601" 
[10] "811"   "3309"  "7323"  "7494"  "6748"  "64374" "10525" "7466"  "10808"
[19] "6184"  "821"   "56886" "5887"  "6185"  "5034"  "6745"  "1603"  "7095" 
[28] "79139" "3320"  "27102" "10294" "10483" "7415"  "7324"  "30001" "64215"
[37] "7184"  "5611"  "29927" "9871"  "9978"  "5601"  "22872" "6500"  "23471"
[46] "22926" "11231" "57134" "6396"  "10134"

$`hsa00190 Oxidative phosphorylation`
 [1] "1345"  "4720"  "4710"  "51382" "4709"  "51606" "27089" "533"   "29796"
[10] "9377"  "4711"  "528"   "514"   "10975" "51079" "10312" "8992"  "518"  
[19] "4696"  "1537"  "1340"  "4708"  "521"   "54539" "537"   "4694"  "4701" 
[28] "4702"  "509"  

$`hsa03050 Proteasome`
 [1] "5691"  "10213" "5684"  "5685"  "5701"  "5688"  "51371" "5692"  "5714" 
[10] "9861"  "5705"  "5720"  "5687"  "7979"  "5708"  "5690"  "5696"  "5718" 
[19] "5707" 

$`hsa04142 Lysosome`
 [1] "3920"  "1509"  "2517"  "1213"  "2720"  "1508"  "51606" "1512"  "54"   
[10] "55353" "2990"  "1520"  "427"   "5660"  "533"   "10577" "27074" "5476" 
[19] "9741"  "967"   "1514"  "10312" "2799"  "1075"  "3074"  "4126"  "8763" 
[28] "10239" "7805"  "537"   "3988" 

gage documentation built on Dec. 13, 2020, 2:01 a.m.