Gene Set Enrichment Analysis (GSEA)

Description

A binomial version of GSEA, unified as much as possible with nea.render. Given altered gene sets (AGS) and functional gene sets (FGS), it calculates the number of members (gene/protein IDs) shared by each AGS-FGS pair, together with the respective enrichment statistics. Returns matrices of size length(FGS) x length(AGS) (see "Value"). Both AGS and FGS can be submitted either as a text file or as an R list preloaded with import.gs.
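To illustrate the two input forms mentioned above, the following minimal sketch turns a small tab-delimited table into an R list using base R only. It is not the implementation of import.gs: the toy table, the placeholder first column, and the resulting list shape (a named list of character vectors, one per group) are assumptions made for illustration.

```r
# Hypothetical three-column, tab-delimited gene set table.
# Column 1 is a placeholder, column 2 holds gene IDs, column 3 holds group IDs.
txt <- "x\tTP53\tpathway_a\nx\tEGFR\tpathway_a\nx\tAKT1\tpathway_b"
tab <- read.table(text = txt, sep = "\t", header = FALSE,
                  stringsAsFactors = FALSE)

gene.col  <- 2  # plays the role of ags.gene.col / fgs.gene.col
group.col <- 3  # plays the role of ags.group.col / fgs.group.col

# Lower-case all IDs, mirroring the effect described for Lowercase = 1
genes  <- tolower(tab[[gene.col]])
groups <- tolower(tab[[group.col]])

# One character vector of unique gene IDs per group
gs.list <- lapply(split(genes, groups), unique)
str(gs.list)
```

A list of this shape is only meant to show how the gene and group columns relate to each other; for real input, import.gs remains the supported route.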

Usage

gsea.render(AGS, FGS, Lowercase = 1, ags.gene.col = 2, ags.group.col = 3,
  fgs.gene.col = 2, fgs.group.col = 3, echo = 1, Ntotal = 20000,
  Parallelize = 1)

Arguments

AGS

Either a text file or a list of members of each AGS (see Details). Group IDs should be found in ags.group.col and gene IDs in ags.gene.col. Identical to the AGS argument of nea.render; see also the details there.

FGS

Either a text file or a list of members of each FGS (see Details). Group IDs should be found in fgs.group.col and gene IDs in fgs.gene.col. Almost identical to the FGS argument of nea.render.

Lowercase

Render node and group IDs in lower case (default: 1, i.e. 'yes').

ags.gene.col

number of the column containing AGS genes (only needed if AGS is submitted as a text file).

ags.group.col

number of the column containing group IDs (only needed if AGS is submitted as a text file).

fgs.gene.col

number of the column containing FGS genes (only needed if FGS is submitted as a text file).

fgs.group.col

number of the column containing group IDs (only needed if FGS is submitted as a text file).

echo

Whether messages about execution progress should appear (default: 1).

Ntotal

An important parameter for precise calculation of the Fisher's exact test statistics: the size of the whole gene universe. Defaults to 20000 but should be changed depending on the hypothesis and genome/proteome size.

Parallelize

The number of CPU cores to be used for calculating the gene set overlap. The other steps are sufficiently fast. The option is not supported on Windows.
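The influence of Ntotal and a cautious choice of Parallelize can be sketched with base R alone. The overlap counts below are hypothetical, and the helper p.for.universe as well as the "all cores but one" rule are assumptions for illustration, not part of the package.

```r
# Hypothetical overlap: 100 AGS genes, 50 FGS genes, 10 of them shared
n.shared   <- 10
n.fgs.only <- 40
n.ags.only <- 90

# Fisher's exact test p-value for a given universe size (illustrative helper)
p.for.universe <- function(Ntotal) {
  n.rest <- Ntotal - n.shared - n.fgs.only - n.ags.only
  tab <- matrix(c(n.shared, n.fgs.only, n.ags.only, n.rest), nrow = 2)
  fisher.test(tab)$p.value
}

p.large <- p.for.universe(20000)  # the default universe size
p.small <- p.for.universe(5000)   # a smaller genome/proteome
p.small > p.large                 # same overlap is less surprising in a smaller universe

# One possible way to pick a value for Parallelize:
# a single core on Windows, otherwise all detected cores but one.
n.cores <- if (.Platform$OS.type == "windows") {
  1L
} else {
  detected <- parallel::detectCores()
  if (is.na(detected)) 1L else max(1L, detected - 1L)
}
n.cores
```

Because the same overlap yields different p-values under different universe sizes, Ntotal should reflect the set of genes that could actually have been observed in the experiment.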

Value

A list with entries estimate, p, q, and n, each of which is a matrix of size length(FGS) x length(AGS). The first two contain the respective output of fisher.test (estimate and p.value), q is produced by p.adjust(p.value, method="BH"), and n is the number of shared members. The input to fisher.test is matrix(c(<no. of shared members>, <no. of solely FGS members>, <no. of solely AGS members>, <no. of non-members>), nrow=2).
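The structure of the returned value can be reproduced on toy data with base R. This is a minimal sketch under stated assumptions, not the code of gsea.render: the gene sets are hypothetical, set members are assumed to be unique, and the BH adjustment is assumed to be applied across all pairs at once.

```r
# Hypothetical toy gene sets
ags.list <- list(s1 = c("a", "b", "c", "d"), s2 = c("c", "d", "e"))
fgs.list <- list(f1 = c("a", "b", "x"), f2 = c("d", "e", "y", "z"),
                 f3 = c("q", "r"))
Ntotal <- 20000  # assumed size of the gene universe

# Result matrices of size length(FGS) x length(AGS)
dims <- list(names(fgs.list), names(ags.list))
n <- p <- estimate <- matrix(NA_real_, length(fgs.list), length(ags.list),
                             dimnames = dims)

for (f in names(fgs.list)) {
  for (a in names(ags.list)) {
    shared   <- length(intersect(fgs.list[[f]], ags.list[[a]]))
    fgs.only <- length(fgs.list[[f]]) - shared
    ags.only <- length(ags.list[[a]]) - shared
    rest     <- Ntotal - shared - fgs.only - ags.only
    # 2 x 2 table in the order described above
    ft <- fisher.test(matrix(c(shared, fgs.only, ags.only, rest), nrow = 2))
    n[f, a]        <- shared
    p[f, a]        <- ft$p.value
    estimate[f, a] <- ft$estimate
  }
}

# p.adjust returns a plain vector, so the matrix shape is restored explicitly
q <- matrix(p.adjust(p, method = "BH"), nrow = nrow(p), dimnames = dimnames(p))

res <- list(estimate = estimate, p = p, q = q, n = n)
str(res)
```

Rows correspond to FGS and columns to AGS, so res$n["f1", "s1"] holds the number of genes shared by f1 and s1.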

See Also

nea.render, import.gs

Examples

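library(NEArender)
# Derive AGS from the example expression data
data(fantom5.43samples)
ags.list <- samples2ags(fantom5.43samples, Ntop=1000, method="topnorm")
# Import FGS from the example data set
data(can.sig.go)
fpath <- can.sig.go
fgs.list <- import.gs(fpath)
# Run the enrichment analysis for every AGS-FGS pair
g1 <- gsea.render(AGS=ags.list, FGS=fgs.list, Lowercase = 1,
  ags.gene.col = 2, ags.group.col = 3, fgs.gene.col = 2, fgs.group.col = 3,
  echo=1, Ntotal = 20000, Parallelize=1)
# Inspect the distributions of the returned statistics
hist(log(g1$estimate), breaks=100)
hist(g1$n, breaks=100)
hist(g1$p, breaks=100)
hist(g1$q, breaks=100)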