Pathway Analysis with Down-weighting of Overlapping Genes (PADOG)

Share:

Description

This is a general purpose gene set analysis method that downplays the importance of genes that apear often accross the sets of genes analyzed. The package provides also a benchmark for gene set analysis in terms of sensitivity and ranking using 24 public datasets.

Usage

1
2
3
padog(esetm=NULL,group=NULL,paired=FALSE,block=NULL,gslist="KEGG.db",organism="hsa",
      annotation=NULL,gs.names=NULL,NI=1000,plots=FALSE,targetgs=NULL,Nmin=3,
      verbose=TRUE,parallel=FALSE,dseed=NULL,ncr=NULL)

Arguments

esetm

A matrix containing log transfomed and normalized gene expression data. Rows correspond to genes and columns to samples.

group

A character vector with the class labels of the samples. It can only contain "c" for control samples or "d" for disease samples.

paired

A logical value to indicate if the samples in the two groups are paired.

block

A character vector indicating the block ids of the samples classified by the group variable, if paired=TRUE. The paired samples must have the same block value.

gslist

Either the value "KEGG.db" or a list with the gene sets. If set to "KEGG.db", then gene sets will be made of all KEGG pathways for the organism specified. If a list is provided, instead, each element of the list should be a character vector with the identifiers for the genes. The identifiers can be probe(sets) ids if the annotation argument is set to a valid annotation package, otherwise the gene identifiers must be of the same kind as the rownames of the matrix esetm.

annotation

A valid chip annotation package if the rownames of esetm are probe(set) ids and gslist contains ENTREZ identifiers or gslist is set to "KEGG.db". If the rownames are other gene identifies, then annotation has tyo be set to NULL, and the row names of esetm needs to be unique and be found among elements of gslist

organism

A three letter string giving the name of the organism supported by the "KEGG.db" package.

gs.names

Character vector with the names of the gene sets. If specified, must have the same length as gslist.

NI

Number of iterations to determine the gene set score significance p-values.

plots

If set to TRUE then the distribution of the PADOG scores with and without weighting the genes in raw and standardized form are shown using boxplots. A pdf file will be created in the current directory having the name provided in the targetgs field. The scores for the targetgs gene set will be shown in red.

targetgs

The identifier of a traget gene set for which the scores will be highlighted in the plots produced if plots=TRUE

Nmin

The minimum size of gene sets to be included in the analysis.

verbose

If set to TRUE, displays the number of iterations elapsed is displayed.

parallel

If set to TRUE, the NI iterations will be executed in parallel if multiple CPU cores are available and foreach and doRNG packages are installed.

dseed

Optional initial seed for random number generator (integer).

ncr

The number of CPU cores used when parallel set to TRUE. Default is to use all CPU cores detected.

Details

See cited documents for more details.

Value

A data frame containing the ranked pathways and various statistics: Name is the name of the gene set; ID is the gene set identifier; Size is the number of genes in the geneset; meanAbsT0 is the mean of absolute t-scores; padog0 is the mean of weighted absolute t-scores; PmeanAbsT significance of the meanAbsT0; Ppadog is the significance of the padog0 score;

Author(s)

Adi Laurentiu Tarca <atarca@med.wayne.edu>

References

Adi L. Tarca, Sorin Draghici, Gaurav Bhatti, Roberto Romero, Down-weighting overlapping genes improves gene set analysis, BMC Bioinformatics, 2012, submitted.

See Also

padog

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
#run padog on a colorectal cancer dataset of the 24 datasets benchmark GSE9348
#use NI=1000 for accurate results.
set="GSE9348"
data(list=set,package="KEGGdzPathwaysGEO")
x=get(set)
#Extract from the dataset the required info
exp=experimentData(x);
dataset= exp@name
dat.m=exprs(x)
ano=pData(x)
design= notes(exp)$design
annotation= paste(x@annotation,".db",sep="")
targetGeneSets= notes(exp)$targetGeneSets


myr=padog(
esetm=dat.m,
group=ano$Group,
paired=design=="Paired",
block=ano$Block,
targetgs=targetGeneSets,
annotation=annotation,
gslist="KEGG.db",
organism="hsa",
verbose=TRUE,
Nmin=3,
NI=25,
plots=FALSE,
dseed=1)


myr2=padog(
esetm=dat.m,
group=ano$Group,
paired=design=="Paired",
block=ano$Block,
targetgs=targetGeneSets,
annotation=annotation,
gslist="KEGG.db",
organism="hsa",
verbose=TRUE,
Nmin=3,
NI=25,
plots=FALSE,
dseed=1,
paral=TRUE,
ncr=2)


myr[1:20,]

all.equal(myr, myr2)

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.