padog: Pathway Analysis with Down-weighting of Overlapping Genes (PADOG)


View source: R/padog.R

This is a general-purpose gene set analysis method that downplays the importance of genes that appear often across the sets of genes analyzed. The package also provides a benchmark for gene set analysis in terms of sensitivity and ranking using 24 public datasets.
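To make the idea of down-weighting concrete, the sketch below counts how often each gene occurs across a few toy gene sets and turns that frequency into a weight between 1 and 2. The gene identifiers, the object names, and the specific weight formula are assumptions chosen for illustration; the cited paper should be consulted for the weighting the package actually applies.

```r
# Toy gene sets (hypothetical identifiers, not real pathways)
gene_sets <- list(
  setA = c("g1", "g2", "g3"),
  setB = c("g1", "g3", "g4"),
  setC = c("g1", "g5")
)

# Count in how many gene sets each gene appears
gene_freq <- table(unlist(lapply(gene_sets, unique)))

# Illustrative weight (an assumption of this sketch): genes found in a single
# set get weight 2, the most widely shared gene gets weight 1.
# Note: this divides by zero if all genes have the same frequency.
f <- as.numeric(gene_freq)
gene_weight <- 1 + sqrt((max(f) - f) / (max(f) - min(f)))
names(gene_weight) <- names(gene_freq)

print(gene_freq)
print(round(gene_weight, 3))
```

In this toy case `g1` occurs in all three sets and receives the smallest weight, while `g2`, `g4` and `g5` occur once and receive the largest.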


padog(esetm=NULL,group=NULL,paired=FALSE,block=NULL,gslist="KEGGRESTpathway",organism="hsa",
      annotation=NULL,gs.names=NULL,NI=1000,plots=FALSE,targetgs=NULL,Nmin=3,
      verbose=TRUE,parallel=FALSE,dseed=NULL,ncr=NULL)

`esetm`	A matrix containing log-transformed and normalized gene expression data. Rows correspond to genes and columns to samples.
`group`	A character vector with the class labels of the samples. It can only contain "c" for control samples or "d" for disease samples.
`paired`	A logical value to indicate if the samples in the two groups are paired.
`block`	A character vector indicating the block ids of the samples classified by the group variable, if `paired=TRUE`. The paired samples must have the same block value.
`gslist`	Either the value "KEGGRESTpathway" or a list with the gene sets. If set to "KEGGRESTpathway", the gene sets will be made of all KEGG pathways for the `organism` specified. If a list is provided instead, each element of the list should be a character vector with the identifiers for the genes. The identifiers can be probe(set) ids if the `annotation` argument is set to a valid annotation package; otherwise, the gene identifiers must be of the same kind as the row names of the matrix `esetm`.
`annotation`	A valid chip annotation package if the row names of `esetm` are probe(set) ids and `gslist` contains ENTREZ identifiers or `gslist` is set to "KEGGRESTpathway". If the row names are other gene identifiers, then `annotation` has to be set to NULL, and the row names of `esetm` need to be unique and be found among the elements of `gslist`.
`organism`	A three-letter string giving the code of an organism supported by the "KEGGREST" package (for example "hsa", the default).
`gs.names`	Character vector with the names of the gene sets. If specified, must have the same length as gslist.
`NI`	Number of iterations to determine the gene set score significance p-values.
`plots`	If set to TRUE then the distribution of the PADOG scores with and without weighting the genes in raw and standardized form are shown using boxplots. A pdf file will be created in the current directory having the name provided in the `targetgs` field. The scores for the `targetgs` gene set will be shown in red.
`targetgs`	The identifier of a target gene set for which the scores will be highlighted in the plots produced if `plots=TRUE`.
`Nmin`	The minimum size of gene sets to be included in the analysis.
`verbose`	If set to TRUE, the number of iterations elapsed is displayed.
`parallel`	If set to TRUE, the `NI` iterations will be executed in parallel, provided that multiple CPU cores are available and the foreach and doRNG packages are installed.
`dseed`	Optional initial seed for random number generator (integer).
`ncr`	The number of CPU cores used when `parallel` is set to TRUE. The default is to use all CPU cores detected.
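The argument descriptions above imply a few consistency requirements between `esetm`, `group`, `block`, `gslist` and `gs.names`. The sketch below builds toy inputs for the case where `annotation=NULL` and a custom gene set list is supplied, then checks those requirements in base R. All data and identifiers are made up, and the checks are only this example's reading of the descriptions, not the validation performed inside `padog`.

```r
set.seed(1)  # toy data only

# Hypothetical expression matrix: 6 genes x 8 samples, log-scale values
esetm <- matrix(rnorm(48, mean = 8), nrow = 6,
                dimnames = list(paste0("gene", 1:6), paste0("s", 1:8)))

# Class labels: "c" for control, "d" for disease, one per column of esetm
group <- rep(c("c", "d"), each = 4)

# Paired design: each block id occurs once in each group
block <- rep(paste0("p", 1:4), times = 2)

# Custom gene sets using the same identifiers as rownames(esetm)
gslist <- list(
  gs1 = c("gene1", "gene2", "gene3"),
  gs2 = c("gene2", "gene3", "gene4", "gene5"),
  gs3 = c("gene5", "gene6")  # smaller than Nmin = 3
)
gs.names <- c("Toy set 1", "Toy set 2", "Toy set 3")

# Consistency checks derived from the argument descriptions
stopifnot(
  length(group) == ncol(esetm),
  all(group %in% c("c", "d")),
  all(table(block, group) == 1),
  !anyDuplicated(rownames(esetm)),
  length(gs.names) == length(gslist)
)

# Gene sets that reach a minimum size of Nmin = 3 among the measured genes
Nmin <- 3
set_sizes <- vapply(gslist, function(g) sum(g %in% rownames(esetm)), integer(1))
keep <- set_sizes >= Nmin
print(set_sizes)
print(names(gslist)[keep])
```

Objects built this way would be passed as `padog(esetm=esetm, group=group, paired=TRUE, block=block, gslist=gslist, annotation=NULL, gs.names=gs.names)`; a real analysis needs far more genes and gene sets than this toy example.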

See cited documents for more details.
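As a rough illustration of what an iteration count such as `NI` controls, the sketch below estimates a p-value for a gene set score by repeatedly shuffling the group labels and recomputing the score. The score used here (mean absolute t-statistic over the set, unweighted and unpaired), the helper `set_score`, and the toy data are assumptions of this example; it is a generic resampling scheme, not a reproduction of the package internals.

```r
set.seed(42)

# Toy data: 20 genes x 10 samples, 5 controls and 5 disease samples
n_genes <- 20
group <- rep(c("c", "d"), each = 5)
x <- matrix(rnorm(n_genes * length(group)), nrow = n_genes)

# Shift the first five genes in the disease group so the toy set carries signal
gene_set <- 1:5
x[gene_set, group == "d"] <- x[gene_set, group == "d"] + 2

# Hypothetical helper: mean absolute two-sample t-statistic over the gene set
set_score <- function(x, group, idx) {
  t_stats <- apply(x[idx, , drop = FALSE], 1, function(v) {
    t.test(v[group == "d"], v[group == "c"])$statistic
  })
  mean(abs(t_stats))
}

observed <- set_score(x, group, gene_set)

# NI label permutations; more iterations give a finer-grained p-value
NI <- 200
perm_scores <- replicate(NI, set_score(x, sample(group), gene_set))

# Add-one estimate so the p-value is never exactly zero
p_value <- (sum(perm_scores >= observed) + 1) / (NI + 1)
print(p_value)
```

The smallest p-value this scheme can report is 1/(NI + 1), which is one reason a small iteration count is only suitable for quick checks.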

A data frame containing the ranked pathways and various statistics: Name is the name of the gene set; ID is the gene set identifier; Size is the number of genes in the gene set; meanAbsT0 is the mean of absolute t-scores; padog0 is the mean of weighted absolute t-scores; PmeanAbsT is the significance of the meanAbsT0 score; Ppadog is the significance of the padog0 score.
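Since the return value is an ordinary data frame, it can be filtered and annotated with base R. The sketch below uses a mock data frame with the columns listed above; the values are invented, and the `Ppadog_BH` column and the 0.05 cutoff are choices made for this example rather than part of the function's output.

```r
# Mock result with the documented columns (values are made up for illustration)
res <- data.frame(
  Name = c("Pathway A", "Pathway B", "Pathway C"),
  ID = c("00001", "00002", "00003"),
  Size = c(40, 25, 60),
  meanAbsT0 = c(2.1, 1.2, 1.6),
  padog0 = c(2.8, 1.1, 1.9),
  PmeanAbsT = c(0.004, 0.310, 0.048),
  Ppadog = c(0.001, 0.420, 0.030),
  stringsAsFactors = FALSE
)

# Order by the PADOG p-value and add a multiple-testing adjusted column
# (Ppadog_BH is a hypothetical column name used only in this sketch)
res <- res[order(res$Ppadog), ]
res$Ppadog_BH <- p.adjust(res$Ppadog, method = "BH")

# Keep gene sets with a nominal p-value below 0.05
sig <- res[res$Ppadog < 0.05, c("Name", "ID", "Size", "Ppadog", "Ppadog_BH")]
print(sig)
```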

Adi Laurentiu Tarca <atarca@med.wayne.edu>

Adi L. Tarca, Sorin Draghici, Gaurav Bhatti, Roberto Romero, Down-weighting overlapping genes improves gene set analysis, BMC Bioinformatics, 2012, submitted.

padog

#Run padog on GSE9348, a colorectal cancer dataset from the 24-dataset benchmark.
#NI=25 below keeps the example fast; use NI=1000 for accurate results.
library(PADOG)
library(Biobase)  #provides experimentData(), exprs(), pData() and notes()
set="GSE9348"
data(list=set,package="KEGGdzPathwaysGEO")
x=get(set)
#Extract from the dataset the required info
exp=experimentData(x)
dataset=exp@name
dat.m=exprs(x)
ano=pData(x)
design=notes(exp)$design
annotation=paste(x@annotation,".db",sep="")
targetGeneSets=notes(exp)$targetGeneSets


myr=padog(
esetm=dat.m,
group=ano$Group,
paired=design=="Paired",
block=ano$Block,
targetgs=targetGeneSets,
annotation=annotation,
gslist="KEGGRESTpathway",
organism="hsa",
verbose=TRUE,
Nmin=3,
NI=25,
plots=FALSE,
dseed=1)


#Repeat the analysis in parallel on two CPU cores, using the same seed
myr2=padog(
esetm=dat.m,
group=ano$Group,
paired=design=="Paired",
block=ano$Block,
targetgs=targetGeneSets,
annotation=annotation,
gslist="KEGGRESTpathway",
organism="hsa",
verbose=TRUE,
Nmin=3,
NI=25,
plots=FALSE,
dseed=1,
parallel=TRUE,
ncr=2)


#Show the first 20 rows of the ranked results
myr[1:20,]

#Compare the serial and parallel runs
all.equal(myr, myr2)