bilevelAnalysisGeneset: Bi-level meta-analysis - applied to geneset enrichment...

Description

Perform a bi-level meta-analysis in conjunction with geneset enrichment methods (ORA/GSA/PADOG) to integrate multiple gene expression datasets.

Usage

bilevelAnalysisGeneset(gslist, gs.names, dataList, groupList, splitSize = 5,
  metaMethod = addCLT, enrichment = "ORA", pCutoff = 0.05,
  percent = 0.05, mc.cores = 1, ...)

Arguments

gslist

a list of gene sets.

gs.names

names of the gene sets.

dataList

a list of datasets to be combined. Each dataset is a data frame where the rows are the gene IDs and the columns are the samples.

groupList

a list of vectors. Each vector represents the phenotypes of the corresponding dataset in dataList. The elements of each vector are either 'c' (control) or 'd' (disease).

splitSize

the minimum number of disease samples in each split dataset. splitSize should be at least 3. By default, splitSize=5.

metaMethod

the method used to combine p-values. This should be one of addCLT (additive method [1]), fisherMethod (Fisher's method [5]), stoufferMethod (Stouffer's method [6]), max (maxP method [7]), or min (minP method [8]).

enrichment

the method used for enrichment analysis. This should be one of "ORA", "GSA", or "PADOG". By default, enrichment is set to "ORA".

pCutoff

the p-value cutoff used to identify differentially expressed (DE) genes. This parameter is used only when the enrichment method is "ORA". By default, pCutoff=0.05.

percent

the percentage of genes with the highest fold change to be considered as differentially expressed (DE). This parameter is used when the enrichment method is "ORA". By default, percent=0.05 (five percent). Note that only genes with a p-value less than pCutoff are considered.

mc.cores

the number of cores to be used in parallel computing. By default, mc.cores=1.

...

additional parameters passed to the GSA/PADOG functions.
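To make the metaMethod argument concrete, here is a minimal base-R sketch of two of the classical p-value combiners named above. It is not BLMA's own code: fisher_combine and stouffer_combine are hypothetical names, and the sketch assumes one-sided, independent p-values. Because the Usage passes metaMethod as a function object (addCLT, max, min), a function of this shape, taking a numeric vector of p-values and returning a single p-value, is presumably what the argument expects.

```r
# Illustrative combiners only; BLMA ships its own fisherMethod and stoufferMethod.

# Fisher's method: -2 * sum(log(p)) follows a chi-squared distribution
# with 2k degrees of freedom when the k p-values are independent and uniform.
fisher_combine <- function(p) {
  pchisq(-2 * sum(log(p)), df = 2 * length(p), lower.tail = FALSE)
}

# Stouffer's method: the sum of the z-scores divided by sqrt(k)
# is standard normal under the null hypothesis.
stouffer_combine <- function(p) {
  pnorm(sum(qnorm(p)) / sqrt(length(p)))
}

p <- c(0.01, 0.04, 0.20)
combined <- c(fisher = fisher_combine(p), stouffer = stouffer_combine(p))
print(combined)
```

With a single p-value, both functions return that p-value unchanged, which is a quick sanity check on the formulas.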

Details

The bi-level framework combines the datasets at two levels: an intra-experiment analysis and an inter-experiment analysis [1]. At the intra-level analysis, the framework splits a dataset into smaller datasets, performs enrichment analysis for each split dataset (using ORA [2], GSA [3], or PADOG [4]), and then combines the results of these split datasets using metaMethod. At the inter-level analysis, the results obtained for individual datasets are combined using metaMethod.
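The two levels described above can be sketched on toy data in base R. This is only an illustration under stated assumptions, not the package's implementation: every function name here (combine_p, ora_p, select_de, intra_p, make_dataset) is hypothetical, Fisher's method stands in for metaMethod, the disease samples are split at random while each split reuses all controls, "fold change" is approximated by the absolute difference in group means, and percent is taken relative to all genes. Plain matrices are used instead of data frames to keep the sketch short.

```r
set.seed(1)

# Stand-in for metaMethod: Fisher's method on a vector of p-values
combine_p <- function(p) {
  pchisq(-2 * sum(log(p)), df = 2 * length(p), lower.tail = FALSE)
}

# ORA p-value: probability of an overlap at least as large as the one observed
ora_p <- function(de_genes, gene_set, universe) {
  gene_set <- intersect(gene_set, universe)
  overlap <- length(intersect(de_genes, gene_set))
  phyper(overlap - 1, length(gene_set), length(universe) - length(gene_set),
         length(de_genes), lower.tail = FALSE)
}

# DE genes: t-test p-value below p_cutoff, then the largest absolute mean differences
select_de <- function(expr, group, p_cutoff = 0.05, percent = 0.05) {
  d <- expr[, group == "d", drop = FALSE]
  ctl <- expr[, group == "c", drop = FALSE]
  pvals <- sapply(rownames(expr), function(g) t.test(d[g, ], ctl[g, ])$p.value)
  diffs <- abs(rowMeans(d) - rowMeans(ctl))
  keep <- names(pvals)[pvals < p_cutoff]
  head(keep[order(diffs[keep], decreasing = TRUE)], ceiling(percent * nrow(expr)))
}

# Intra-experiment level: split the disease samples, run ORA per split, combine
intra_p <- function(expr, group, gene_set, split_size = 5) {
  d_idx <- which(group == "d")
  c_idx <- which(group == "c")
  n_splits <- max(1, floor(length(d_idx) / split_size))
  chunks <- split(sample(d_idx), rep_len(seq_len(n_splits), length(d_idx)))
  p_split <- sapply(chunks, function(idx) {
    cols <- c(c_idx, idx)
    de <- select_de(expr[, cols], group[cols])
    ora_p(de, gene_set, rownames(expr))
  })
  combine_p(p_split)
}

# Toy dataset: the first 20 genes are shifted upward in the disease samples
make_dataset <- function(genes, n_control = 8, n_disease = 12) {
  group <- rep(c("c", "d"), times = c(n_control, n_disease))
  expr <- matrix(rnorm(length(genes) * length(group)), nrow = length(genes),
                 dimnames = list(genes, paste0("s", seq_along(group))))
  expr[1:20, group == "d"] <- expr[1:20, group == "d"] + 2
  list(expr = expr, group = group)
}

genes <- paste0("g", 1:200)
gene_set <- genes[1:30]  # contains all 20 shifted genes
datasets <- list(make_dataset(genes), make_dataset(genes))

# Inter-experiment level: combine the per-dataset p-values
p_intra <- sapply(datasets, function(ds) intra_p(ds$expr, ds$group, gene_set))
p_final <- combine_p(p_intra)
print(p_final)
```

Using the same combiner at both levels mirrors the description above, where metaMethod is applied first across the splits of one dataset and then across datasets.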

Value

A data frame whose row names are the gene set/pathway IDs. The columns used in the Examples section include Name, pBLMA, pBLMA.fdr, and rBLMA.
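As a rough guide to working with a result of this shape, the sketch below builds a toy data frame with the column names that appear in the Examples and filters it. The values are made up, and the interpretation is an assumption rather than something this page states: pBLMA.fdr is treated as an FDR-adjusted version of pBLMA, and rBLMA as a rank by pBLMA.

```r
# Toy stand-in for a result table; the values are invented for illustration
res <- data.frame(
  Name = c("Pathway A", "Pathway B", "Pathway C", "Pathway D"),
  pBLMA = c(0.20, 0.001, 0.03, 0.60),
  row.names = c("id1", "id2", "id3", "id4")
)

# Assumed meaning of the remaining columns (not confirmed by this page)
res$pBLMA.fdr <- p.adjust(res$pBLMA, method = "fdr")  # Benjamini-Hochberg adjustment
res$rBLMA <- rank(res$pBLMA)                         # 1 = smallest p-value

# Order by significance and keep gene sets below a 5% FDR threshold
res <- res[order(res$pBLMA), ]
significant <- res[res$pBLMA.fdr < 0.05, ]
print(significant)
```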

Author(s)

Tin Nguyen and Sorin Draghici

References

[1] T. Nguyen, R. Tagett, M. Donato, C. Mitrea, and S. Draghici. A novel bi-level meta-analysis approach – applied to biological pathway analysis. Bioinformatics, 32(3):409-416, 2016.

[2] S. Draghici, P. Khatri, R. P. Martin, G. C. Ostermeier, and S. A. Krawetz. Global functional profiling of gene expression. Genomics, 81(2):98-104, 2003.

[3] B. Efron and R. Tibshirani. On testing the significance of sets of genes. The Annals of Applied Statistics, 1(1):107-129, 2007.

[4] A. L. Tarca, S. Draghici, G. Bhatti, and R. Romero. Down-weighting overlapping genes improves gene set analysis. BMC Bioinformatics, 13(1):136, 2012.

[5] R. A. Fisher. Statistical methods for research workers. Oliver & Boyd, Edinburgh, 1925.

[6] S. Stouffer, E. Suchman, L. DeVinney, S. Star, and R. M. Williams Jr. The American Soldier: Adjustment during army life, volume 1. Princeton University Press, Princeton, 1949.

[7] L. H. C. Tippett. The methods of statistics. 1931.

[8] B. Wilkinson. A statistical consideration in psychological research. Psychological Bulletin, 48(2):156, 1951.

See Also

bilevelAnalysisPathway, phyper, GSA, padog

Examples

library(BLMA)

# load KEGG pathways and create gene sets
x <- loadKEGGPathways()
# each element of x$kpg is a pathway graph; nodes() returns its gene IDs
gslist <- lapply(x$kpg, FUN = function(y) nodes(y))
gs.names <- x$kpn[names(gslist)]

# load example data
dataSets <- c("GSE17054", "GSE57194", "GSE33223", "GSE42140")
data(list=dataSets, package="BLMA")
names(dataSets) <- dataSets
dataList <- lapply(dataSets, function(dataset) get(paste0("data_", dataset)))
groupList <- lapply(dataSets, function(dataset) get(paste0("group_", dataset)))
# perform bi-level meta-analysis in conjunction with ORA
ORAComb <- bilevelAnalysisGeneset(gslist, gs.names, dataList, groupList, enrichment = "ORA")
head(ORAComb[, c("Name", "pBLMA", "pBLMA.fdr", "rBLMA")])

# perform bi-level meta-analysis in conjunction with GSA
GSAComb <- bilevelAnalysisGeneset(gslist, gs.names, dataList, groupList, enrichment = "GSA", nperms = 200, random.seed = 1)
head(GSAComb[, c("Name", "pBLMA", "pBLMA.fdr", "rBLMA")])

# perform bi-level meta-analysis in conjunction with PADOG
set.seed(1)
PADOGComb <- bilevelAnalysisGeneset(gslist, gs.names, dataList, groupList, enrichment = "PADOG", NI=200)
head(PADOGComb[, c("Name", "pBLMA", "pBLMA.fdr", "rBLMA")])

BLMA documentation built on Nov. 8, 2020, 8:15 p.m.