getStatistics: Intergrative genes statistic
In BLMA: BLMA: A package for bi-level meta-analysis

Description Usage Arguments Details Value Author(s) References See Also Examples

Calculate genes summary statistic across multiple datasets

1	getStatistics(allGenes, dataList, groupList, ncores = 1, method = addCLT)

`allGenes`	Vector of all genes names for the analysis.
`dataList`	A list of expression matrices, in which rows are genes and columns are samples.
`groupList`	A list of vectors indicating sample group corresponding with expression matrices in dataList.
`ncores`	Number of core to use in prallel processing.
`method`	Function for combining p-values. It must accept one input which is a vector of p-values and return a combined p-value. Three methods are embeded in this package are addCLT, fisherMethod, and stoufferMethod.

To estimate the effect sizes of genes across all studies, first standardized mean difference for each gene in individual studies is compute. Next, the overall efect size and standard error are estimated using the random-efects model. This overall efect size represents the gene's expression change under the efect of the condition. The, z-scores and p-values of observing such efect sizes are computed. The p-values is obtained from classical hypothesis testing. By default, linear model and empirical Bayesian testing \(limma\) are used to compute the p-values for diferential expression. The two-tailed p-values are converted to one-tailed p-values (lef- and right-tailed). For each gene, the one-tailed p-values across all datasets are then combined using the addCLT, stouffer or fisher method. These p-values represent how likely the diferential expression is observed by chance.

A data.frame of gene statistics with following columns:

pTwoTails: Two-tailed p-values
pTwoTails.fdr: Two-tailed p-values with false discovery rate correction
pLeft: left-tailed p-values
pLeft.fdr: left-tailed p-values with false discovery rate correction
pRight.fdr: right-tailed p-values with false discovery rate correction
pRight: right-tailed p-values
ES: Effect size
ES.pTwoTails: Two-tailed p-values for effect size
ES.pTwoTails.fdr: Two-tailed p-values for effect size with false discovery rate correction
ES.pLeft: Left-tailed p-values for effect size
ES.pLeft.fdr: Left-tailed p-values for effect size with false discovery rate correction
ES.pRight: Right-tailed p-values for effect size
ES.pRight.fdr: Right-tailed p-values for effect size with false discovery rate correction

Tin Nguyen, Hung Nguyen, and Sorin Draghici

Nguyen, T., Shafi, A., Nguyen, T. M., Schissler, A. G., & Draghici, S. (2020). NBIA: a network-based integrative analysis framework-applied to pathway analysis. Scientific reports, 10(1), 1-11. Nguyen, T., Tagett, R., Donato, M., Mitrea, C., & Draghici, S. (2016). A novel bi-level meta-analysis approach: applied to biological pathway analysis. Bioinformatics, 32(3), 409-416. Smyth, G. K. (2005). Limma: linear models for microarray data. In Bioinformatics and computational biology solutions using R and Bioconductor (pp. 397-420). Springer, New York, NY.

addCLT

datasets <- c("GSE17054", "GSE57194", "GSE33223", "GSE42140")
data(list = datasets, package = "BLMA")
dataList <- lapply(datasets, function(dataset) {
    get(paste0("data_", dataset))
})
groupList <- lapply(datasets, function(dataset) {
    get(paste0("group_", dataset))
})
names(dataList) <- datasets
names(groupList) <- datasets

allGenes <- Reduce(intersect, lapply(dataList, rownames))

geneStat <- getStatistics(allGenes, dataList, groupList)
head(geneStat)

# perform pathway analysis
library(ROntoTools)
# get gene network
kpg <- loadKEGGPathways()$kpg
# get gene network name
kpn <- loadKEGGPathways()$kpn
# get geneset
gslist <- lapply(kpg,function(y) nodes(y))

# get differential expressed genes
DEGenes.Left <- rownames(geneStat)[geneStat$pLeft < 0.05 & geneStat$ES.pLeft < 0.05]
DEGenes.Right <- rownames(geneStat)[geneStat$pRight < 0.05 & geneStat$ES.pRight < 0.05]

DEGenes <- union(DEGenes.Left, DEGenes.Right)

# perform pathway analysis with ORA
oraRes <- lapply(gslist, function(gs){
    pORACalc(geneSet = gs, DEGenes = DEGenes, measuredGenes = rownames(geneStat))
})
oraRes <- data.frame(p.value = unlist(oraRes), pathway = names(oraRes))
rownames(oraRes)  <- kpn[rownames(oraRes)]

# print results
print(head(oraRes))

# perfrom pathway analysis with Pathway-Express from ROntoTools
ES <- geneStat[DEGenes, "ES"]
names(ES) <- DEGenes

peRes = pe(x = ES, graphs = kpg, ref = allGenes, nboot = 1000, seed=1)

peRes.Summary <- Summary(peRes, comb.pv.func = fisherMethod)
peRes.Summary[, ncol(peRes.Summary) + 1] <- rownames(peRes.Summary)
rownames(peRes.Summary) <- kpn[rownames(peRes.Summary)]
colnames(peRes.Summary)[ncol(peRes.Summary)] = "pathway"

# print results
print(head(peRes.Summary))