getStatistics: Integrative gene statistics


Description

Calculate summary statistics for genes across multiple datasets.

Usage

getStatistics(allGenes, dataList, groupList, ncores = 1, method = addCLT)

Arguments

allGenes

A character vector of all gene names used in the analysis.

dataList

A list of expression matrices, in which rows are genes and columns are samples.

groupList

A list of vectors, one per expression matrix in dataList, indicating the group of each sample (column) in that matrix.

ncores

Number of cores to use for parallel processing.

method

Function for combining p-values. It must accept one input, a vector of p-values, and return a single combined p-value. Three methods are included in this package: addCLT (the default), fisherMethod, and stoufferMethod.
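Because method only needs to map a vector of p-values to one combined p-value, a user-defined function can be passed in its place. The base-R sketch below shows three such functions. The names myFisher, myStouffer, and myAdditiveCLT are hypothetical, and they are textbook versions of Fisher's, Stouffer's, and the additive (central limit theorem) methods, not the package's own fisherMethod, stoufferMethod, or addCLT code.

```r
# Hypothetical combining functions (not BLMA's implementations): each takes a
# numeric vector of p-values and returns one combined p-value.

# Fisher's method: -2 * sum(log(p)) follows a chi-squared distribution
# with 2k degrees of freedom under the null hypothesis.
myFisher <- function(p) {
  stat <- -2 * sum(log(p))
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

# Stouffer's method: convert each p-value to a z-score, sum them, and
# rescale by sqrt(k). p-values of exactly 0 or 1 give infinite z-scores.
myStouffer <- function(p) {
  z <- qnorm(p, lower.tail = FALSE)
  pnorm(sum(z) / sqrt(length(p)), lower.tail = FALSE)
}

# Additive method with a normal (CLT) approximation: under the null each
# p-value is Uniform(0, 1), so their sum has mean k/2 and variance k/12.
# A small sum is evidence against the null, hence the lower tail.
myAdditiveCLT <- function(p) {
  k <- length(p)
  pnorm(sum(p), mean = k / 2, sd = sqrt(k / 12))
}

p <- c(0.01, 0.20, 0.03, 0.40)
myFisher(p)
myStouffer(p)
myAdditiveCLT(p)
```

Any of these could then be supplied as, for example, getStatistics(allGenes, dataList, groupList, method = myFisher).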

Details

To estimate the effect size of each gene across all studies, the standardized mean difference of the gene is first computed in each individual study. Next, the overall effect size and its standard error are estimated using a random-effects model. This overall effect size represents the gene's change in expression under the effect of the condition. Z-scores and p-values for observing such effect sizes are then computed.

Separately, p-values are obtained from classical hypothesis testing. By default, linear models with empirical Bayes moderation (limma) are used to compute the p-values for differential expression in each dataset. The two-tailed p-values are converted to one-tailed (left- and right-tailed) p-values. For each gene, the one-tailed p-values across all datasets are then combined using the addCLT, Stouffer, or Fisher method. These combined p-values represent how likely it is that the observed differential expression occurred by chance.
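The effect-size steps described above can be sketched in base R for a single gene. This sketch assumes Hedges' g as the standardized mean difference and the DerSimonian-Laird estimator for the random-effects model; the helper names hedgesG and randomEffects are hypothetical, and BLMA's own implementation may use a different estimator or variance formula.

```r
# Sketch assuming Hedges' g and the DerSimonian-Laird estimator;
# hedgesG and randomEffects are hypothetical helpers, not BLMA functions.

# Bias-corrected standardized mean difference for one gene in one study.
# x: expression in the control group, y: expression in the condition group.
hedgesG <- function(x, y) {
  n1 <- length(x); n2 <- length(y)
  sPooled <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
  d <- (mean(y) - mean(x)) / sPooled
  J <- 1 - 3 / (4 * (n1 + n2) - 9)                    # small-sample correction
  g <- J * d
  v <- (n1 + n2) / (n1 * n2) + g^2 / (2 * (n1 + n2))  # approximate variance of g
  list(g = g, v = v)
}

# Random-effects combination of per-study effect sizes g with variances v.
randomEffects <- function(g, v) {
  k <- length(g)
  w <- 1 / v                                   # fixed-effect weights
  muFixed <- sum(w * g) / sum(w)
  Q <- sum(w * (g - muFixed)^2)                # Cochran's heterogeneity statistic
  tau2 <- if (k > 1) {                         # between-study variance
    max(0, (Q - (k - 1)) / (sum(w) - sum(w^2) / sum(w)))
  } else 0
  wStar <- 1 / (v + tau2)                      # random-effects weights
  mu <- sum(wStar * g) / sum(wStar)            # overall effect size
  se <- sqrt(1 / sum(wStar))                   # its standard error
  z <- mu / se
  list(mu = mu, se = se, tau2 = tau2, z = z,
       pLeft = pnorm(z),
       pRight = pnorm(z, lower.tail = FALSE),
       pTwoTails = 2 * pnorm(-abs(z)))
}

# Usage on simulated data: three studies of one gene, shifted upward.
set.seed(1)
studies <- lapply(1:3, function(i) hedgesG(rnorm(10), rnorm(12, mean = 0.8)))
res <- randomEffects(sapply(studies, `[[`, "g"), sapply(studies, `[[`, "v"))
res$mu
res$pRight
```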

Value

A data.frame of gene statistics with the following columns:

pTwoTails

Two-tailed p-values

pTwoTails.fdr

Two-tailed p-values with false discovery rate correction

pLeft

Left-tailed p-values

pLeft.fdr

Left-tailed p-values with false discovery rate correction

pRight

Right-tailed p-values

pRight.fdr

Right-tailed p-values with false discovery rate correction

ES

Overall effect size estimated with the random-effects model

ES.pTwoTails

Two-tailed p-values for effect size

ES.pTwoTails.fdr

Two-tailed p-values for effect size with false discovery rate correction

ES.pLeft

Left-tailed p-values for effect size

ES.pLeft.fdr

Left-tailed p-values for effect size with false discovery rate correction

ES.pRight

Right-tailed p-values for effect size

ES.pRight.fdr

Right-tailed p-values for effect size with false discovery rate correction
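The left- and right-tailed columns come from splitting two-tailed p-values by the direction of change, and each .fdr column pairs a raw p-value with an adjusted one. The sketch below is an assumption about how such columns can be derived, not the package's code: it uses the sign of a test statistic (for example, a limma t-statistic) to split a two-tailed p-value, and base R's p.adjust with method = "fdr" (Benjamini-Hochberg) for the correction. The helper twoToOneTailed is hypothetical.

```r
# Hypothetical helper (not a BLMA function): split two-tailed p-values into
# left- and right-tailed p-values using the sign of the test statistic.
twoToOneTailed <- function(pTwoTails, stat) {
  half <- pTwoTails / 2
  pLeft  <- ifelse(stat < 0, half, 1 - half)   # evidence for down-regulation
  pRight <- ifelse(stat < 0, 1 - half, half)   # evidence for up-regulation
  data.frame(pLeft = pLeft, pRight = pRight)
}

# Three genes: one down-regulated, one up-regulated, one unchanged.
oneTailed <- twoToOneTailed(c(0.04, 0.04, 0.50), c(-2.1, 3.0, 0.7))

# Assumed FDR procedure: Benjamini-Hochberg adjustment across genes.
oneTailed$pLeft.fdr  <- p.adjust(oneTailed$pLeft,  method = "fdr")
oneTailed$pRight.fdr <- p.adjust(oneTailed$pRight, method = "fdr")
oneTailed
```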

Author(s)

Tin Nguyen, Hung Nguyen, and Sorin Draghici

References

Nguyen, T., Shafi, A., Nguyen, T. M., Schissler, A. G., & Draghici, S. (2020). NBIA: a network-based integrative analysis framework-applied to pathway analysis. Scientific Reports, 10(1), 1-11.

Nguyen, T., Tagett, R., Donato, M., Mitrea, C., & Draghici, S. (2016). A novel bi-level meta-analysis approach: applied to biological pathway analysis. Bioinformatics, 32(3), 409-416.

Smyth, G. K. (2005). Limma: linear models for microarray data. In Bioinformatics and computational biology solutions using R and Bioconductor (pp. 397-420). Springer, New York, NY.

See Also

addCLT

Examples

library(BLMA)

datasets <- c("GSE17054", "GSE57194", "GSE33223", "GSE42140")
data(list = datasets, package = "BLMA")
dataList <- lapply(datasets, function(dataset) {
    get(paste0("data_", dataset))
})
groupList <- lapply(datasets, function(dataset) {
    get(paste0("group_", dataset))
})
names(dataList) <- datasets
names(groupList) <- datasets

allGenes <- Reduce(intersect, lapply(dataList, rownames))

geneStat <- getStatistics(allGenes, dataList, groupList)
head(geneStat)

# perform pathway analysis
library(ROntoTools)
# load KEGG pathways once: gene networks (kpg) and pathway names (kpn)
kp <- loadKEGGPathways()
kpg <- kp$kpg
kpn <- kp$kpn
# get the gene set (node list) of each pathway
gslist <- lapply(kpg, function(y) nodes(y))

# get differentially expressed genes
DEGenes.Left <- rownames(geneStat)[geneStat$pLeft < 0.05 & geneStat$ES.pLeft < 0.05]
DEGenes.Right <- rownames(geneStat)[geneStat$pRight < 0.05 & geneStat$ES.pRight < 0.05]

DEGenes <- union(DEGenes.Left, DEGenes.Right)

# perform pathway analysis with ORA
oraRes <- lapply(gslist, function(gs){
    pORACalc(geneSet = gs, DEGenes = DEGenes, measuredGenes = rownames(geneStat))
})
oraRes <- data.frame(p.value = unlist(oraRes), pathway = names(oraRes))
rownames(oraRes) <- kpn[rownames(oraRes)]

# print results
print(head(oraRes))

# perform pathway analysis with Pathway-Express from ROntoTools
ES <- geneStat[DEGenes, "ES"]
names(ES) <- DEGenes

peRes <- pe(x = ES, graphs = kpg, ref = allGenes, nboot = 1000, seed = 1)

peRes.Summary <- Summary(peRes, comb.pv.func = fisherMethod)
# keep the pathway IDs in a column, then label rows with pathway names
peRes.Summary$pathway <- rownames(peRes.Summary)
rownames(peRes.Summary) <- kpn[rownames(peRes.Summary)]

# print results
print(head(peRes.Summary))
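Over-representation analysis (ORA), as used in the example above, asks whether a pathway contains more differentially expressed genes than expected by chance. The base-R sketch below computes such a p-value with a one-sided hypergeometric test. The function name oraPValue is hypothetical, and this is the classical ORA test rather than a reproduction of pORACalc, whose internals may differ.

```r
# Hypothetical ORA p-value (classical hypergeometric test, not pORACalc):
# probability of seeing at least the observed number of DE genes in a gene
# set when DE genes are drawn at random from the measured genes.
oraPValue <- function(geneSet, DEGenes, measuredGenes) {
  geneSet <- intersect(geneSet, measuredGenes)
  DEGenes <- intersect(DEGenes, measuredGenes)
  N <- length(measuredGenes)                 # all measured genes
  K <- length(geneSet)                       # measured genes in the pathway
  n <- length(DEGenes)                       # DE genes
  x <- length(intersect(geneSet, DEGenes))   # DE genes in the pathway
  # P(X >= x) for X ~ Hypergeometric(K, N - K, n)
  phyper(x - 1, K, N - K, n, lower.tail = FALSE)
}

measured <- paste0("g", 1:100)
oraPValue(geneSet = paste0("g", 1:10),
          DEGenes = paste0("g", c(1:5, 50:54)),
          measuredGenes = measured)
```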

BLMA documentation built on Nov. 8, 2020, 8:15 p.m.