library(knitr)

Package:
MetaGSCA 0.99.0

Authors:
Hui Yu HuiYu1@salud.unm.edu
Haocan Song haocan.song@vumc.org
Yan Guo YaGuo@salud.unm.edu
Fei Ye fei.ye@vumc.org

Introduction

This tutorial provides an overview and a tutorial of the R package MetaGSCA for Meta Gene Set Co-Expression Analysis.

MetaGSCA (Meta Gene Set Co-expression Analysis) is a computational tool to systematically assess the co-expression disturbance of a specified gene set with rigorous meta-analysis justification. A nonparametric approach that accounts for the correlation structure between genes is used to test whether the gene set is differentially co-expresssed between the two conditions, and a permutation test p-value is computed. Bootstrapping is used to construct confidence intervals of point estimates. A meta-analysis of single proportions is then performed with two options: random-intercept logistic regression model and the inverse variance method, to yield conclusive results over a number of individual studies. The meta-analysis empowered gene set differential co-expression tool MetaGSCA is implemented in the form of both a web application and an R package.

Packages data.table, meta(metafor), grid, sjstats, igraph are required.

The analysis will start by loading package MetaGSCA.

library(MetaGSCA)


How to use the MetaGSCA package

This Section illustrates the typical procedure for applying the methods available in package MetaGSCA.

Set Working Directory

Create a temporary directory "workdir" under the user's current working directory. All output generated in this tutorial will be saved in "workdir".

## set your workdir
workdir <- setwd()


Example Data

Load the example data:
1. lists of genesets (29 pathways)
2. lists of datasets (KIRP & LIHC)
The 2 source of data are necessary for running the examples and case studies in this tutorial.

library(MetaGSCA)
###########################
## step 1, read in pathways
mygeneset <- list.files(system.file("extdata/pathway29", package = "MetaGSCA"),
                        pattern = "*.lst", full.names=TRUE)
genesets <- list(); genesetnames <- list()
for (i in 1:length(mygeneset)) {
    g <- fread(mygeneset[i], header = FALSE, data.table = FALSE)
    genesets[[i]] <- as.character(g[, 1])
    genesetnames[[i]] <- tools::file_path_sans_ext(mygeneset[i])
}
genesetnames <- gsub(paste(system.file("extdata/pathway29", package = "MetaGSCA"),
                     "/", sep = ""), "", genesetnames)

##########################
## step 2,read in datasets
mydataset <- c(paste(system.file("extdata", package = "MetaGSCA"),
                     "/KIRP.rds", sep=""),
               paste(system.file("extdata", package = "MetaGSCA"),
                     "/LIHC.rds", sep=""))
datasets <- list(); groups <- list(); datasetnames <- list()
for (j in 1:length(mydataset)) {
    d <- readRDS(file = mydataset[j])
    datasets[[j]] <- d[-1,]
    groups[[j]] <- as.character(d[1, -1])
    datasetnames[[j]] <- tools::file_path_sans_ext(mydataset[j])
}
datasetnames <- gsub(paste(system.file("extdata", package = "MetaGSCA"),
                     "/", sep = ""), "", datasetnames)

Note:
1. lists of genesets (29 pathways):

genesetnames

2. lists of datasets (KIRP & LIHC):
KIRP (3819 rows & 65 columns) contain 3818 genes & 32 individuals, individuals are equally distributed in 2 groups (group1 & group2).
LIHC (3687 rows & 101 columns) contain 3686 genes & 50 individuals, individuals are equally distributed in 2 groups (group1 & group2).


Case Study

Example 1: MetaGSCA

Code

MetaGSCA is used to perform meta & gene set co-expression analysis of one geneset.

Arguments
list.geneset: a pre-specified gene list from a gene set or pathway.
list.dataset: a list of datasets, first column is gene name.
list.group: a list of samples/patients subgroup or condition (e.g. (1,1,1,2,2,2)).
name.geneset: the name of the gene set, used for output file name.
name.dataset: a list of dataset names corresponding to list.dataset, used for forest plot.
nperm: number of permutations used to estimate the null distribution of the test statistic. If not given, a default value 500 is used.
nboot: number of bootstraps used to estimate the point and interval estimate. If not given, a default value 200 is used.
method.Inverse: logical. Indicating whether "Inverse" method is to be used for pooling of studies.
method.GLMM: logical. Indicating whether "GLMM" method is to be used for pooling of studies.
fixed.effect: logical. Indicating whether a fixed effect meta-analysis should be conducted.
* random.effect: logical. Indicating whether a random effects meta-analysis should be conducted.

## step 3.1,call MetaGSCA function to perform Meta-analysis
# dir.create(file.path(workdir, "result_single_geneset"), showWarnings = FALSE)
# setwd(file.path(workdir, "result_single_geneset"))
MetaGSCA(list.geneset = genesets[[2]],
         list.dataset = datasets,
         list.group = groups,
         name.geneset = genesetnames[[2]],
         name.dataset = datasetnames,
         nperm = 100,
         nboot = 100,
         method.Inverse = FALSE,
         method.GLMM = TRUE,
         fixed.effect = FALSE,
         random.effect = TRUE)


Result

genesetnames[2]
genesets[2]


There'll be 2 output file after MetaGSCA. 1 forest plot (.pdf) and 1 output data information file (.csv).

result <- paste(system.file("extdata", package = "MetaGSCA"), "/single.png", sep="")
knitr::include_graphics(result)


In the first file, we're able to get the forest plot of 2 datasets (KIRP & LIHC), with random effects model result. Attached with bootstrap result at the bottom side.

result1 <- paste(system.file("extdata", package = "MetaGSCA"), "/Aminosugars metabolism_GLMM_Meta Analysis for bootstrap p-value_Forest plot.png", sep="")
knitr::include_graphics(result1)


In the second file, it includes the output data information of 2 datasets (KIRP & LIHC).

Gene[2]: Aminosugars metabolism_Individual Dataset GSAR Results with Bootstrapping

result2 <- paste(system.file("extdata", package = "MetaGSCA"), "/Aminosugars metabolism_Individual Dataset GSAR Results with Bootstrapping.csv", sep="")
t(read.csv(result2))


Example 2: MetaGSCA_Multi_Geneset

Code

MetaGSCA_Multi_Geneset is used to perform meta & gene set co-expression analysis of multiple geneset.

Arguments
list.geneset: several pre-specified gene lists from a gene set or pathway.
name.geneset: the names of the gene sets, used for output file name.

## step 3.2,call MetaGSCA_Multi_Geneset function to perform Multiple geneset Meta-analysis
# dir.create(file.path(workdir, "result_multiple_geneset"), showWarnings = FALSE)
# setwd(file.path(workdir, "result_multiple_geneset"))
MetaGSCA_Multi_Geneset(list.geneset = genesets[2:3],
                       list.dataset = datasets,
                       list.group = groups,
                       name.geneset = genesetnames[2:3],
                       name.dataset = datasetnames,
                       nperm = 100,
                       nboot = 100,
                       method.Inverse = FALSE,
                       method.GLMM = TRUE,
                       fixed.effect = FALSE,
                       random.effect = TRUE)


Result

genesetnames[2:3]
genesets[2:3]


There'll be 4 output file after MetaGSCA. 2 forest plot (.pdf) and 2 output data information file (.csv).

result <- paste(system.file("extdata", package = "MetaGSCA"), "/multiple.png", sep="")
knitr::include_graphics(result)


In the 2 forest plot files, we're able to get the forest plot of 2 datasets (KIRP & LIHC), with random effects model result. Attached with bootstrap result at the bottom side.

result1 <- paste(system.file("extdata", package = "MetaGSCA"), "/Aminosugars metabolism_GLMM_Meta Analysis for bootstrap p-value_Forest plot.png", sep="")
knitr::include_graphics(result1)


result1 <- paste(system.file("extdata", package = "MetaGSCA"), "/Arginine and Proline metabolism_GLMM_Meta Analysis for bootstrap p-value_Forest plot.png", sep="")
knitr::include_graphics(result1)


In the 2 output data information files, they includes the output data information of 2 datasets (KIRP & LIHC).

Gene[2]: Aminosugars metabolism_Individual Dataset GSAR Results with Bootstrapping

result2 <- paste(system.file("extdata", package = "MetaGSCA"), "/Aminosugars metabolism_Individual Dataset GSAR Results with Bootstrapping.csv", sep="")
t(read.csv(result2))


Gene[3]: Arginine and Proline metabolism_Individual Dataset GSAR Results with Bootstrapping

result2 <- paste(system.file("extdata", package = "MetaGSCA"), "/Arginine and Proline metabolism_Individual Dataset GSAR Results with Bootstrapping.csv", sep="")
t(read.csv(result2))


Warning Result given by 2.3.1 & 2.3.2

There might be a warning message if we perform GLMM method (method.GLMM = TRUE). If the oringinal p-values have the same value, GLMM method (function:metaprop) will automatically stop with an ERROR message, and bootstrap procedure couldn't be done. To deal with this problem, we set the oringinal p-value as the mean of different datasets p-value, with bootstrap results.

warn <- paste(system.file("extdata", package = "MetaGSCA"), "/GLMM warning.png", sep="")
knitr::include_graphics(warn)



Example 3: Pathway Crosstalk Analysis

For pathway crosstalk analysis, one call PWcTalk function in one line, or alternatively, execute several codelines to enable greater modulation of network layout. One must execute MetaGSCA_Multi_Geneset function on sufficiently many gene sets (say >100) before attempting pathway crosstalk analysis. To enable swift testing of pathway cross talk analysis, we provide one example input as if it was generated from the prior workflow.

data(input2PWcTalk)
## step 4.1, one code line to execute pathway crosstalk analysis. Simple, but no flexibility for layout tuning. 
PWcTalk(input2PWcTalk,test='binary',
  pTh.dataset=0.01,pTh.pwPair=0.01,pTh.pw=0.01,figname='PWcTalk',
  pdfW=10,pdfH=10,asp=0.7,vbase=15,ebase=2,vlbase=1,power=1/2)
## step 4.2, one code block to execute pathway crosstalk analysis. More code lines, but enabling interactive layout tuning.
preNW <- PWcTalkNWpre(input2PWcTalk,test='binary',
  pTh.dataset=0.01,pTh.pwPair=0.01,pTh.pw=0.01)
g_tkid <- PWcTalkNW(preNW$PW.pair,preNW$PW.p)
##### PAUSE here: adjust the network layout on the pop-out window to reach a satisfaction #####
coords <- tk_coords(g_tkid$tkid)
g_tkid <- PWcTalkNW(preNW$PW.pair,preNW$PW.p,layout=coords,pdfW=14,pdfH=10,figname='PWcTalk',asp=0.5)       
result1 <- paste(system.file("extdata", package = "MetaGSCA"), "/PWcTalk.png", sep="")
knitr::include_graphics(result1)


Citing MetaGSCA

If you use MetaGSCA in your publication, please cite: XXXXXX



Haocan223/MetaGSCA documentation built on Nov. 19, 2020, 4:34 a.m.