library(knitr)

Package: MetaGSCA 1.0.0

Authors:
Yan Guo YaGuo@salud.unm.edu
Fei Ye fei.ye@vumc.org
Hui Yu schena.yu@gmail.com

Introduction

MetaGSCA is an analytical tool to systematically assess the co-expression of a specified gene set by aggregating evidence across studies. A nonparametric approach (GSNCA) that accounts for the gene-gene correlation structure is used to test whether the gene set is differentially co-expressed between two comparative conditions, from which a permutation test p-statistic is computed. A meta-analysis is then performed with one of two options: random-intercept logistic regression model or the inverse variance method, thus yielding a con-clusive result across individual studies. Based on the results from the meta-analysis, a pathway crosstalk network can be delineated to visually reflect represent pathways’ similar profiles of gene set co-expression across the aligned datasets.

In the manuscript, MetaGSCA was demonstrated on over 300 cellular pathways across 11 TCGA datasets. In the present tutorial, we demonstrate the usage of main functions on two pathways across 11 TCGA datasets.

Packages data.table, meta (metafor), grid, sjstats, igraph are required.

The analysis will start by loading package MetaGSCA.

library(MetaGSCA)


How to use the MetaGSCA package

This Section illustrates the typical procedure for applying the main functions provided in package MetaGSCA.

Set Working Directory

Create a temporary directory "tutorial" under the user's current working directory. All output generated in this tutorial will be saved in "tutorial".

## set your workdir
if (!dir.exists('tutorial')) dir.create('tutorial')
setwd('tutorial')


Example Data

Load the example data:
1. lists of genesets (3 pathways)
2. lists of datasets (11 TCGA datasets)
Two aspects of datasets are necessary for running the examples and case studies in this tutorial.

data(meta)
datasetnames <- names(datasets)
genesetnames <- names(genesets)

Note:
1. lists of genesets (3 pathways):

genesetnames

2. lists of datasets (11 TCGA datasets):

tmp <- sapply(datasets,dim) 
rownames(tmp) <- c('#Genes','#Samples')
tmp



Usage of Major Functions

Meta-analysis of gene set differential correlation (MetaGSCA function)

MetaGSCA() function performs meta-analysis of gene set differential co-expression analysis for one or multiple genesets. When dealing with only one gene set, users supply one item for each of the two arguments: list.geneset and names.geneset; when dealing with multiple gene sets, users supply multiple items for each of these two arguments. For simplicity, here we demonstrate the usage on two gene sets.

Arguments
list.geneset: a list of gene sets (one or multiple).
list.dataset: a list of datasets, first column is gene name.
list.group: a list of samples/patients subgroup or condition (e.g. (1,1,1,2,2,2)).
names.geneset: gene set names, corresponding to list.geneset.
names.dataset: dataset names, corresponding to list.dataset.
nperm: number of permutations used to estimate the null distribution of the test statistic. If not given, a default value 500 is used.
nboot: number of bootstraps used to estimate the point and interval estimate. If not given, a default value 200 is used.
method: meta-analysis method. Must be either GLMM or Inverse.
effect: statistical model applied in meta-analysis. Must be either random or fixed*.

Code

testMultiple <- MetaGSCA(list.geneset = genesets[1:2],
                       list.dataset = datasets,
                       list.group = groups,
                       names.geneset = genesetnames[1:2],
                       names.dataset = datasetnames,
                       nperm = 100,
                       nboot = 100,
                       method = 'GLMM',
                       effect = 'random')


Parameters

genesetnames[1:2]
genesets[1:2]


Result

In the working directory, two output files will be generated for each gene set: a forest plot (.png) and a table file (.csv). After the command execution, four output files will be generated at the gene-set level, including two forest plots (.png) and two table files (.csv). In addition to these gene-set level files, an overall TSV file will be generated, which assembles output statistics information from multiple gene sets.

sort(dir(pattern='csv|png'))


In each gene-set-specific PNG file, we see the forest plot across multiple datasets, with random effects model bootstrap result attached at the bottom.

result1 <- grep('png',dir(pattern=genesetnames[1]),value=T)


multiFig1{width=150%}
Forest plot for one gene set across multiple datasets

result1 <- grep('png',dir(pattern=genesetnames[2]),value=T)


multiFig2{width=150%}
Forest plot for another gene set across multiple datasets

The two CSV files each include the output data information for one specific gene set across multiple datasets. Here we only show information for the first two datasets.

r genesetnames[1]

result2 <- grep('csv',dir(pattern=genesetnames[1]),value=T)
t(read.csv(result2,row.names=1))[,1:2]


r genesetnames[2]

result2 <- grep('csv',dir(pattern=genesetnames[2]),value=T)
t(read.csv(result2,row.names=1))[,1:2]


At last, the TSV file _Meta Analysis bootstrap result.tsv assembles the statistics results of all gene sets altogether, including information for all individual datasets.

_Meta Analysis bootstrap result.tsv
Geneset.Name: Gene set (pathway) name.
Number.of.Genes: Number of genes included in the pathway.
meta.p: Meta-analysis summarized permutation p-value across multiple datasets.
bootstrap.p: Median of meta-analysis summarized bootstrap p-values across 11 datasets.
* BRCA: Permutation p-value generated within one individual dataset (BRCA here, but can be other dataset name).

tmp <- read.delim('_Meta Analysis bootstrap result.tsv',as.is=T,row.names=1)
round(t(tmp),4)

Code for single gene set application

Users can try executing MetaGSCA() function on one gene set only. Arguments list.geneset and names.geneset should be prepared in the same way as if in case of multiple gene sets, only that one item is involved. When executing the following command, one overall text file (_Meta Analysis bootstrap result.tsv) and two output files (.png and .csv) for one gene set will be generated.

testSingle <- MetaGSCA(list.geneset = genesets[2],
         list.dataset = datasets, 
         list.group = groups,
         names.geneset = genesetnames[2],
         names.dataset = datasetnames,
         nperm = 100,
         nboot = 100,
         method = 'GLMM',
         effect = 'random')



Pathway Crosstalk Analysis (PWcTalk function)

For pathway crosstalk analysis, one call PWcTalk() function in one line, or alternatively, execute several codelines combining PWcTalkpre() and PWcTalkNW() to enable greater modulation of network layout. One must execute MetaGSCA() function on sufficiently many gene sets (say >100) before attempting pathway crosstalk analysis. Please type ?PWcTalk to access example commandlines centered upon the example inputdata object input2PWcTalk. Below is the output figure generated from executing those example commandlines.

Note: If you are connecting from a Windows desktop to a remote Linux server and run these commands in Linux terminal, you will need an X11 display assistance (such as Xming) configured and started before you run pathway crosstalk analysis.

result1 <- 'PWcTalk.png' #paste(system.file("extdata", package = "MetaGSCA"), "/../vignette/PWcTalk.png", sep="")
knitr::include_graphics(result1)


Auxiliary functions

Gene Sets Net Correlations Analysis (GSNCA related functions)

For users convenience, we provide several functions pertinent to the core differential co-expression analysis algorithm, GSNCA. Function gsnca_stat() calculates the overall correlation distance metric for a gene set between two experimental conditions, and thus it returns a scalar value. Function gsnca_p() implements the sample label permutation and calculates the p-statistic based on a number of permutated trials; this function returns a list of two scalar components, one for the correlation distance statistic and the other for the p-statistic. Function gsnca_gsets() conducts GSNCA (including permutation) for multiple gene sets, and its list output has the external layer for the multiple gene sets and the internal layer of the same format as gsnca_p() output.

data(meta)
BRCA <- datasets[['BRCA']]
N <- ncol(BRCA)
n1 <- N%/%2
objt1 <- t(BRCA[1:min(66,nrow(BRCA)),1:n1])
objt2 <- t(BRCA[1:min(66,nrow(BRCA)),(n1+1):N])
gStat.res <- gsnca_stat(objt1,objt2)
# gsnca_stat() result
gStat.res
data(meta)
BRCA <- datasets[['BRCA']]
smpCode <- substr(colnames(BRCA),14,15)
grp1 <- which(smpCode=='01')
grp2 <- which(smpCode=='11')
object <- BRCA[1:min(66,nrow(BRCA)),c(grp1,grp2)]
group <- c(rep(1,length(grp1)),rep(2,length(grp1)))
perm.list <- vector('list',500)
for (i in seq_len(500)) {perm.list[[i]] <- sample(ncol(object))}
gsnca_p.res <- gsnca_p(object,group,perm.list)
gsets <- split(rownames(object),rep(1:2,each=nrow(object)%/%2)) 
gsnca_gsets.res <- gsnca_gsets(gsets,object,group,perm.list)
# gsnca_p() result
gsnca_p.res
# gsnca_gsets() result
gsnca_gsets.res[1:2]


Scrutiny of gene-wise standard deviation (check_sd function)

Differential co-expression analysis is heavily dependent on the variance of data. If many genes have static expression across samples in either or both experimental conditions, GSNCA or alike algorithms may fail. The main function MetaGSCA() calls on a function check_sd() to pre-check the variance of genes in the input data matrix. Here we make this function check_sd() accessible for users so that they may use it to check on the variance situation of the input data themselves. Input to the function include two data matrices of same number of columns (genes) for two comparative conditions respectively, one vector for gene names (colnames of the two matrices), and a threshold of minimum standard deviation (defaults to 0.001). The output is a list of two components: one vector of gene names for the variance-qualifying genes, and a concatenated string including all disqualifying genes.

data(meta)
STAD <- datasets[['STAD']]
N <- ncol(STAD)
n1 <- N%/%2
objt1 <- t(STAD[,1:n1])
objt2 <- t(STAD[,(n1+1):N])
genes <- rownames(STAD)
check.res <- check_sd(objt1,objt2,genes,0.1)
head(check.res$genes.kept)
# check.res$genes.removed # three genes were removed.
check.res$genes.removed


Citing MetaGSCA

Guo Y, Yu H, ..., Ye Fei. MetaGSCA: A tool for meta-analysis of gene set differential coexpression. PLoS computational biology. 2021;17(5):e1008976.



hui-sheen/MetaGSCA documentation built on April 9, 2022, 7:24 p.m.