Introduction

This document is an introduction to co-expression networks. We will learn how to use co-expression networks (GCNs). A gene co-expression network, as we use them in CoExp, is a network of genes created with the WGCNA (Weighted Co-expression Networks) R package, see https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/ in combination with the km2gcn R package, see https://www.ncbi.nlm.nih.gov/pubmed/28403906.

This document has two purposes. The first one is to serve as a tutorial on how to create and annotate GCNs from gene expression data adequately cleaned for such purpose. The second one is to illustrate how to use the database of GCNs that come with the resource for research purposes. In the following section we explain what GCNs are awailable, how to use them and how we can create a GCN from scratch. The next to the following section explains how to clean data to get it ready for GCN creation. This resource has, in its present form, three main parts: R code, processed resources and meta-data.

Folders

This distribution can be found under the folder coexp. Under this folder we can find the following R files

Under coexp folder we can find the following folders

CoExp: repository of co-expression networks and gene expression profiling

What can we find at CoExp?

If we are at the Bioinformatics Master server, we have to do this, from the R console if we want to start using the software.

source("/home/biomaster/tia1718/tutorialesR/coexp/coexpression.R",chdir=T)
coexp.initDb("NoFull")

And now we are ready to start.

If you download it to your own computer, you have to edit gdp.coexp() function at coexpression.R file to update the route to the file coexpression.R. For example, if your route to the file is is "/home/alumno/tutorialesR/coexp/", then the function should be.

gdp.coexp = function(){
    return("/home/alumno/tutorialesR/coexp/")
}

And then we can start using the code from the R console by doing this

source("/home/alumno/tutorialesR/coexp/coexpression.R",chdir=T)
coexp.initDb("NoFull")

If you want to compile this documentation itself in your computer in order to check whether your installation was correct, then the rest of chunks of code for this documentation will be run in order for the documentation to be generated so please change the first line of the chunk above accordingly before compiling again.

source("~/Dropbox/clases/bioinf2017/tia1718/tutorialesR/coexp/coexpression.R",chdir=T)
coexp.initDb("NoFull")
for(data.family in names(coexp.data)){
    cat(paste0("Family of gene expression dataset ",data.family," available\n"))
    for(expr.data in names(coexp.data[[data.family]]))
        cat(paste0("Gene expression profiling dataset for tissue ",expr.data," available\n"))
}

In order to understand this we need to check the page https://www.synapse.org/#!Synapse:syn3219045 for information on the ROS/MAP study and how to access its data. This is a study on multi-omics data from frontal cortex samples of donors with various levels of severity of Alzheimer's disease.

Let's get the sample covariates so we can have a look at the values for the samples.

library(stringr)
#In ROS/MAP we have four networks, one for all the samples and three different 
#networks segreating by different levels of severity for cognitive decline 
#(not AD, probable AD and surely AD, AD stands for Alzheimer's disease)
names(coexp.nets[["rosmap"]])
#One of the covariates of this dataset is cognitive decline (cogdx) and it is value
#between 0 and 5 being 5 highest severity
covs = coexp.getCovariates(tissue="all",which.one="rosmap")
oldpar = par()
par(mfrow=c(2,2))
barplot(table(covs$gender),main="Gender in ROS/MAP")
hist(covs$age,main="Age in ROS/MAP")
barplot(table(covs$cogdx),main="Cognitive decline in ROS/MAP")
barplot(table(covs$braaksc),main="Braak stage in ROS/MAP")
par(oldpar)

In the same way, we can check that there is a co-expression network available for each of these datasets.

for(net.family in names(coexp.nets)){
    cat(paste0("Family of co-expression networks ",net.family," available\n"))
    for(net in names(coexp.nets[[net.family]]))
        cat(paste0("Co-expression network for tissue ",net," available\n"))
}

Accessing gene expression data

So, for example, if we wanted to access the gene expression profiling of cogdxad tissue, we can do

expr.data = coexp.getExprDataFromTissue(which.one="rosmap",tissue="cogdxad")
cat("The dataset has ",dim(expr.data)," samples and genes respectively\n")
cat("First gene names are",paste0(colnames(expr.data)[1:3],collapse=","),"\n")
cat("First sample IDs are",paste0(rownames(expr.data)[1:3],collapse=", "),"\n")

Note that gene IDs depend on the gene expression profiling family. For ROS/MAP all gene names are Ensembl IDs. We keep the familiy dependent sample ID too when possible.

We can convert between Ensembl IDs and gene symbols by using

source("naming.R")
coexp.fromEnsembl2GeneName(colnames(expr.data)[1:5])

And back to Ensembl IDs

coexp.fromGeneName2Ensembl(coexp.fromEnsembl2GeneName(colnames(expr.data)[1:5]))

Note that this is done by using internal files. The most reliable manner in which we can manipulate Ensembl IDs is going directly to biomart. But at the moment this falls outside of the scope of this tutorial.

Accessing network data and annotations

We can access a network and get its genes

net = coexp.getNetworkFromTissue(which.one="rosmap",tissue="cogdxad")
cat("The network has ",length(unique(net$moduleColors))," modules\n")
cat("The module size is\n")
print(table(net$moduleColors))
cat("The 1st gene's ID is ",names(net$moduleColors)[1]," and it belongs to module",
    net$moduleColors[1],"\n")

Or we can visually check their corresponding size and color as follows

coexp.plotModSizes(tissue="cogdxad",which.one="rosmap")

We can also access network properties by gene, e.g. the module membership of each gene within its module by using

source("coexpression.R")
expr.data = coexp.getExprDataFromTissue(which.one="rosmap",tissue="cogdxad")
coexp.getMM(which.one="rosmap",tissue="cogdxad",genes=colnames(expr.data)[1:5])

Or we can represent an histogram with all MM values for module yellow

source("naming.R")
mms = coexp.getMM(which.one="rosmap",tissue="cogdxad",
                  genes=coexp.getGenesFromModule(which.one="rosmap",
                                                 tissue="cogdxad",module="yellow"))
hist(mms,main="MM values for black module in ROS/MAP AD subjects GCN")

We can also get the top 5 Hub genes for that very same module

source("naming.R")
coexp.fromEnsembl2GeneName(names(mms[order(mms,decreasing=T)][1:5]))

We can also access gene information by modules. What are the modules available in the AD subjects GCN?

source("netdb.R")
coexp.getModulesFromTissue(which.one="rosmap",tissue="cogdxad")

Which are the genes within the tan module?

genes = coexp.getGenesFromModule(which.one="rosmap",tissue="cogdxad",
                                 module="tan")
cat("Tan module has",length(genes),"genes\n")

WGCNA has a nice features and this is that one can visually inspect module similarities in terms of gene expression by using the eigengene. An eigengene is the 1st PCA component of expression and we can construct a dendrogram and see how the eigengenes are organised within such graph. Modules hanging from the same root will be more similar than others far away in the dendrogram.

#We can get the eigengenes in the form of a matrix with samples in rows and columns for modules 
#There is one eigengene for each module
tissue = "cogdxad"
which.one = "rosmap"
egs = coexp.getNetworkEigengenes(which.one=which.one,tissue=tissue)
cat(tissue,"network was obtained from",nrow(egs),
    "samples and has",ncol(egs),"modules\n")
#And now we plot the EGs
coexp.plotEGClustering(which.one=which.one,tissue=tissue)

O we can simply plot its correlation

library(corrplot)
corrplot(cor(egs),tl.srt=45, 
         tl.col="black",
         type="upper",
         tl.cex=0.45,
         order="hclust",
              title=paste0("Eigengene correlations for ",tissue," within ",which.one),
            mar=c(0,3,3.5,2))

Note that the correlation plot will apply a clustering process too so the modules with highest correlation will appear together. There will be a correspondence between closeness at the dendrogramm and closeness at the triangular matrix (check for example thistle1, darkmagenta and greenyellow modules in both plots).

Finally, we can check the relation of modules with their covariates.

#We whole network has all samples and some modules will capture differences between
#genes in cases and controls, lets see the whole picture
covlist = colnames(coexp.getCovariates(tissue="all",which.one="rosmap"))
coexp.corWithCatTraits(which.one="rosmap",tissue="all",covlist=covlist)

So interesting module for understanding AD would be those that show any significant relation with assessed states of cognitive decline, e.g. darkmagenta, darkturquoise and royalblue. Note how PMI, age, batch, gender and race show no correlation whatsoever as they were applied for correction of the gene expression before creating the network.

Making sense of the networks

Thanks to annotation of the modules, we can obtain information on what the networks reflect in terms of the biology of the genes. There are two types of annotations. But both of them work at the module level. Given a network compound of modules $m_1, m_2, ..., m_n$, the unit of annotation is the module. In this regard, each module $m_i$ will be annotated from enrichment analyses against manually curated ontologies like the Gene Ontology, KEGG and REACTOME, but also with cell type enrichment by using both in-house built cell markers and those coming from WGCNA.

Let us use, in this section, the ROS/MAP "all" network.

net = coexp.getNetworkFromTissue(which.one="rosmap",tissue="all")
cat("Network built with",length(net$moduleColors),"genes and",nrow(net$MEs),"samples\n")
cat("The list of modules is",paste0(unique(net$moduleColors),collapse=", "),"\n")

Let us focus on the royalblue module. What is the enrichment of that module in terms of the biology of its genes?

source("geneannot.R")
report=coexp.reportOnModule(tissue="all",which.one="rosmap",module="turquoise",ctcollapse=F,how.many=30)

Now we can access the report's fields

names(report)

This is a comprehensive report on a module in which we can access a lot of information that can tell us about gene set function

#How many genes do we have in the module
report$size
#What are the top five annotations for that module coming from the Gene Ontology
report$go.report

The module hs 359 genes, and the annotation DB based enrichment has 7 significant terms. The field go.report gives us the top five most significant annotation DB based enrichment terms that show, at a glimpse, what the module genes do. In this case, this module is clearly doing something related to DNA repair and chromatin (terms are ordered by significance).

All in all, we can access all the DB based annotation separately as in

report=coexp.functionalReportOnModule(tissue="all",which.one="rosmap",module="turquoise")
dim(report)
names(report)
table(report$domain)

As we see, there are 26 terms, 18 coming from the Gene Ontology Biological Process database, 4 from KEGG and 4 from REACTOME pathway databases. So for example, in order to get information about REACTOME and KEGG terms

report[report$domain %in% c("rea","keg"),c("term.id","term.name","p.value")]

This data shows the corresponding REACTOME pathways that come up from the analysis, check gProfileR package documentation.

Finally, we can also generate a plot for the cell type

results = coexp.cellTypeByModule(tissue="all",which.one="rosmap",plot.file=NULL)


juanbot/CoExpNets documentation built on May 15, 2021, 12:20 p.m.