knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(eval = FALSE)
knitr::opts_chunk$set(tidy.opts=list(width.cutoff=65),tidy=TRUE)

Introduction

The last decade of systems biology research has demonstrated that networks, rather than individual genes, govern the onset and progression of complex diseases. Meanwhile, real-world complex networks usually exhibit hierarchical organization, in which nodes can be combined into groups that can be further combined into larger groups, and so on over multiple scales. Thus, identifying the hierarchical organization of a network becomes indispensable in complex disease studies. A traditional and useful method for revealing the hierarchical architecture of a network is hierarchical clustering, which groups data over a variety of scales by creating a hierarchical tree. However, hierarchical clustering has three major limitations.

To address these limitations, we developed the NetSAM (Network Seriation and Modularization) package, which identifies the hierarchical modules of a network (network modularization) and finds a suitable linear order for all leaves of the identified hierarchical organization (network seriation). NetSAM takes an edge-list representation of a weighted or unweighted network as input and generates files that can be used as input for the one-dimensional network visualization tool NetGestalt (http://www.netgestalt.org) or for other network analyses. NetSAM uses random walk distance-based hierarchical clustering to identify the hierarchical modules of the network and then uses the optimal leaf ordering (OLO) method to optimize the one-dimensional ordering of the genes in each module by minimizing the sum of the pairwise random walk distances between adjacent genes in the ordering. A detailed description of the NetSAM method can be found in our published Nature Methods paper "NetGestalt: integrating multidimensional omics data over biological networks" (http://www.nature.com/nmeth/journal/v10/n7/full/nmeth.2517.html).
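
The idea of random walk distance-based hierarchical clustering can be illustrated with a minimal base R sketch. This is not NetSAM's implementation: the toy network, the walk length of 3 steps, the degree-scaled distance and the average linkage are all assumptions made for illustration, and the OLO step is omitted.

```r
# Toy unweighted network (assumed for illustration): two triangles,
# {1,2,3} and {4,5,6}, joined by the single edge 3-4.
edges <- matrix(c(1, 2,  1, 3,  2, 3,  3, 4,  4, 5,  4, 6,  5, 6),
                ncol = 2, byrow = TRUE)
n <- 6
A <- matrix(0, n, n)
A[edges] <- 1            # fill one direction of each edge
A[edges[, 2:1]] <- 1     # and the reverse direction (undirected network)

deg <- rowSums(A)
P <- A / deg             # row-stochastic transition matrix of the random walk

# Probability of reaching each node after t = 3 steps (t is an assumed choice).
Pt <- diag(n)
for (step in 1:3) Pt <- Pt %*% P

# Degree-scaled Euclidean distance between the 3-step probability profiles
# of two nodes; columns are scaled by 1 / sqrt(degree).
W <- Pt %*% diag(1 / sqrt(deg))
D <- dist(W)

# Hierarchical clustering on the random walk distance, cut into two modules.
hc <- hclust(D, method = "average")
modules <- cutree(hc, k = 2)
print(modules)
```

On this toy network the two triangles end up in separate modules, because nodes within a triangle have similar random walk profiles.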

The NetSAM package can also generate a correlation network (e.g. a co-expression network) from an input data matrix, perform seriation and modularization analysis on the correlation network, calculate the associations between sample features and modules, and identify the GO terms associated with the modules.

Environment

NetSAM requires R version 3.0.0 or later, which can be downloaded from http://www.r-project.org. Because the seriation step requires the pairwise distances between all nodes, NetSAM is memory intensive. We recommend using the 64-bit version of R to run NetSAM. For networks with fewer than 10,000 nodes, we recommend a computer with at least 8GB of memory. On our computer with a 2.7 GHz Intel Core i5 processor and 8GB of 1333 MHz DDR3 memory, NetSAM took 402 seconds to analyze the HPRD network (http://www.hprd.org) with 9198 nodes. For networks with more than 10,000 nodes, a computer with at least 16GB of memory is recommended. The NetSAM package requires the following packages: igraph (>=0.6-1), seriation (>=1.0-6), WGCNA (>=1.34.0), doParallel (>=1.0.10), foreach (>=1.4.0), tools (>=3.0.0), biomaRt (>=2.18.0), GO.db (>=2.10.0), R2HTML (>=2.2.0) and survival (>=2.37-7), which can be installed as follows.

install.packages("igraph")
install.packages("seriation")
install.packages("WGCNA")
install.packages("doParallel")
install.packages("foreach")
# Bioconductor packages: biocLite() has been retired in favor of BiocManager (R >= 3.5)
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install(c("biomaRt", "GO.db"))
install.packages("R2HTML")
install.packages("survival")
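
Before running NetSAM, it may help to confirm that the environment meets the requirements listed above. The following is a minimal sketch using base R only; the variable names are illustrative and not part of NetSAM.

```r
# Packages listed as requirements in this section.
required <- c("igraph", "seriation", "WGCNA", "doParallel", "foreach",
              "tools", "biomaRt", "GO.db", "R2HTML", "survival")

# R version 3.0.0 or later is required.
r_ok <- getRversion() >= "3.0.0"

# A 64-bit build of R uses 8-byte pointers.
is_64bit <- .Machine$sizeof.pointer == 8

# Check which required packages can be loaded on this machine.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (!r_ok) message("R is older than 3.0.0")
if (!is_64bit) message("This is not a 64-bit build of R")
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}
```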

Network Seriation and Modularization

After setting up the environment described above, users can install the NetSAM package and use it to analyze networks.

library("NetSAM")
inputNetworkDir <- system.file("extdata","exampleNetwork.net",package="NetSAM")
outputFileName <- paste(getwd(),"/NetSAM",sep="")
result <- NetSAM(inputNetwork=inputNetworkDir, outputFileName=outputFileName, outputFormat="nsm",
                 edgeType="unweighted", map_to_genesymbol=FALSE, organism="hsapiens", idType="auto",
                 minModule=0.003, stepIte=FALSE, maxStep=4, moduleSigMethod="cutoff", modularityThr=0.2,
                 ZRanNum=10, PerRanNum=100, ranSig=0.05, edgeThr=(-1), nodeThr=(-1), nThreads=3)

Input

Output

If the output format is "nsm", the function will output not only an "nsm" file but also a list object containing module information, gene order information and network information. If the output format is "gmt", the function will output the "gmt" file and a matrix object containing the module and annotation information.

Network Analyzer

The NetAnalyzer function calculates the degree, clustering coefficient, betweenness and closeness centrality for each node and the shortest path distance for each pair of nodes. The function can also plot the distributions of these measurements.

library("NetSAM")
inputNetwork <- system.file("extdata","exampleNetwork.net",package="NetSAM") 
outputFileName <- paste(getwd(),"/NetSAM",sep="")
NetAnalyzer(inputNetwork,outputFileName,"unweighted")
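
To make some of these measurements concrete, here is a minimal base R sketch that computes the degree, the local clustering coefficient and the pairwise shortest path distances on a toy network. It is only an illustration of the definitions, not the code NetAnalyzer runs; the toy network and variable names are assumptions.

```r
# Toy unweighted network (assumed): two triangles joined by the edge 3-4.
edges <- matrix(c(1, 2,  1, 3,  2, 3,  3, 4,  4, 5,  4, 6,  5, 6),
                ncol = 2, byrow = TRUE)
n <- 6
A <- matrix(0, n, n)
A[edges] <- 1
A[edges[, 2:1]] <- 1     # symmetric adjacency matrix

# Degree: number of neighbours of each node.
degree <- rowSums(A)

# Local clustering coefficient: triangles through a node divided by the
# number of neighbour pairs, choose(degree, 2).
triangles <- diag(A %*% A %*% A) / 2
clustering <- triangles / choose(degree, 2)

# Shortest path distance for each pair of nodes (Floyd-Warshall).
D <- ifelse(A == 1, 1, Inf)
diag(D) <- 0
for (k in seq_len(n)) {
  D <- pmin(D, outer(D[, k], D[k, ], "+"))
}

print(degree)
print(round(clustering, 3))
print(D)
```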

Input

Output

The NetAnalyzer function will output two "txt" files and five "pdf" files. The two "txt" files contain the degree, clustering coefficient, betweenness and closeness centrality for each node and the shortest path distance for each pair of nodes. The five "pdf" files show the distributions of these measurements.

mergeDuplicate

The mergeDuplicate function will merge the duplicate Ids in the matrix and return the matrix with unique Ids. This function can also be used to merge duplicate mapped Ids when transforming the Ids of a data matrix to other Ids.

library("NetSAM")
inputMatDir <- system.file("extdata","exampleExpressionData_nonsymbol.cct",package="NetSAM")
inputMat <- read.table(inputMatDir,header=TRUE,sep="\t",stringsAsFactors=FALSE,check.names=FALSE)
mergedData <- mergeDuplicate(id=inputMat[,1],data=inputMat[,2:ncol(inputMat)],collapse_mode="maxSD")
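
As a rough illustration of what collapsing duplicate Ids can look like, the base R sketch below keeps, for each Id, the row with the largest standard deviation. This reading of "maxSD" is an assumption, and the toy Ids and values are made up; mergeDuplicate itself may handle details differently.

```r
# Toy data (assumed): five rows, with geneA and geneB each appearing twice.
ids <- c("geneA", "geneB", "geneA", "geneC", "geneB")
mat <- rbind(c(1, 2, 3),
             c(5, 5, 6),
             c(1, 5, 9),
             c(2, 2, 2),
             c(4, 4, 4))

# Standard deviation of each row across samples.
row_sd <- apply(mat, 1, sd)

# For each unique Id, keep the index of the row with the largest SD.
keep <- vapply(split(seq_along(ids), ids),
               function(idx) idx[which.max(row_sd[idx])],
               integer(1))

merged <- mat[keep, , drop = FALSE]
rownames(merged) <- names(keep)
print(merged)
```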

Input

Output

The function returns a data matrix with unique Ids.

Mapping other ids to gene symbols

To perform enrichment analysis in NetGestalt, the gene ids in each module should be gene symbols. The mapToSymbol function can transform other ids from a gene list, network, matrix, sbt file or sct file to gene symbols.

library("NetSAM")
print("transform ids from a gene list to gene symbols...")
geneListDir <- system.file("extdata","exampleGeneList.txt",package="NetSAM")
geneList <- read.table(geneListDir,header=FALSE,sep="\t",stringsAsFactors=FALSE)
geneList <- as.vector(as.matrix(geneList))
geneList_symbol <- mapToSymbol(inputData=geneList, organism="hsapiens", inputType="genelist",idType="affy_hg_u133_plus_2")

print("transform ids in the input network to gene symbols...")
inputNetwork <- system.file("extdata","exampleNetwork_nonsymbol.net",package="NetSAM")
network_symbol <- mapToSymbol(inputData=inputNetwork,organism="hsapiens",inputType="network",idType="entrezgene",edgeType="unweighted")

print("transform ids in the input matrix to gene symbols...")
inputMatDir <- system.file("extdata","exampleExpressionData_nonsymbol.cct",package="NetSAM")
matrix_symbol <- mapToSymbol(inputData=inputMatDir,organism="hsapiens",inputType="matrix",idType="affy_hg_u133_plus_2",collapse_mode="maxSD")

print("transform ids in the sbt file to gene symbols...")
inputSBTDir <- system.file("extdata","exampleSBT.sbt",package="NetSAM")
sbt_symbol <- mapToSymbol(inputData=inputSBTDir,organism="hsapiens",inputType="sbt",idType="affy_hg_u133_plus_2")

print("transform ids in the sct file to gene symbols...")
inputSCTDir <- system.file("extdata","exampleSCT.sct",package="NetSAM")
sct_symbol <- mapToSymbol(inputData=inputSCTDir,organism="hsapiens",inputType="sct",idType="affy_hg_u133_plus_2",collapse_mode="min")

Input

Output

The function will output an object with the transformed data. If the ids in the input data cannot be transformed to gene symbols, the function will output NULL. If outputFileName is TRUE, the function saves the transformed data to a file.

Construction of correlation network

The MatNet function can be used to construct a correlation network based on the input matrix.

library("NetSAM")
inputMatDir <- system.file("extdata","exampleExpressionData.cct",package="NetSAM")
matNetwork <- MatNet(inputMat=inputMatDir, collapse_mode="maxSD", naPer=0.7, meanPer=0.8, varPer=0.8,
corrType="spearman", matNetMethod="rank", valueThr=0.6, rankBest=0.003, networkType="signed",
netFDRMethod="BH", netFDRThr=0.05, idNumThr=(-1), nThreads=3)
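
The general idea of turning a data matrix into a correlation network can be sketched in base R as follows. This sketch uses a plain correlation cutoff of 0.6 on Spearman correlations and made-up data; it does not reproduce the filtering, the "rank" method or the FDR control that MatNet applies.

```r
# Toy expression matrix (assumed): 4 genes (rows) by 6 samples (columns).
expr <- rbind(g1 = 1:6,
              g2 = (1:6)^2,                # monotone in g1
              g3 = c(3, 1, 4, 1.5, 5, 2),  # weakly related to g1
              g4 = 6:1)                    # reversed g1

# Gene-by-gene Spearman correlation matrix.
rho <- cor(t(expr), method = "spearman")

# Keep each gene pair once (upper triangle) if its correlation passes the
# cutoff; only positive correlations are kept in this sketch.
keep <- which(upper.tri(rho) & rho >= 0.6, arr.ind = TRUE)

# Two-column edge list of gene names.
edge_list <- cbind(rownames(rho)[keep[, "row"]],
                   colnames(rho)[keep[, "col"]])
print(edge_list)
```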

Input

Output

The function will output a matrix with two columns.

Construction of consensus network

To increase robustness against errors in data, a bootstrapping procedure is used to construct a consensus network.

library("NetSAM")
inputMatDir <- system.file("extdata","exampleExpressionData.cct",package="NetSAM")
data <- read.table(inputMatDir, header=TRUE, row.names=1, stringsAsFactors=FALSE)
net <- consensusNet(data=data, organism="hsapiens", bootstrapNum=10, naPer=0.5, meanPer=0.8,
                    varPer=0.8, method="rank_unsig", value=3/1000, pth=1e-6, nMatNet=2, nThreads=4)
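
A generic bootstrap consensus can be sketched in base R as shown below. Everything in this sketch is an assumption made for illustration: the toy data, the correlation cutoff of 0.6, the 50 bootstrap rounds and the rule of keeping edges found in at least half of the rounds. It is not a description of how consensusNet works internally.

```r
set.seed(1)

# Toy data (assumed): 4 genes by 20 samples; g2 tracks g1 and g4 tracks g3.
base1 <- rnorm(20)
base2 <- rnorm(20)
expr <- rbind(g1 = base1, g2 = base1 + rnorm(20, sd = 0.1),
              g3 = base2, g4 = base2 + rnorm(20, sd = 0.1))

# All candidate gene pairs, one per row.
pairs <- t(combn(rownames(expr), 2))

n_boot <- 50
hits <- numeric(nrow(pairs))
for (b in seq_len(n_boot)) {
  # Resample the samples (columns) with replacement.
  cols <- sample(ncol(expr), replace = TRUE)
  rho <- cor(t(expr[, cols]), method = "spearman")
  # Count the pairs whose correlation passes the cutoff in this round.
  hits <- hits + (rho[pairs] >= 0.6)
}

# Keep the edges that appear in at least half of the bootstrap rounds.
consensus <- pairs[hits / n_boot >= 0.5, , drop = FALSE]
print(consensus)
```

Edges that depend on a handful of samples tend to drop out under resampling, which is the sense in which a consensus network is more robust to errors in the data.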

Input

Output

The function will output a matrix with two columns.

Test input data format

The testFileFormat function will test the format of the input data matrix and annotation data and return the standardized data matrix and sample annotation data.

library("NetSAM")
inputMatDir <- system.file("extdata","exampleExpressionData.cct",package="NetSAM")
sampleAnnDir <- system.file("extdata","sampleAnnotation.tsi",package="NetSAM")
formatedData <- testFileFormat(inputMat=inputMatDir,sampleAnn=sampleAnnDir,collapse_mode="maxSD")

Input

Output

If there is no format error, the function will return the standardized data matrix and sample annotation data. Otherwise, it will output the detailed sources of errors.

Identification of the associations between sample features and modules

The featureAssociation function can be used to calculate the associations between the sample features in the input sample annotation data and the modules identified by the NetSAM or MatSAM function.

library("NetSAM")
inputMatDir <- system.file("extdata","exampleExpressionData.cct",package="NetSAM")
sampleAnnDir <- system.file("extdata","sampleAnnotation.tsi",package="NetSAM")
data(NetSAMOutput_Example)
outputHtmlFile <- paste(getwd(),"/featureAsso_HTML",sep="")
featureAsso <- featureAssociation(inputMat=inputMatDir, sampleAnn=sampleAnnDir, NetSAMOutput=netsam_output,
                                  outputHtmlFile=outputHtmlFile, CONMethod="spearman", CATMethod="kruskal",
                                  BINMethod="ranktest", fdrmethod="BH", pth=0.05, collapse_mode="maxSD")
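
The kind of tests named in the call above can be illustrated with base R's stats functions. In this sketch the per-sample module score, the toy features and the mapping of "ranktest" to the Wilcoxon rank-sum test are assumptions; how featureAssociation summarizes a module per sample is not shown here.

```r
# Toy inputs (assumed): one module score per sample and three sample features.
module_score <- c(2.1, 2.5, 3.0, 3.2, 3.9, 4.4, 4.8, 5.1, 5.9, 6.3)
age     <- c(31, 35, 38, 44, 47, 52, 55, 61, 64, 70)                # continuous
stage   <- factor(rep(c("I", "II"), each = 5))                      # binary
subtype <- factor(c("a", "b", "c", "a", "b", "c", "a", "b", "c", "a"))  # categorical

# One test per feature type.
p_raw <- c(
  age     = cor.test(module_score, age, method = "spearman")$p.value,
  stage   = wilcox.test(module_score ~ stage)$p.value,
  subtype = kruskal.test(module_score ~ subtype)$p.value
)

# Benjamini-Hochberg adjustment, then a 0.05 cutoff.
p_adj <- p.adjust(p_raw, method = "BH")
significant <- names(p_adj)[p_adj <= 0.05]
print(p_adj)
print(significant)
```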

Input

Output

The function will output a data.frame object and an HTML file to show the significant associations.

Identification of the associated GO terms for the modules

The GOAssociation function can be used to identify the associated GO terms for the modules identified by the NetSAM or MatSAM function.

library("NetSAM")
data(NetSAMOutput_Example)
outputHtmlFile <- paste(getwd(),"/GOAsso_HTML",sep="")
GOAsso <- GOAssociation(NetSAMOutput=netsam_output, outputHtmlFile=outputHtmlFile, organism="hsapiens", fdrmethod="BH", fdrth=0.05, topNum=5)

Input

Output

The function will output a data.frame object and an HTML file to show the associated GO terms for each module.

Identification of correlation modules

The MatSAM function can identify the hierarchical correlation modules.

library("NetSAM")
inputMatDir <- system.file("extdata","exampleExpressionData.cct",package="NetSAM")
sampleAnnDir <- system.file("extdata","sampleAnnotation.tsi",package="NetSAM")
outputFileName <- paste(getwd(),"/MatSAM",sep="")
matModule <- MatSAM(inputMat=inputMatDir, sampleAnn=sampleAnnDir,
                    outputFileName=outputFileName, outputFormat="msm",
                    organism="hsapiens", map_to_symbol=FALSE, idType="auto", collapse_mode="maxSD",
                    naPer=0.7, meanPer=0.8, varPer=0.8, corrType="spearman", matNetMethod="rank",
                    valueThr=0.6, rankBest=0.003, networkType="signed", netFDRMethod="BH",
                    netFDRThr=0.05, minModule=0.003, stepIte=FALSE, maxStep=4, moduleSigMethod="cutoff",
                    modularityThr=0.2, ZRanNum=10, PerRanNum=100, ranSig=0.05, idNumThr=(-1), nThreads=3)

Input

The description of the other arguments can be found in the argument descriptions of the preceding functions.

Output

The function will output a list object containing module information, gene order information, the correlation network and the matrix filtered to the ids in the network. The function will also output two HTML files: one contains the significant associations between sample features and modules identified by the featureAssociation function, and the other contains the associated GO terms for the modules identified by the GOAssociation function.

Note: When calling the featureAssociation function, MatSAM uses the default parameters. When calling the GOAssociation function, MatSAM sets "outputType" to "top" and "topNum" to 1. Users can pass the list object returned by MatSAM to these two functions to perform further analysis with different parameters.



bingzhang16/NetSAM documentation built on April 3, 2024, 3:35 a.m.