MatSAM | R Documentation |
The MatSAM function first uses the MatNet function to construct the correlation network and then uses the NetSAM function to identify the modules and optimize the one-dimensional ordering of the nodes in each module.
MatSAM(inputMat, sampleAnn=NULL, outputFileName, outputFormat="msm", organism="hsapiens", map_to_symbol=FALSE, idType="auto", collapse_mode="maxSD", naPer=0.7, meanPer=0.8, varPer=0.8, corrType="spearman", matNetMethod="rank", valueThr=0.5, rankBest=0.003, networkType="signed", netFDRMethod="BH", netFDRThr=0.05, minModule=0.003, stepIte=FALSE, maxStep=4, moduleSigMethod="cutoff", modularityThr=0.2, ZRanNum=10, PerRanNum=100, ranSig=0.05, idNumThr=(-1), nThreads=3)
inputMat |
|
sampleAnn |
|
outputFileName |
Output file name. The output file has the extension "msm" and can be uploaded to NetGestalt directly. |
outputFormat |
The format of the output file. The "msm" format can be used as input to NetGestalt; the "gmt" format can be used for other network analyses (e.g. as input to GSEA (Gene Set Enrichment Analysis) for module enrichment analysis); "multiple" means that the MatSAM function will output five files: a ruler file containing gene order information, an hmi file containing module information, a net file containing correlation network information, a cct file containing the filtered data matrix, and a tsi file containing the sample annotation in a standardized format; and "none" means that the function will not output any file. |
organism |
The organism of the input data. Currently, the package supports the following nine organisms: hsapiens, mmusculus, rnorvegicus, drerio, celegans, scerevisiae, cfamiliaris, dmelanogaster and athaliana. The default is "hsapiens". |
map_to_symbol |
If |
idType |
The id type of the ids in the input matrix. MatSAM will use BiomaRt package to transform the input ids to gene symbols based on |
collapse_mode |
The method to collapse duplicate ids. "mean", "median", "maxSD", "maxIQR", "max" and "min" represent the mean, median, max standard deviation, max interquartile range, maximum and minimum of values for ids in each sample. The default is "maxSD". |
naPer |
To remove ids with missing values in most of samples, the function calculates the percentage of missing values in all samples for each id and removes ids with over |
meanPer |
To remove ids with low values, the function calculates the mean of the values for an id across all samples and retains the top |
varPer |
Based on the remained ids filtered by |
corrType |
A character string indicating which correlation coefficient is to be computed for each pair of ids. The function supports "spearman" (default) or "pearson" method. |
matNetMethod |
The MatNet function supports three methods to construct the correlation network: "value", "rank" and "directed". 1. "value" method: the correlation network retains only id pairs with correlations over the cutoff threshold |
valueThr |
Correlation cutoff threshold for "value" method. The default is 0.5. |
rankBest |
The percentage of ids that are most similar to one id for "rank" method. The default is 0.003 which means the "rank" method will select top 30 most similar ids for each id if the number of ids in the matrix is 10,000. |
networkType |
If |
netFDRMethod |
p value adjustment methods for "rank" and "directed" methods. The default is "BH". |
netFDRThr |
FDR threshold for identifying significant pairs for the "rank" and "directed" methods. The default is 0.05. |
minModule |
The minimum percentage of nodes in a module. The minimum size of a module is calculated by multiplying |
stepIte |
Because NetSAM uses random walk distance-based hierarchical clustering to reveal the hierarchical organization of an input network, it requires a specified length of the random walks. If |
maxStep |
The length or max length of the random walks. |
moduleSigMethod |
To test whether a network under consideration has a non-random internal modular organization, the function provides three options: "cutoff", "zscore" and "permutation". "cutoff" means that if the modularity score of the network is above a specified cutoff value, the network is considered to have internal organization and will be further partitioned. For "zscore" and "permutation", the function first generates a set of random modularity scores. For an unweighted network, the function uses the edge switching method to generate a given number of random networks with the same number of nodes and an identical degree sequence, and calculates the modularity scores of these random networks. For a weighted network, the function shuffles the weights of all edges and calculates the modularity scores of the networks with random weights. The "zscore" method then transforms the real modularity score to a z score based on the random modularity scores and converts the z score to a p value assuming a standard normal distribution. The "permutation" method compares the real modularity score with the random ones to calculate a p value. Finally, at the specified significance level, the function determines whether the network can be further partitioned. The default is "cutoff". |
modularityThr |
Threshold of the modularity score for the "cutoff" method. The default is 0.2. |
ZRanNum |
The number of random networks that will be generated for the "zscore" calculation. The default is 10. |
PerRanNum |
The number of random networks that will be generated for the "permutation" p value calculation. The default is 100. |
ranSig |
The significance level for determining whether a network has non-random internal modular organization for the "zscore" or "permutation" methods. The default is 0.05. |
idNumThr |
If the matrix contains too many ids, it will take a long time and use a lot of memory to identify the modules. Thus, the function provides the option to set the threshold of number of ids for further analysis. After filtering by meanPer and varPer, if the number of ids is still larger than |
nThreads |
MatSAM function supports parallel computing based on multiple cores. The default is 3. |
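The "zscore" and "permutation" options of moduleSigMethod can be illustrated with a minimal R sketch. It assumes the random modularity scores have already been generated, and the helper name modularity_significance, the one-sided upper-tail z test, and the exact permutation formula are assumptions made for illustration; they are not taken from the NetSAM source and the package may compute these values differently.

```r
# Hypothetical helper, NOT part of NetSAM: turns a real modularity score and a
# vector of random modularity scores into a decision about further partitioning.
modularity_significance <- function(realScore, randomScores,
                                    method = c("zscore", "permutation"),
                                    ranSig = 0.05) {
  method <- match.arg(method)
  if (method == "zscore") {
    # Standardize the real score against the random scores, then take the
    # upper-tail probability under a standard normal distribution (assumed).
    z <- (realScore - mean(randomScores)) / sd(randomScores)
    p <- pnorm(z, lower.tail = FALSE)
  } else {
    # Fraction of random scores at least as large as the real score (assumed).
    p <- mean(randomScores >= realScore)
  }
  list(pValue = p, partition = p < ranSig)
}

# Made-up scores for illustration only (10 values, matching ZRanNum = 10).
randomScores <- c(0.10, 0.12, 0.08, 0.11, 0.09, 0.13, 0.10, 0.12, 0.09, 0.11)
zres <- modularity_significance(0.35, randomScores, method = "zscore")
pres <- modularity_significance(0.35, randomScores, method = "permutation")
```

With few random networks the permutation p value is coarse (its resolution is one over the number of random scores), which is one plausible reason the default PerRanNum is larger than the default ZRanNum.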
In addition to writing the "msm" file, the function returns a list object containing the module information, the gene order information, the correlation network, and the filtered matrix based on the ids in the network. The function also outputs two HTML files that contain the significant associations between sample features and modules and the GO terms associated with the modules.
After identifying the modules, the MatSAM function identifies the associations between sample features and modules using the featureAssociation function, and the GO terms associated with the modules using the GOAssociation function. For the featureAssociation function, MatSAM uses only the default parameters. For the GOAssociation function, MatSAM sets "outputType" to "top" and "topNum" to 1. Users can pass the list object returned by MatSAM to the featureAssociation and GOAssociation functions to perform further analyses with different parameters.
Jing Wang
MatNet
NetSAM
inputMatDir <- system.file("extdata","exampleExpressionData.cct",package="NetSAM")
cat(inputMatDir)
sampleAnnDir <- system.file("extdata","sampleAnnotation.tsi",package="NetSAM")
cat(sampleAnnDir)
outputFileName <- paste(getwd(),"/MatSAM",sep="")
matModule <- MatSAM(inputMat=inputMatDir, sampleAnn=sampleAnnDir, outputFileName=outputFileName, outputFormat="msm", organism="hsapiens", map_to_symbol=FALSE, idType="auto", collapse_mode="maxSD", naPer=0.7, meanPer=0.8, varPer=0.8, corrType="spearman", matNetMethod="rank", valueThr=0.6, rankBest=0.003, networkType="signed", netFDRMethod="BH", netFDRThr=0.05, minModule=0.003, stepIte=FALSE, maxStep=4, moduleSigMethod="cutoff", modularityThr=0.2, ZRanNum=10, PerRanNum=100, ranSig=0.05, idNumThr=(-1), nThreads=3)