Read the network information from any of the graphite databases specified by the user and construct the adjacency matrices needed for NetGSA. This function also allows for clustering. See Details for more information.
| x | The p x n data matrix with rows referring to genes and columns to samples. Row names should be unique and prefixed with the gene ID type. The ID type and gene number must be separated by a colon, e.g. "ENTREZID:127550". | 
| group | Vector of class indicators of length n. Identifies the condition for each of the n samples | 
| databases | (Optional) Either (1) the result of a call to  | 
| cluster | (Optional) Logical indicating whether or not to cluster genes to estimate adjacency matrix. If not specified, set to TRUE if there are > 2,500 genes (p > 2,500). The main use of clustering is to speed up calculation time. If the dimension of the problem, or equivalently the total number of unique genes across all pathways, is large,  If clustering is set to TRUE, the 0-1 adjacency matrix is used to detect clusters of genes within the connected components. Once gene clusterings are chosen, the weighted adjacency matrices are estimated for each cluster separately using  If clustering is set to FALSE, the 0-1 adjacency matrix is used to detect connected components and the weighted adjacency matrices are estimated for each connected component. Singleton clusters are combined into one cluster. This should not affect performance much since the gene in a singleton cluster should not have any edges to other genes. | 
| file_e | (Optional) The name of the file from which the list of edges is to be read. This file is read in with  This information cannot conflict with the user specified non-edges. That is, one cannot have the same edge in  | 
| file_ne | (Optional) The name of the file from which the list of non-edges is to be read. This file is read in with  In the case of conflicting information between  | 
| lambda_c | (Non-negative) a vector or constant.  | 
| penalize_diag | Logical. Whether or not to penalize diagonal entries when estimating weighted adjacency matrix. If TRUE a small penalty is used, otherwise no penalty is used. | 
| eta | (Non-negative) a small constant needed for estimating the edge weights. By default,  | 
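The row-name convention described for x can be illustrated with a small sketch. The toy matrix, the gene IDs other than "127550", and the group labels below are hypothetical and only show the expected shape of the inputs; they are not taken from the package.

```r
# Hypothetical toy data: 3 genes x 4 samples, row names are bare Entrez IDs.
set.seed(1)
x_raw <- matrix(rnorm(12), nrow = 3,
                dimnames = list(c("127550", "53947", "65985"), NULL))

# Put the ID type in front of each gene number, separated by a colon.
x <- x_raw
rownames(x) <- paste0("ENTREZID:", rownames(x_raw))

# Basic checks: unique names, and every name has an "IDTYPE:ID" shape.
stopifnot(!anyDuplicated(rownames(x)))
has_prefix <- grepl("^[^:]+:[^:]+$", rownames(x))

# Hypothetical class labels, one per sample (column of x).
group <- c(1, 1, 2, 2)
stopifnot(length(group) == ncol(x))
```

Checking the names before calling prepareAdjMat is a cheap way to catch genes that would otherwise fail to match any database entry.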
The function prepareAdjMat accepts network information from user specified sources as well as a list of graphite databases in which to search for edges. prepareAdjMat calculates the 0-1 adjacency matrices and runs netEst.undir if the graph is undirected or netEst.dir if it is directed. 
When searching for network information, prepareAdjMat makes some important assumptions about edges and non-edges. As already stated, the first is that in the case of conflicting information, user specified non-edges are given precedence. 
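The precedence rule can be pictured with a minimal base R sketch. It assumes edges and non-edges are held as two-column data frames and treats the graph as undirected; the names `db_edges`, `user_non_edges` and `pair_key` are hypothetical and do not reflect how prepareAdjMat stores this information internally.

```r
# Hypothetical edge list (e.g. found in databases) and user specified non-edges.
db_edges <- data.frame(src = c("A", "B", "C"), dest = c("B", "C", "D"),
                       stringsAsFactors = FALSE)
user_non_edges <- data.frame(src = "C", dest = "B", stringsAsFactors = FALSE)

# Order-independent key for each pair, so that B-C and C-B match
# (this sketch assumes an undirected graph).
pair_key <- function(df) {
  paste(pmin(df$src, df$dest), pmax(df$src, df$dest), sep = "|")
}

# Drop any edge that the user has declared to be a non-edge.
kept_edges <- db_edges[!pair_key(db_edges) %in% pair_key(user_non_edges), ]
```

Here the B-C edge is removed because the user listed C-B as a non-edge, while A-B and C-D are kept.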
prepareAdjMat uses obtainEdgeList to standardize and search the graphite databases for edges. For more information see ?obtainEdgeList. prepareAdjMat also uses database information to identify non-edges. If two genes are identified in the databases edges but there is no edge between them, this will be coded as a non-edge. The rationale is that if there were an edge between these two genes, it would be present.
prepareAdjMat assumes no information about genes not identified in databases edge lists. That is, if the user passes gene A, but gene A is not found in any of the edges in databases, no information about gene A is assumed. Gene A will have neither edges nor non-edges.
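The three possible states of a gene pair (edge, non-edge, no information) can be sketched in base R as follows. This is only an illustration under assumed inputs: the gene names, the `db_edges` data frame and the 1/0/NA coding are hypothetical, and the graph is treated as undirected.

```r
# Hypothetical inputs: genes passed by the user and edges found in the databases.
genes <- c("A", "B", "C", "D")
db_edges <- data.frame(src = c("A", "B"), dest = c("B", "C"),
                       stringsAsFactors = FALSE)

# Genes that appear in at least one database edge.
known <- intersect(genes, unique(c(db_edges$src, db_edges$dest)))

# Status matrix: 1 = edge, 0 = non-edge, NA = no information.
status <- matrix(NA_integer_, length(genes), length(genes),
                 dimnames = list(genes, genes))
status[known, known] <- 0L                        # pairs of known genes default to non-edge
status[cbind(db_edges$src, db_edges$dest)] <- 1L  # database edges
status[cbind(db_edges$dest, db_edges$src)] <- 1L  # symmetric, since undirected here
diag(status) <- NA_integer_                       # self-pairs are ignored in this sketch
```

In this toy case A-C is coded as a non-edge because both genes occur in database edges, whereas every pair involving D stays NA because D appears in no edge.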
Once all the network and clustering information has been compiled, prepareAdjMat estimates the network. prepareAdjMat will automatically detect directed graphs, rearrange them to the correct order, and use netEst.dir to estimate the network. When the graph is undirected, netEst.undir will be used. For more information on these methods see ?netEst.dir and ?netEst.undir.
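As a rough illustration of what happens to the 0-1 adjacency matrix when cluster = FALSE, the sketch below labels connected components with a breadth-first search and pools singleton components into one group. The matrix `A01`, the symmetry check used to decide that the graph is undirected, and the label 0 for the pooled group are all assumptions made for this example, not the package's internal implementation.

```r
# Hypothetical 0-1 adjacency matrix for five genes (undirected, symmetric).
genes <- c("A", "B", "C", "D", "E")
A01 <- matrix(0L, 5, 5, dimnames = list(genes, genes))
A01["A", "B"] <- A01["B", "A"] <- 1L
A01["B", "C"] <- A01["C", "B"] <- 1L

# An undirected graph has a symmetric adjacency matrix.
is_undirected <- isSymmetric(A01)

# Label connected components with a simple breadth-first search.
component <- rep(NA_integer_, nrow(A01))
names(component) <- genes
next_id <- 0L
for (g in genes) {
  if (!is.na(component[g])) next
  next_id <- next_id + 1L
  queue <- g
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    if (!is.na(component[v])) next
    component[v] <- next_id
    nbrs <- genes[A01[v, ] == 1L]
    queue <- c(queue, nbrs[is.na(component[nbrs])])
  }
}

# Pool singleton components into one group (labelled 0 in this sketch).
sizes <- table(component)
singletons <- as.integer(names(sizes)[sizes == 1L])
pooled <- component
pooled[component %in% singletons] <- 0L
```

Genes A, B and C end up in one component, while D and E, which have no edges, are pooled together; a weighted adjacency matrix would then be estimated per group.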
Importantly, prepareAdjMat returns the list of weighted adjacency matrices to be used as an input in NetGSA.
A list with components
| Adj | A list of weighted adjacency matrices estimated from either  | 
| invcov | A list of inverse covariance matrices estimated from either  | 
| lambda | A list of values of tuning parameters used for each condition in  | 
Michael Hellstern
Ma, J., Shojaie, A. & Michailidis, G. (2016) Network-based pathway enrichment analysis with incomplete network information. Bioinformatics 32(20):3165–3174.
NetGSA, netEst.dir, netEst.undir
## load the data
data("breastcancer2012")
## consider genes from the "ErbB signaling pathway" and "Jak-STAT signaling pathway"
genenames <- unique(c(pathways[[24]], pathways[[52]]))
sx <- x[match(rownames(x), genenames, nomatch = 0L) > 0L, ]
adj_cluster <- prepareAdjMat(sx, group, databases = c("kegg", "reactome", "biocarta"), cluster = TRUE)
adj_no_cluster <- prepareAdjMat(sx, group, databases = c("kegg", "reactome", "biocarta"), cluster = FALSE)