From Data Matrix to Adjacency Matrix
Description
mat2adj is a high-level function providing different network inference methods. It takes as input an N by P data matrix, with N samples on the rows and P variables on the columns, and computes the P by P adjacency matrix with the specified method, using the N samples to infer the interactions between the variables.
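A minimal base-R sketch of the input and output shapes, using the absolute Pearson correlation as the inferred weight. `cor_adjacency` is a hypothetical helper for illustration only, not the internal code of mat2adj.

```r
# Hypothetical helper (not part of mat2adj): N x P data -> P x P adjacency
cor_adjacency <- function(x, method = "pearson") {
  x <- as.matrix(x)                    # a data.frame becomes a numeric matrix
  adj <- abs(cor(x, method = method))  # P x P absolute correlations in [0, 1]
  diag(adj) <- 0                       # no self loops
  adj
}

set.seed(1)
x <- matrix(rnorm(20 * 5), nrow = 20, ncol = 5)  # N = 20 samples, P = 5 variables
adj <- cor_adjacency(x)
dim(adj)  # 5 5
```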
Usage
Arguments
x 
a matrix or data.frame of numerical values with N rows and P columns.
method 
a character string indicating which method will be used for inferring a relationship between two variables. This must be (an abbreviation of) one of the methods listed in Details.
P 
6 (default), integer used as the soft-thresholding power for network construction by the WGCNA and TOM methods.
FDR 
1e-3 (default), a number which determines the number of values (1/FDR permutations) generated to compute the null hypothesis. To be used for the methods MINEFDR, bicorFDR and WGCNAFDR.
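measure
the statistic of the MINE family to be computed by the MINE-based methods (see Details and minerva).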
alpha 
0.6 (default), the exponent of the search-grid size B(n) = n^α used by the MINE-based methods (see Details).
C 
15 (default), an integer value to be passed to the mine function. Only for the MINE-based methods (see Details).
DP 
1 (default), only for method 
... 
Additional arguments to be passed to the downstream functions. Arguments passed through ... are normally processed by the functions which compute the inference. Not all parameters are used by all functions.
Details
The mat2adj function is a high-level function which includes different methods for network inference. In particular, it infers the relation for all the possible pairwise comparisons between the columns of the dataset. If the input is a data.frame, its columns are first converted into a numerical matrix. Given an N by P numerical matrix, the relation between each of the P x P pairs of variables is inferred with the selected method.
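The pairwise scheme can be sketched as a loop over the column pairs that fills only the upper triangle and mirrors it. `pairwise_adjacency` and its `measure` argument are hypothetical; any symmetric function of two numeric vectors can be plugged in.

```r
# Hypothetical helper: apply a symmetric similarity function to every column pair
pairwise_adjacency <- function(x, measure) {
  x <- data.matrix(x)   # data.frame columns converted into a numeric matrix
  p <- ncol(x)
  adj <- matrix(0, p, p, dimnames = list(colnames(x), colnames(x)))
  for (i in seq_len(p - 1)) {
    for (j in (i + 1):p) {
      # compute the upper triangle only and mirror it into the lower one
      adj[i, j] <- adj[j, i] <- measure(x[, i], x[, j])
    }
  }
  adj  # the diagonal stays 0
}

df <- data.frame(a = c(1, 2, 3, 4), b = c(2, 4, 6, 8), c = c(4, 1, 3, 2))
adj <- pairwise_adjacency(df, function(u, v) abs(cor(u, v, method = "spearman")))
adj["a", "b"]  # perfectly monotone pair: 1 up to rounding
```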
The FDR-corrected methods are based on a permutation estimate of the null hypothesis. A total of 1/FDR permutations are performed to assess the reliability of each inferred link: a link is set only if it is inferred in all the permutations and the weight obtained in each permutation is lower than its weight on the original, non-permuted data. The default value for FDR is 1e-3, i.e. 1000 permutations.
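The permutation scheme can be sketched in base R as follows. `perm_fdr_adjacency` is a hypothetical helper, assuming absolute Pearson correlation as the underlying weight and independent shuffling of each column as the permutation; the FDR-corrected methods of mat2adj apply the scheme to MINE, bicor and WGCNA instead.

```r
# Hypothetical helper: permutation-based pruning of an absolute-correlation network
perm_fdr_adjacency <- function(x, FDR = 1e-3) {
  x <- as.matrix(x)
  observed <- abs(cor(x))
  n_perm <- round(1 / FDR)                  # 1e-3 -> 1000 permutations
  keep <- matrix(TRUE, ncol(x), ncol(x))
  for (k in seq_len(n_perm)) {
    x_perm <- apply(x, 2, sample)           # shuffle each column independently
    permuted <- abs(cor(x_perm))
    keep <- keep & (permuted < observed)    # must beat the null in every permutation
  }
  adj <- observed * keep                    # dropped links get weight 0
  diag(adj) <- 0
  adj
}

set.seed(42)
z <- rnorm(50)
x <- cbind(a = z, b = z + rnorm(50, sd = 0.1), c = rnorm(50))
adj <- perm_fdr_adjacency(x, FDR = 1e-2)   # 100 permutations, to keep the example fast
adj["a", "b"] > 0                         # the strongly dependent pair survives
```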
All the available methods are the following:
cor
(default) computes the interaction using the Pearson correlation coefficient. Other correlation methods, such as Spearman, can be passed to the function through the ... argument.
ARACNE
Algorithm for the Reconstruction of Accurate Cellular Networks; see also package minet.
CLR
Context Likelihood of Relatedness; see also package minet.
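ARACNE and CLR both post-process a matrix of pairwise mutual information values. The base-R sketch below, assuming a symmetric weight matrix `w` with zero diagonal, shows the core rule of each: ARACNE's data processing inequality removes the weakest link of every triangle of variables, while CLR rescores each value against the background of its two variables with z_i = max(0, (w_ij - mean_i) / sd_i) and sqrt(z_i^2 + z_j^2). `aracne_dpi` and `clr_scores` are hypothetical simplifications (for instance, the row statistics here include the zero diagonal); the minet package provides the actual `aracne()` and `clr()` implementations.

```r
# Hypothetical sketch of ARACNE's data processing inequality (DPI)
aracne_dpi <- function(w) {
  p <- ncol(w)
  if (p < 3) return(w)                      # no triangles to prune
  drop <- matrix(FALSE, p, p)
  for (i in 1:(p - 2)) for (j in (i + 1):(p - 1)) for (k in (j + 1):p) {
    pairs <- list(c(i, j), c(i, k), c(j, k))
    weights <- c(w[i, j], w[i, k], w[j, k])
    weakest <- pairs[[which.min(weights)]]  # weakest link of the triangle
    drop[weakest[1], weakest[2]] <- TRUE
    drop[weakest[2], weakest[1]] <- TRUE
  }
  w[drop] <- 0                              # decisions use the original weights
  w
}

# Hypothetical sketch of the CLR background correction
clr_scores <- function(w) {
  mu <- rowMeans(w)
  s <- apply(w, 1, sd)
  z <- pmax((w - mu) / s, 0)                # z-score of w[i, j] within row i, floored at 0
  out <- sqrt(z^2 + t(z)^2)                 # combine the scores of both variables
  diag(out) <- 0
  out
}

w <- matrix(c(0.0, 0.9, 0.5,
              0.9, 0.0, 0.8,
              0.5, 0.8, 0.0), nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
aracne_dpi(w)["a", "c"]  # 0: the weakest link of the a-b-c triangle is removed
clr_scores(w)
```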
WGCNA
Weighted Correlation Network Analysis. It is based on a correlation measure; for further details see the documentation of the WGCNA package. The method accepts the parameter P, which is set to 6 by default.
bicor
Biweighted correlation method. It uses the biweight midcorrelation as described in the documentation of the bicor function of the WGCNA package.
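Two of the correlation-based weights above can be sketched in base R. The unsigned WGCNA-style adjacency raises the absolute Pearson correlation to the soft-thresholding power P, and the biweight midcorrelation down-weights observations far from the median using Tukey's biweight. `soft_threshold_adjacency` and `bicor_pair` are hypothetical helpers; the WGCNA package provides `adjacency()` and `bicor()` for real use.

```r
# Hypothetical WGCNA-style unsigned adjacency: |cor|^P with soft-thresholding power P
soft_threshold_adjacency <- function(x, P = 6) {
  adj <- abs(cor(as.matrix(x)))^P
  diag(adj) <- 0
  adj
}

# Hypothetical biweight midcorrelation of two numeric vectors
bicor_pair <- function(x, y) {
  weigh <- function(v) {
    m <- median(v)
    u <- (v - m) / (9 * mad(v, constant = 1))  # raw median absolute deviation
    w <- (1 - u^2)^2 * (abs(u) < 1)            # Tukey biweight, 0 for |u| >= 1
    d <- (v - m) * w
    d / sqrt(sum(d^2))                         # unit-length weighted deviations
  }
  sum(weigh(x) * weigh(y))                     # undefined if mad() is 0
}

set.seed(7)
v <- rnorm(30)
bicor_pair(v, 2 * v + 3)   # 1 up to rounding: invariant to positive affine maps
bicor_pair(v, -v)          # -1 up to rounding
```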
TOM
Topological Overlap Measure inference method. For further details see the documentation of the WGCNA package. As for WGCNA, the parameter P can be set (6 by default).
MINE
Maximal Information-based Nonparametric Exploration. This method uses the minerva implementation of the original measure, for which different statistics are available; see minerva for further information. To clarify the main MINE family statistics, let $D = \{(x, y)\}$ be the set of n ordered pairs of elements of x and y. The data space is partitioned in an X-by-Y grid, grouping the x and y values in X and Y bins respectively.
The value of alpha (default 0.6) has been empirically chosen by the authors of the original paper. alpha is the exponent of the search-grid size $B(n) = n^{\alpha}$. It is worth noting that alpha and C are defined to obtain a heuristic approximation in a reasonable amount of time. For a small sample size (n) it is preferable to increase alpha to 1 to obtain a solution closer to the theoretical one.
C determines the number of starting points of the X-by-Y search grid. When trying to partition the x-axis into X columns, the algorithm will start with at most $C \cdot X$ clumps. The default value is 15.
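To make the roles of alpha and C concrete, the base-R sketch below enumerates the grid shapes with XY < B(n) = n^alpha for a given sample size n, assuming X, Y >= 2 (the log(min(X, Y)) denominator of MIC vanishes for a single bin), together with the bound C * X on the starting clumps. `admissible_grids` is a hypothetical helper that only lists the search space; it does not run the grid optimization that minerva performs.

```r
# Hypothetical helper: admissible X-by-Y grid shapes for n samples
admissible_grids <- function(n, alpha = 0.6, C = 15) {
  B <- n^alpha                              # search-grid size B(n) = n^alpha
  max_side <- ceiling(B / 2) - 1            # X < B / 2 because Y >= 2
  if (max_side < 2) {
    return(data.frame(X = integer(0), Y = integer(0), max_clumps = numeric(0)))
  }
  grids <- expand.grid(X = 2:max_side, Y = 2:max_side)
  grids <- grids[grids$X * grids$Y < B, , drop = FALSE]
  grids$max_clumps <- C * grids$X           # at most C * X starting clumps on the x-axis
  grids
}

nrow(admissible_grids(100, alpha = 0.6))  # 16 grid shapes for n = 100
nrow(admissible_grids(100, alpha = 1))    # many more when alpha is raised to 1
```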
The Maximal Information Coefficient (MIC) is defined as
$$\mathrm{MIC}(D) = \max_{XY < B(n)} M(D)_{X,Y} = \max_{XY < B(n)} \frac{I^*(D,X,Y)}{\log \min(X,Y)},$$
where $B(n) = n^{\alpha}$ is the search-grid size and $I^*(D,X,Y)$ is the maximum mutual information, over all X-by-Y grids, of the distribution induced by D on a grid having X and Y bins (the probability mass on a cell of the grid is the fraction of points of D falling in that cell). The other statistics of the MINE family are derived from the mutual information matrix achieved by an X-by-Y grid on D. The Maximum Asymmetry Score (MAS) is defined as
$$\mathrm{MAS}(D) = \max_{XY < B(n)} \left| M(D)_{X,Y} - M(D)_{Y,X} \right|.$$
The Maximum Edge Value (MEV) is defined as
$$\mathrm{MEV}(D) = \max_{XY < B(n)} \{ M(D)_{X,Y} : X = 2 \text{ or } Y = 2 \}.$$
The Minimum Cell Number (MCN) is defined as
$$\mathrm{MCN}(D, \epsilon) = \min_{XY < B(n)} \{ \log(XY) : M(D)_{X,Y} \ge (1 - \epsilon)\,\mathrm{MIC}(D) \}.$$
More details are provided in the Supplementary Online Material (SOM) of the original paper.
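Once the matrix of normalized mutual information values M(D)_{X,Y} is available, the four statistics above are simple reductions over the admissible grids. The sketch assumes M is a square R matrix indexed by [X, Y] with NA for grids that are not admissible; it does not estimate I*(D, X, Y), the expensive step carried out by minerva's `mine()`. `mine_stats` is a hypothetical helper, and the values in the usage example are toy numbers for illustration only.

```r
# Hypothetical reductions of the matrix M[X, Y] (NA where the grid is not admissible)
mine_stats <- function(M, eps = 0) {
  idx <- which(!is.na(M), arr.ind = TRUE)
  X <- idx[, 1]
  Y <- idx[, 2]
  m <- M[idx]                                   # M(D)_{X,Y} on admissible grids
  m_t <- M[cbind(Y, X)]                         # M(D)_{Y,X}, the transposed grid
  mic <- max(m)
  c(
    MIC = mic,
    MAS = max(abs(m - m_t), na.rm = TRUE),
    MEV = max(m[X == 2 | Y == 2]),
    MCN = min(log(X * Y)[m >= (1 - eps) * mic])  # natural log, as written above
  )
}

# Toy values, for illustration only (rows = X bins, columns = Y bins)
M <- matrix(NA_real_, 4, 4)
M[2, 2] <- 0.30; M[2, 3] <- 0.45; M[3, 2] <- 0.40
M[2, 4] <- 0.50; M[4, 2] <- 0.35; M[3, 3] <- 0.60
mine_stats(M)
```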
MINEFDR
This calls an FDR-corrected version of the standard MINE method; see the description of the MINE method. The parameter FDR = 1e-3 (default) can be set.
bicorFDR
This calls an FDR-corrected version of the bicor method; see the description of bicor. The parameter FDR = 1e-3 (default) can be set.
WGCNAFDR
This calls an FDR-corrected version of the WGCNA method. The parameter P cannot be set for this method. The parameter FDR = 1e-3 (default) can be set.
DTWMIC
This method uses a Dynamic Time Warping transformation coupled with the MIC statistic from the MINE family; see the description of MINE for further information on MIC. Additional parameters can be set with this method via the ... argument:
tol
1e-5 (default), a numeric value which controls the tolerance on the variable variance. In particular, this parameter is passed to a function which checks the variance of each feature and returns the indexes of the features with variance < tol. Indexes refer to 1-based column numbers of the original dataset.
var.thr
1e-5 (default), a numeric value which controls the tolerance on the column variance for the methods MINE, MINEFDR and DTWMIC.
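The variance check controlled by tol can be sketched over the columns of the data matrix. `low_variance_columns` is a hypothetical helper that, as described above, returns the 1-based indexes of the columns whose variance is below tol.

```r
# Hypothetical helper: 1-based indexes of columns whose variance is below tol
low_variance_columns <- function(x, tol = 1e-5) {
  v <- apply(as.matrix(x), 2, var)   # column-wise variance
  which(v < tol)                     # which() returns 1-based indexes
}

set.seed(5)
x <- cbind(a = rnorm(10), b = rep(3, 10), c = rnorm(10))
low_variance_columns(x)  # named index of the constant column b (position 2)
```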
Value
A P by P symmetric adjacency matrix with the diagonal set to 0. Self loops and edge directions are not taken into account. The values range in [0, 1].
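The properties listed above can be checked on a returned matrix with a small base-R helper. `check_adjacency` is hypothetical, and the tolerance used for the symmetry test is an assumption.

```r
# Hypothetical checker for the properties of the returned adjacency matrix
check_adjacency <- function(adj, tol = 1e-12) {
  is.matrix(adj) &&
    nrow(adj) == ncol(adj) &&                 # P by P
    !anyNA(adj) &&
    isSymmetric(unname(adj), tol = tol) &&    # undirected edges
    all(diag(adj) == 0) &&                    # no self loops
    all(adj >= 0 & adj <= 1)                  # values in [0, 1]
}

set.seed(11)
adj <- abs(cor(matrix(rnorm(40), nrow = 10)))
diag(adj) <- 0
check_adjacency(adj)  # TRUE
```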
Author(s)
Michele Filosi
Special thanks to:
Samantha Riccadonna, Giuseppe Jurman, Davide Albanese and Cesare
Furlanello
References
P. Langfelder, S. Horvath (2008). WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics, 9:559.
P. E. Meyer, F. Lafitte, G. Bontempi (2008). MINET: An open source R/Bioconductor Package for Mutual Information based Network Inference. BMC Bioinformatics. http://www.biomedcentral.com/1471-2105/9/461
J. J. Faith, B. Hayete, J. T. Thaden, I. Mogno, J. Wierzbowski, G. Cottarel, S. Kasif, J. J. Collins, T. S. Gardner. Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles.
D. Albanese, M. Filosi, R. Visintainer, S. Riccadonna, G. Jurman, C. Furlanello (2013). minerva and minepy: a C engine for the MINE suite and its R, Python and MATLAB wrappers. Bioinformatics.
M. Filosi, R. Visintainer, S. Riccadonna, G. Jurman, C. Furlanello (2014). Stability Indicators in Network Reconstruction. PLOS ONE.
D. Reshef, Y. Reshef, H. Finucane, S. Grossman, G. McVean, P. Turnbaugh, E. Lander, M. Mitzenmacher, P. Sabeti (2011). Detecting novel associations in large datasets. Science.
(SOM: Supplementary Online Material at http://www.sciencemag.org/content/suppl/2011/12/14/334.6062.1518.DC1)
See Also
WGCNA, minerva, minet, cor
Examples