From Data Matrix to Adjacency Matrix

Description

mat2adj is a high-level function providing different network inference methods. The function takes as input an N by P data matrix, with N samples on the rows and P variables on the columns. The P by P adjacency matrix is computed with the specified method, using the N samples to infer the interactions between the variables.

Usage

mat2adj(x,...)

## Default S3 method:
mat2adj(x, ...)


## S3 method for class 'data.frame'
mat2adj(x, ...)


## S3 method for class 'matrix'
mat2adj(x, method='cor', FDR=1e-3, P=6, measure=NULL,
        alpha=0.6, C=15, DP=1, ...)

Arguments

x

a matrix or data.frame of numerical values with N rows and P columns.

method

a character string indicating which method will be used for inferring a relationship between two variables. This must be (an abbreviation of) one of "cor" (default), "WGCNA", "WGCNAFDR", "bicor", "bicorFDR", "TOM", "ARACNE", "CLR", "MINE", "MINEFDR", "DTWMIC"

P

6 (default), integer used as soft-thresholding power for network construction, used by the "WGCNA" and "TOM" methods.

FDR

1e-3 (default), a number that controls how many permutations are generated to estimate the null hypothesis: 1/FDR permutations are performed (see Details). Used by the methods "WGCNAFDR", "MINEFDR" and "bicorFDR".

measure

"MIC" (default), a valid string indicating the measure of the MINE suite to compute. One of "MIC", "MCN", "MEV", "MAS" or "MICR2".

alpha

0.6 (default), the alpha argument to be passed to the function mine. See also mine.

C

15 (default), an integer value to be passed to the mine function. Only for methods "MINE" and "MINEFDR".

DP

1 (default), only for method "DTWMIC".

...

Additional arguments to be passed to the downstream functions. Normally the arguments passed through ... are processed by the functions which compute the inference. Not all parameters are used by all functions.

Details

mat2adj is a high-level function which includes different methods for network inference. In particular, the function infers the relation between all possible pairwise comparisons between columns in the dataset. If the input is a data.frame, its columns are first converted into a numerical matrix. Given an N by P numerical matrix, the relation between each of the P x P pairs of variables is inferred with the selected method.
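As a rough illustration of this step, the following base R sketch converts a data.frame into a numeric matrix and scores every pair of columns with the absolute Pearson correlation. The helper name cor_adjacency and the use of the absolute value are assumptions made for the example; they are not taken from the package source.

```r
# Hypothetical helper, not part of the package: a base R sketch of the
# N-by-P to P-by-P step using absolute Pearson correlation.
cor_adjacency <- function(x, ...) {
  x <- data.matrix(x)           # data.frame columns become numeric columns
  a <- abs(stats::cor(x, ...))  # P-by-P matrix of pairwise scores in [0, 1]
  diag(a) <- 0                  # no self loops
  a
}

set.seed(1)
df <- data.frame(a = rnorm(20), b = rnorm(20), c = rnorm(20))
adj <- cor_adjacency(df)
dim(adj)  # 3 3
```

Because extra arguments are forwarded to stats::cor, a call such as cor_adjacency(df, method = "spearman") would switch the correlation type in this sketch.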

The "FDR" corrected methods are based on a permutation estimate of the null hypothesis. A total of 1/FDR permutations are performed to assess the reliability of each inferred link; a link is set only if it is inferred in all the permutations and its weight is lower than the value on non-permuted data. The default value for FDR is 1e-3.
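One possible reading of this permutation scheme is sketched below in base R: a link is kept only when its weight on every permuted copy of the data stays below the weight observed on the original data. The helper perm_filter, the column-wise shuffling, and the default score function are all assumptions for illustration and may differ from what the package actually does.

```r
# Hypothetical sketch of a permutation filter; one reading of the text above,
# not the package's implementation.
perm_filter <- function(x, FDR = 1e-3,
                        score = function(m) abs(stats::cor(m))) {
  x <- data.matrix(x)
  observed <- score(x)                  # weights on the non-permuted data
  keep <- matrix(TRUE, ncol(x), ncol(x))
  n_perm <- round(1 / FDR)              # 1/FDR permutations
  for (i in seq_len(n_perm)) {
    # Shuffle each column independently to break the pairing between variables
    permuted <- apply(x, 2, sample)
    keep <- keep & (score(permuted) < observed)
  }
  adj <- observed * keep                # links that failed any permutation become 0
  diag(adj) <- 0
  adj
}

set.seed(42)
a <- rnorm(50)
x <- cbind(a = a, b = a + rnorm(50, sd = 0.1), c = rnorm(50))
adj <- perm_filter(x, FDR = 0.01)       # 100 permutations keep the example quick
adj["a", "b"] > 0                       # the strongly related pair survives
```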

All the available methods are the following:

cor

(default) computes the interaction using the 'Pearson' correlation coefficient. Different correlation methods, such as Spearman, can be passed to the function using ....

ARACNE

Algorithm for the Reconstruction of Gene Regulatory Networks, see also package minet

CLR

Context Likelihood of Relatedness, see also package minet

WGCNA

WeiGhted Correlation Network Analysis. It is based on a correlation measure. For further details see the documentation of the WGCNA package. The method accepts the parameter P, which is set to 6 by default.
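The effect of the soft-thresholding power P can be sketched in base R as follows. The sketch assumes the usual unsigned form, in which each weight is the absolute correlation raised to the power P; the helper name soft_threshold_adjacency is hypothetical and the code does not call the WGCNA package.

```r
# Hypothetical helper: unsigned soft-thresholded adjacency,
# a_ij = |cor(x_i, x_j)|^P, written in base R.
soft_threshold_adjacency <- function(x, P = 6) {
  a <- abs(stats::cor(data.matrix(x)))^P
  diag(a) <- 0  # no self loops
  a
}

# Raising to the power P shrinks weak correlations much more than strong ones
c(0.9, 0.5)^6  # 0.531441 0.015625

set.seed(1)
x <- matrix(rnorm(60), nrow = 20, ncol = 3)
adj <- soft_threshold_adjacency(x, P = 6)
```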

bicor

Biweight midcorrelation method. It uses the biweight midcorrelation as implemented by the bicor function of the WGCNA package.

TOM

Topological Overlap Measure inference method. For further details see the documentation of the WGCNA package. As for WGCNA, the parameter P can be set (6 by default).
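For readers unfamiliar with the measure, a minimal base R sketch of the unsigned topological overlap is given below. It assumes the standard formula TOM_ij = (sum_u a_iu a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij), where k_i is the connectivity of node i; the helper name topological_overlap is hypothetical and the code does not reproduce the options of the WGCNA package.

```r
# Hypothetical helper: unsigned topological overlap of an adjacency matrix `a`
# (symmetric, entries in [0, 1], zero diagonal).
topological_overlap <- function(a) {
  shared <- a %*% a   # sum_u a_iu * a_uj: strength of shared neighbours
  k <- rowSums(a)     # connectivity of each node
  tom <- (shared + a) / (outer(k, k, pmin) + 1 - a)
  diag(tom) <- 0      # drop self loops, as in the returned adjacency
  tom
}

# Three fully connected nodes: every pair shares its only other neighbour,
# so the overlap of every pair is 1
a <- 1 - diag(3)
topological_overlap(a)
```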

MINE

Maximal Information-based Nonparametric Exploration. This method uses the minerva implementation of the original measure. For this method different measures are available. See minerva for further information. To clarify the main MINE family statistics, let D={(x,y)} be the set of n ordered pairs of elements of x and y. The data space is partitioned in an X-by-Y grid, grouping the x and y values in X and Y bins respectively.
The value of alpha (default 0.6) has been empirically chosen by the authors of the original paper. alpha is the exponent of the search-grid size B(n)=n^{α}. It is worth noting that alpha and C are defined to obtain a heuristic approximation in a reasonable amount of time. In case of small sample size (n) it is preferable to increase alpha to 1 to obtain a solution closer to the theoretical one.
C determines the number of starting points of the X-by-Y search-grid. When trying to partition the x-axis into X columns, the algorithm will start with at most C x X clumps. Default value is 15.
The Maximal Information Coefficient (MIC) is defined as

MIC(D)=max_{XY<B(n)} M(D)_{X,Y}=max_{XY<B(n)} I*(D,X,Y)/log(min(X,Y)),

where B(n)=n^{α} is the search-grid size, I*(D,X,Y) is the maximum mutual information over all grids X-by-Y, of the distribution induced by D on a grid having X and Y bins (where the probability mass on a cell of the grid is the fraction of points of D falling in that cell). The other statistics of the MINE family are derived from the mutual information matrix achieved by an X-by-Y grid on D. The Maximum Asymmetry Score (MAS) is defined as

MAS(D) = max_{XY<B(n)} |M(D)_{X,Y} - M(D)_{Y,X}|.

The Maximum Edge Value (MEV) is defined as

MEV(D) = max_{XY<B(n)} {M(D)_{X,Y}: X=2 or Y=2}.

The Minimum Cell Number (MCN) is defined as

MCN(D,ε) = min_{XY<B(n)} {log(XY): M(D)_{X,Y} >= (1-ε)MIC(D)}.

More details are provided in the supplementary material (SOM) of the original paper.
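To make the role of B(n) and of the normalisation by log(min(X,Y)) concrete, the base R sketch below evaluates M(D)_{X,Y} on fixed equal-frequency grids and takes the maximum over all resolutions with XY<B(n). This is a deliberate simplification: the MINE algorithm also optimises where the grid lines are placed, so the fixed grids used here can only underestimate I*(D,X,Y). The helper names grid_score and crude_mic are hypothetical and the code does not call minerva.

```r
# Hypothetical helpers, not the minerva implementation.

# Normalised mutual information on ONE fixed grid with X and Y
# (approximately) equal-frequency bins.
grid_score <- function(x, y, X, Y) {
  bin_x <- cut(rank(x, ties.method = "first"), breaks = X, labels = FALSE)
  bin_y <- cut(rank(y, ties.method = "first"), breaks = Y, labels = FALSE)
  # Probability mass of each cell = fraction of points falling in it
  p <- unclass(table(factor(bin_x, levels = seq_len(X)),
                     factor(bin_y, levels = seq_len(Y)))) / length(x)
  expected <- outer(rowSums(p), colSums(p))  # mass under independence
  nz <- p > 0
  sum(p[nz] * log(p[nz] / expected[nz])) / log(min(X, Y))
}

# Maximum of the grid scores over all resolutions with X * Y < B(n)
crude_mic <- function(x, y, alpha = 0.6) {
  B <- length(x)^alpha   # search-grid size B(n) = n^alpha
  stopifnot(B > 4)       # need at least the 2-by-2 grid
  best <- 0
  for (X in 2:floor(B / 2)) {
    for (Y in 2:floor(B / X)) {
      if (X * Y < B) best <- max(best, grid_score(x, y, X, Y))
    }
  }
  best
}

set.seed(1)
x <- runif(200)
crude_mic(x, x^2)         # monotone relation: close to 1
crude_mic(x, runif(200))  # unrelated noise: much smaller
```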

MINEFDR

This calls an FDR corrected version of the standard MINE method. See the description for the MINE method. Parameter FDR=1e-3 (default) can be set.

bicorFDR

This calls an FDR corrected version of the bicor method. See the description for the bicor method. Parameter FDR=1e-3 (default) can be set.

WGCNAFDR

This calls an FDR corrected version of the WGCNA method. Parameter P cannot be set for this method. Parameter FDR=1e-3 (default) can be set.

DTWMIC

This method uses the Dynamic Time Warping transformation coupled with the MIC statistic from the MINE family. See Details for further information. Additional parameters can be set with this method:

...
tol

1e-5 (default), a numeric value which controls the tolerance on the variable variance. In particular, this parameter is passed to a function which checks the variance of each feature. The function returns the indexes of the features with variance < tol. Indexes refer to 1-based column numbers of the original dataset.

var.thr

1e-5 (default), a numeric value which controls the tolerance parameter on the column variance for the methods MINE, MINEFDR and DTWMIC.
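The variance check described for tol and var.thr can be pictured with the following base R sketch, which returns the 1-based indexes of the columns whose variance falls below the tolerance. The helper name low_variance_columns is hypothetical; the package's internal function may be named and structured differently.

```r
# Hypothetical helper: 1-based indexes of columns whose variance is below `tol`
low_variance_columns <- function(x, tol = 1e-5) {
  v <- apply(data.matrix(x), 2, stats::var)  # variance of each column
  unname(which(v < tol))
}

x <- cbind(a = c(1, 2, 3, 4), b = c(5, 5, 5, 5), c = c(0, 1, 0, 1))
low_variance_columns(x)  # 2
```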

Value

A P by P symmetric adjacency matrix with the diagonal set to 0. Self loops and edge direction are not taken into account. The values range in [0, 1].

Author(s)

Michele Filosi
Special thanks to: Samantha Riccadonna, Giuseppe Jurman, Davide Albanese and Cesare Furlanello

References

P. Langfelder, S. Horvath (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 2008, 9:559

P. E. Meyer, F. Lafitte, G. Bontempi (2008). MINET: An open source R/Bioconductor Package for Mutual Information based Network Inference. BMC Bioinformatics

http://www.biomedcentral.com/1471-2105/9/461

Jeremiah J Faith, Boris Hayete, Joshua T Thaden, Ilaria Mogno, Jamey Wierzbowski, Guillaume Cottarel, Simon Kasif, James J Collins, Timothy S Gardner. Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles

D. Albanese, M. Filosi, R. Visintainer, S. Riccadonna, G. Jurman, C. Furlanello (2013). minerva and minepy: a C engine for the MINE suite and its R, Python and MATLAB wrappers. Bioinformatics

http://mpba.fbk.eu/cmine

M. Filosi, R. Visintainer, S. Riccadonna, G. Jurman, C. Furlanello (2014). Stability Indicators in Network Reconstruction. PLOS ONE

D. Reshef, Y. Reshef, H. Finucane, S. Grossman, G. McVean, P. Turnbaugh, E. Lander, M. Mitzenmacher, P. Sabeti (2011). Detecting novel associations in large datasets. Science

http://www.exploredata.net

(SOM: Supplementary Online Material at http://www.sciencemag.org/content/suppl/2011/12/14/334.6062.1518.DC1)

See Also

WGCNA, minerva, minet, cor

Examples

## Not run: 
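# Load the Spellman dataset shipped with minerva
data(Spellman, package="minerva")
dim(Spellman)
# Infer the adjacency matrix with the default Pearson correlation method
A <- mat2adj(Spellman, method="cor", n.cores=1)
dim(A)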

## End(Not run)