# mine: MINE family statistics

In minerva: Maximal Information-Based Nonparametric Exploration for Variable Analysis

## Description

Maximal Information-Based Nonparametric Exploration (MINE) statistics. `mine` computes the MINE family measures between two variables.

## Usage

```
mine(
  x,
  y = NULL,
  master = NULL,
  alpha = 0.6,
  C = 15,
  n.cores = 1,
  var.thr = 1e-05,
  eps = NULL,
  est = "mic_approx",
  na.rm = FALSE,
  use = "all.obs",
  normalization = FALSE,
  ...
)
```

## Arguments

| Argument | Description |
|---|---|
| `x` | a numeric vector (of size n), matrix, or data frame (which is coerced to a matrix). |
| `y` | NULL (default) or a numeric vector of size n (i.e., with dimensions compatible with `x`). |
| `master` | an optional vector of indices (numeric or character) to be given when `y` is not set; otherwise `master` is ignored. It can be either one column index to be used as reference for the comparison (versus all other columns) or a vector of column indices to be used for computing all mutual statistics. |
| `alpha` | float in (0, 1] or >= 4. If `alpha` is in (0, 1], then B = max(n^alpha, 4), where n is the number of samples. If `alpha` is >= 4, then `alpha` directly defines the B parameter; if it is higher than the number of samples n, it is capped at n, so B = min(alpha, n). Default value is 0.6 (see Details). |
| `C` | an optional number determining the starting point of the X-by-Y search grid. When trying to partition the x-axis into X columns, the algorithm will start with at most `C * X` clumps. Default value is 15 (see Details). |
| `n.cores` | optional number of cores to be used in the computations when `master` is specified. It requires the parallel package, which provides support for parallel computing and is released with R >= 2.14.0. Default is 1 (i.e., no parallel computing). |
| `var.thr` | minimum value allowed for the variance of the input variables, since `mine` cannot be computed when the variance is close to 0. Default value is 1e-5. Information about failed checks is reported in the var_thr.log file. |
| `eps` | number in [0, 1]. If NULL (default), it is set to 1 - MIC. It can be set to zero for noiseless functions, but the default choice is the most appropriate parametrization for general cases (as stated in Reshef et al. SOM). It provides robustness. |
| `est` | Default value is "mic_approx". With est="mic_approx" the original MINE statistics will be computed; with est="mic_e" the equicharacteristic matrix is evaluated and the mic() and tic() methods will return MIC_e and TIC_e values respectively. |
| `na.rm` | boolean. This variable is passed directly to the `cor`-based functions. See `cor` for further details. |
| `use` | Default value is "all.obs". This variable is passed directly to the `cor`-based functions. See `cor` for further details. |
| `normalization` | logical; whether to use normalization when computing the `tic` measure. Ignored for other measures. Defaults to FALSE. |
| `...` | currently ignored. |
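The rule that maps `alpha` to the grid-size bound B can be written out in a few lines of base R. This is only a sketch restating the description of `alpha` above; the helper name `grid_budget` is hypothetical and is not part of minerva.

```r
# Hypothetical helper (not part of minerva): restates how `alpha`
# is described to determine the grid-size bound B.
grid_budget <- function(alpha, n) {
  if (alpha > 0 && alpha <= 1) {
    max(n^alpha, 4)        # exponent form: B = max(n^alpha, 4)
  } else if (alpha >= 4) {
    min(alpha, n)          # direct form: B = min(alpha, n)
  } else {
    stop("alpha must be in (0, 1] or >= 4")
  }
}

grid_budget(0.6, 1000)  # about 63.1
grid_budget(0.6, 63)    # about 12
grid_budget(1, 63)      # 63
grid_budget(200, 63)    # 63 (capped at n)
grid_budget(0.6, 5)     # 4 (lower bound)
```

With only 63 samples, the default `alpha = 0.6` leaves a budget of about 12 cells, whereas `alpha = 1` allows up to 63.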

## Details

`mine` is an R wrapper for the C engine cmine (http://minepy.readthedocs.io/en/latest/), an implementation of Maximal Information-Based Nonparametric Exploration (MINE) statistics. The MINE statistics were first detailed in D. Reshef et al. (2011) Detecting novel associations in large datasets. Science 334, 6062 (http://www.exploredata.net).

Here we recall the main concepts of the MINE family statistics. Let D={(x,y)} be the set of n ordered pairs of elements of `x` and `y`. The data space is partitioned in an X-by-Y grid, grouping the x and y values in X and Y bins respectively.

The Maximal Information Coefficient (MIC) is defined as

MIC(D)=max_{XY<B(n)} M(D)_{X,Y}=max_{XY<B(n)} I*(D,X,Y)/log(min(X,Y)),

where B(n)=n^{α} is the search-grid size, I*(D,X,Y) is the maximum mutual information over all grids X-by-Y, of the distribution induced by D on a grid having X and Y bins (where the probability mass on a cell of the grid is the fraction of points of D falling in that cell). The other statistics of the MINE family are derived from the mutual information matrix achieved by an X-by-Y grid on D.
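To make the entry M(D)_{X,Y} concrete, the following base R sketch computes the normalized mutual information of a single fixed grid. It is a simplification: it assumes equal-frequency bins, whereas I*(D,X,Y) is the maximum over all X-by-Y grids, so this value is in general only a lower bound on the true entry. The function name `grid_score` is hypothetical.

```r
# Sketch only: normalised mutual information of ONE fixed grid.
# Assumption: equal-frequency bins; the MINE engine instead searches for
# the grid placement that maximises the mutual information (I*).
grid_score <- function(x, y, nx, ny) {
  n <- length(x)
  # Bin by rank so that each axis is split into (nearly) equal-count bins.
  bx <- cut(rank(x, ties.method = "first"), breaks = nx, labels = FALSE)
  by <- cut(rank(y, ties.method = "first"), breaks = ny, labels = FALSE)
  # Cell probabilities: fraction of points falling in each grid cell.
  p <- unclass(table(factor(bx, levels = seq_len(nx)),
                     factor(by, levels = seq_len(ny)))) / n
  indep <- outer(rowSums(p), colSums(p))   # product of the marginals
  nz <- p > 0                              # skip empty cells (0 * log 0 = 0)
  mi <- sum(p[nz] * log2(p[nz] / indep[nz]))
  mi / log2(min(nx, ny))                   # normalisation used in M(D)_{X,Y}
}

x <- 1:8
grid_score(x, 2 * x + 1, 2, 2)                  # noiseless linear: 1
grid_score(c(1, 2, 3, 4), c(1, 3, 2, 4), 2, 2)  # evenly filled cells: 0
```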

The Maximum Asymmetry Score (MAS) is defined as

MAS(D) = max_{XY<B(n)} |M(D)_{X,Y} - M(D)_{Y,X}|.

The Maximum Edge Value (MEV) is defined as

MEV(D) = max_{XY<B(n)} {M(D)_{X,Y}: X=2 or Y=2}.

The Minimum Cell Number (MCN) is defined as

MCN(D,ε) = min_{XY<B(n)} {log(XY): M(D)_{X,Y} >= (1-ε)MIC(D)}.
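Once the matrix M(D) is available, the four definitions above reduce to simple operations on it. The sketch below assumes a small hand-made matrix whose row and column names hold the bin counts X and Y, with NA marking grids outside the search budget; the function name `mine_from_matrix` and the toy values are hypothetical, and base-2 logarithms are assumed for MCN.

```r
# Sketch only: derive the summary statistics from a characteristic matrix.
# Assumptions: `M` is square, its dimnames hold the bin counts X (rows) and
# Y (columns), and cells outside the search budget are NA.
mine_from_matrix <- function(M, eps = 0) {
  X <- as.integer(rownames(M))[row(M)]   # bin count on the x-axis per cell
  Y <- as.integer(colnames(M))[col(M)]   # bin count on the y-axis per cell
  ok <- !is.na(M)
  mic <- max(M[ok])                          # largest entry
  mas <- max(abs(M - t(M)), na.rm = TRUE)    # largest asymmetry
  mev <- max(M[ok & (X == 2 | Y == 2)])      # best grid with 2 rows or columns
  keep <- ok & M >= (1 - eps) * mic
  mcn <- min(log2(X[keep] * Y[keep]))        # smallest grid close to MIC
  list(MIC = mic, MAS = mas, MEV = mev, MCN = mcn)
}

# Toy matrix with made-up values for X, Y in {2, 3}; the 3-by-3 grid is
# treated as outside the budget.
M <- matrix(c(0.5, 0.6, 0.8, NA), nrow = 2,
            dimnames = list(c("2", "3"), c("2", "3")))
res <- mine_from_matrix(M)
```

For this toy matrix the maximum 0.8 is reached on the 2-by-3 grid, so MCN equals log2(6); raising `eps` to 0.5 lets the 2-by-2 grid qualify and lowers MCN to 2.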

More details are provided in the supplementary material (SOM) of the original paper.

The MINE statistics can be computed for two numeric vectors `x` and `y`. Alternatively, a matrix (or data frame) can be provided, and two options are available according to the value of `master`. If `master` is a column identifier, then the MINE statistics are computed for the master variable versus the other matrix columns. If `master` is a set of column identifiers, then all mutual MINE statistics are computed among the column subset.

`master`, `alpha`, and `C` refer respectively to the style, exp, and c parameters of the original Java code. In the original article, the authors state that the default value α=0.6 (which is the exponent of the search-grid size B(n)=n^{α}) has been empirically chosen. It is worth noting that `alpha` and `C` are defined to obtain a heuristic approximation in a reasonable amount of time. For small sample sizes (n) it is preferable to increase `alpha` to 1 to obtain a solution closer to the theoretical one.

## Value

The Maximal Information-Based Nonparametric Exploration (MINE) statistics provide quantitative evaluations of different aspects of the relationship between two variables. In particular `mine` returns a list of 5 statistics:

| Statistic | Description |
|---|---|
| `MIC` | Maximal Information Coefficient. It is related to the relationship strength and can be interpreted as a correlation measure. It is symmetric and ranges in [0,1]: it tends to 0 for statistically independent data and approaches 1 in probability for noiseless functional relationships (more details can be found in the original paper). |
| `MAS` | Maximum Asymmetry Score. It captures the deviation from monotonicity. Note that MAS < MIC. It can be useful for detecting periodic relationships (unknown frequencies). |
| `MEV` | Maximum Edge Value. It measures the closeness to being a function. Note that MEV <= MIC. |
| `MCN` | Minimum Cell Number. It is a complexity measure. |
| `MIC-R2` | The difference between the MIC value and the squared Pearson correlation coefficient. |

When computing `mine` between two numeric vectors `x` and `y`, the output is a list of 5 numeric values. When `master` is provided, `mine` returns a list of 5 matrices having `ncol` equal to m. In particular, if `master` is a single value, then `mine` returns a list of 5 matrices having 1 column, whose rows correspond to the MINE measures between the master column and each column of `x`. If instead `master` is a vector of m indices, then the `mine` output is a list of 5 m-by-m matrices, whose element i,j corresponds to the MINE statistics computed between columns i and j of `x`.
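When `master` is a single column, the one-column matrices can be flattened into one table for ranking. The sketch below uses a hand-made list standing in for the return value (the numbers are illustrative, and the element names follow the matrix-form example output shown later on this page); with the package installed, `res` would come from a call such as `mine(x = A, master = 1)`.

```r
# Stand-in for the list returned with a single `master` column.
# Values are illustrative only; they are not the output of a real run.
res <- list(
  MIC   = matrix(c(0.97, 0.32, 0.42), ncol = 1),
  MAS   = matrix(c(0, 0, 0), ncol = 1),
  MEV   = matrix(c(0.97, 0.32, 0.42), ncol = 1),
  MCN   = matrix(c(2, 2, 2), ncol = 1),
  MICR2 = matrix(c(-0.03, 0.32, 0.09), ncol = 1)
)

# One row per column of x, one column per statistic.
tab <- as.data.frame(lapply(res, function(m) m[, 1]))
tab$column <- seq_len(nrow(tab))

# Rank the columns of x by association strength with the master column.
ranked <- tab[order(-tab$MIC), ]
ranked
```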

## Author(s)

Michele Filosi and Roberto Visintainer

## References

D. Reshef, Y. Reshef, H. Finucane, S. Grossman, G. McVean, P. Turnbaugh, E. Lander, M. Mitzenmacher, P. Sabeti. (2011) Detecting novel associations in large datasets. Science 334, 6062
http://www.exploredata.net
(SOM: Supplementary Online Material at https://science.sciencemag.org/content/suppl/2011/12/14/334.6062.1518.DC1)

D. Albanese, M. Filosi, R. Visintainer, S. Riccadonna, G. Jurman, C. Furlanello. minerva and minepy: a C engine for the MINE suite and its R, Python and MATLAB wrappers. Bioinformatics (2013) 29(3): 407-408, doi: 10.1093/bioinformatics/bts707.

minepy. Maximal Information-based Nonparametric Exploration in C and Python.
http://minepy.sourceforge.net

## Examples

```
library(minerva)

A <- matrix(runif(50),nrow=5)
mine(x=A, master=1)
mine(x=A, master=c(1,3,5,7,8:10))

x <- runif(10); y <- 3*x+2; plot(x,y,type="l")
mine(x,y)
# MIC = 1
# MAS = 0
# MEV = 1
# MCN = 2
# MIC-R2 = 0

set.seed(100); x <- runif(10); y <- 3*x+2+rnorm(10,mean=2,sd=5); plot(x,y)
mine(x,y)
# rounded values of MINE statistics
# MIC = 0.61
# MAS = 0
# MEV = 0.61
# MCN = 2
# MIC-R2 = 0.13

t <- seq(-2*pi,2*pi,0.2); y1 <- sin(2*t); plot(t,y1,type="l")
mine(t,y1)
# rounded values of MINE statistics
# MIC = 0.66
# MAS = 0.37
# MEV = 0.66
# MCN = 3.58
# MIC-R2 = 0.62

y2 <- sin(4*t); plot(t,y2,type="l")
mine(t,y2)
# rounded values of MINE statistics
# MIC = 0.32
# MAS = 0.18
# MEV = 0.32
# MCN = 3.58
# MIC-R2 = 0.31

# Note that for small n it is better to increase alpha
mine(t,y1,alpha=1)
# rounded values of MINE statistics
# MIC = 1
# MAS = 0.59
# MEV = 1
# MCN = 5.67
# MIC-R2 = 0.96

mine(t,y2,alpha=1)
# rounded values of MINE statistics
# MIC = 1
# MAS = 0.59
# MEV = 1
# MCN = 5
# MIC-R2 = 0.99

# Some examples from SOM
x <- runif(n=1000, min=0, max=1)

# Linear relationship
y1 <- x; plot(x,y1,type="l"); mine(x,y1)
# MIC = 1
# MAS = 0
# MEV = 1
# MCN = 4
# MIC-R2 = 0

# Parabolic relationship
y2 <- 4*(x-0.5)^2; plot(sort(x),y2[order(x)],type="l"); mine(x,y2)
# rounded values of MINE statistics
# MIC = 1
# MAS = 0.68
# MEV = 1
# MCN = 5.5
# MIC-R2 = 1

# Sinusoidal relationship (varying frequency)
y3 <- sin(6*pi*x*(1+x)); plot(sort(x),y3[order(x)],type="l"); mine(x,y3)
# rounded values of MINE statistics
# MIC = 1
# MAS = 0.85
# MEV = 1
# MCN = 4.6
# MIC-R2 = 0.96

# Circle relationship
t <- seq(from=0,to=2*pi,length.out=1000)
x4 <- cos(t); y4 <- sin(t); plot(x4, y4, type="l",asp=1)
mine(x4,y4)
# rounded values of MINE statistics
# MIC = 0.68
# MAS = 0.01
# MEV = 0.32
# MCN = 5.98
# MIC-R2 = 0.68

data(Spellman)
res <- mine(Spellman,master=1,n.cores=1)

## Not run:
## example of multicore computation
res <- mine(Spellman,master=1,n.cores=parallel::detectCores()-1)

## End(Not run)
```

### Example output

```
$MIC
           [,1]
 [1,] 0.9709506
 [2,] 0.3219281
 [3,] 0.4199731
 [4,] 0.4199731
 [5,] 0.3219281
 [6,] 0.9709506
 [7,] 0.9709506
 [8,] 0.4199731
 [9,] 0.4199731
[10,] 0.9709506

$MAS
      [,1]
 [1,]    0
 [2,]    0
 [3,]    0
 [4,]    0
 [5,]    0
 [6,]    0
 [7,]    0
 [8,]    0
 [9,]    0
[10,]    0

$MEV
           [,1]
 [1,] 0.9709506
 [2,] 0.3219281
 [3,] 0.4199731
 [4,] 0.4199731
 [5,] 0.3219281
 [6,] 0.9709506
 [7,] 0.9709506
 [8,] 0.4199731
 [9,] 0.4199731
[10,] 0.9709506

$MCN
      [,1]
 [1,]    2
 [2,]    2
 [3,]    2
 [4,]    2
 [5,]    2
 [6,]    2
 [7,]    2
 [8,]    2
 [9,]    2
[10,]    2

$MICR2
             [,1]
 [1,] -0.02904941
 [2,]  0.31966063
 [3,]  0.09011389
 [4,]  0.26547216
 [5,]  0.18254976
 [6,]  0.14696220
 [7,]  0.65189905
 [8,]  0.21821985
 [9,]  0.41456301
[10,]  0.62088731

$GMIC
           [,1]
 [1,] 0.9709506
 [2,] 0.3219281
 [3,] 0.4199731
 [4,] 0.4199731
 [5,] 0.3219281
 [6,] 0.9709506
 [7,] 0.9709506
 [8,] 0.4199731
 [9,] 0.4199731
[10,] 0.9709506

$TIC
           [,1]
 [1,] 0.9709506
 [2,] 0.3219281
 [3,] 0.4199731
 [4,] 0.4199731
 [5,] 0.3219281
 [6,] 0.9709506
 [7,] 0.9709506
 [8,] 0.4199731
 [9,] 0.4199731
[10,] 0.9709506

$MIC
          [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
[1,] 0.9709506 0.4199731 0.3219281 0.9709506 0.4199731 0.4199731 0.9709506
[2,] 0.4199731 0.9709506 0.4199731 0.4199731 0.9709506 0.4199731 0.3219281
[3,] 0.3219281 0.4199731 0.9709506 0.4199731 0.9709506 0.4199731 0.4199731
[4,] 0.9709506 0.4199731 0.4199731 0.9709506 0.4199731 0.4199731 0.4199731
[5,] 0.4199731 0.9709506 0.9709506 0.4199731 0.9709506 0.3219281 0.3219281
[6,] 0.4199731 0.4199731 0.4199731 0.4199731 0.3219281 0.9709506 0.4199731
[7,] 0.9709506 0.3219281 0.4199731 0.4199731 0.3219281 0.4199731 0.9709506

$MAS
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    0    0    0    0    0    0    0
[2,]    0    0    0    0    0    0    0
[3,]    0    0    0    0    0    0    0
[4,]    0    0    0    0    0    0    0
[5,]    0    0    0    0    0    0    0
[6,]    0    0    0    0    0    0    0
[7,]    0    0    0    0    0    0    0

$MEV
          [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
[1,] 0.9709506 0.4199731 0.3219281 0.9709506 0.4199731 0.4199731 0.9709506
[2,] 0.4199731 0.9709506 0.4199731 0.4199731 0.9709506 0.4199731 0.3219281
[3,] 0.3219281 0.4199731 0.9709506 0.4199731 0.9709506 0.4199731 0.4199731
[4,] 0.9709506 0.4199731 0.4199731 0.9709506 0.4199731 0.4199731 0.4199731
[5,] 0.4199731 0.9709506 0.9709506 0.4199731 0.9709506 0.3219281 0.3219281
[6,] 0.4199731 0.4199731 0.4199731 0.4199731 0.3219281 0.9709506 0.4199731
[7,] 0.9709506 0.3219281 0.4199731 0.4199731 0.3219281 0.4199731 0.9709506

$MCN
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    2    2    2    2    2    2    2
[2,]    2    2    2    2    2    2    2
[3,]    2    2    2    2    2    2    2
[4,]    2    2    2    2    2    2    2
[5,]    2    2    2    2    2    2    2
[6,]    2    2    2    2    2    2    2
[7,]    2    2    2    2    2    2    2

$MICR2
            [,1]        [,2]        [,3]        [,4]        [,5]        [,6]
[1,] -0.02904941  0.09011389  0.18254976  0.65189905  0.21821985  0.41456301
[2,]  0.09011389 -0.02904941 -0.09180528  0.15745852  0.13864789  0.41172109
[3,]  0.18254976 -0.09180528 -0.02904941  0.39742169  0.10641182  0.05299923
[4,]  0.65189905  0.15745852  0.39742169 -0.02904941  0.34883796  0.26908462
[5,]  0.21821985  0.13864789  0.10641182  0.34883796 -0.02904941  0.23458887
[6,]  0.41456301  0.41172109  0.05299923  0.26908462  0.23458887 -0.02904941
[7,]  0.62088731  0.03484033  0.39962969  0.24080761  0.28604760 -0.02912810
            [,7]
[1,]  0.62088731
[2,]  0.03484033
[3,]  0.39962969
[4,]  0.24080761
[5,]  0.28604760
[6,] -0.02912810
[7,] -0.02904941

$GMIC
          [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
[1,] 0.9709506 0.4199731 0.3219281 0.9709506 0.4199731 0.4199731 0.9709506
[2,] 0.4199731 0.9709506 0.4199731 0.4199731 0.9709506 0.4199731 0.3219281
[3,] 0.3219281 0.4199731 0.9709506 0.4199731 0.9709506 0.4199731 0.4199731
[4,] 0.9709506 0.4199731 0.4199731 0.9709506 0.4199731 0.4199731 0.4199731
[5,] 0.4199731 0.9709506 0.9709506 0.4199731 0.9709506 0.3219281 0.3219281
[6,] 0.4199731 0.4199731 0.4199731 0.4199731 0.3219281 0.9709506 0.4199731
[7,] 0.9709506 0.3219281 0.4199731 0.4199731 0.3219281 0.4199731 0.9709506

$TIC
          [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
[1,] 0.9709506 0.4199731 0.3219281 0.9709506 0.4199731 0.4199731 0.9709506
[2,] 0.4199731 0.9709506 0.4199731 0.4199731 0.9709506 0.4199731 0.3219281
[3,] 0.3219281 0.4199731 0.9709506 0.4199731 0.9709506 0.4199731 0.4199731
[4,] 0.9709506 0.4199731 0.4199731 0.9709506 0.4199731 0.4199731 0.4199731
[5,] 0.4199731 0.9709506 0.9709506 0.4199731 0.9709506 0.3219281 0.3219281
[6,] 0.4199731 0.4199731 0.4199731 0.4199731 0.3219281 0.9709506 0.4199731
[7,] 0.9709506 0.3219281 0.4199731 0.4199731 0.3219281 0.4199731 0.9709506

$MIC
[1] 1

$MAS
[1] 0

$MEV
[1] 1

$MCN
[1] 2

$`MIC-R2`
[1] 4.440892e-16

$GMIC
[1] 1

$TIC
[1] 1

$MIC
[1] 0.6099865

$MAS
[1] 0

$MEV
[1] 0.6099865

$MCN
[1] 2

$`MIC-R2`
[1] 0.1292083

$GMIC
[1] 0.6099865

$TIC
[1] 0.6099865

$MIC
[1] 0.6595235

$MAS
[1] 0.3737366

$MEV
[1] 0.6595235

$MCN
[1] 3.321928

$`MIC-R2`
[1] 0.6224055

$GMIC
[1] 0.3662762

$TIC
[1] 3.239177

$MIC
[1] 0.3239506

$MAS
[1] 0.1797499

$MEV
[1] 0.3239506

$MCN
[1] 2.584963

$`MIC-R2`
[1] 0.3151076

$GMIC
[1] 0.1819568

$TIC
[1] 1.852355

$MIC
[1] 1

$MAS
[1] 0.5869921

$MEV
[1] 0.9998182

$MCN
[1] 5.672425

$`MIC-R2`
[1] 0.962882

$GMIC
[1] 0.8704609

$TIC
[1] 84.0454

$MIC
[1] 0.9998182

$MAS
[1] 0.5914029

$MEV
[1] 0.9998182

$MCN
[1] 5

$`MIC-R2`
[1] 0.9909753

$GMIC
[1] 0.6522253

$TIC
[1] 70.02181

$MIC
[1] 1

$MAS
[1] 0

$MEV
[1] 1

$MCN
[1] 2

$`MIC-R2`
[1] 2.664535e-15

$GMIC
[1] 1

$TIC
[1] 147.9999

$MIC
[1] 1

$MAS
[1] 0.6460554

$MEV
[1] 1

$MCN
[1] 2.584963

$`MIC-R2`
[1] 0.9967731

$GMIC
[1] 0.9878122

$TIC
[1] 110.6488

$MIC
[1] 1

$MAS
[1] 0.8636941

$MEV
[1] 1

$MCN
[1] 4.70044

$`MIC-R2`
[1] 0.9582387

$GMIC
[1] 0.8121197

$TIC
[1] 56.29453

$MIC
[1] 0.6829015

$MAS
[1] 0.01067816

$MEV
[1] 0.3219625

$MCN
[1] 3.169925

$`MIC-R2`
[1] 0.6829015

$GMIC
[1] 0.03442878

$TIC
[1] 73.03107
```

