# ma: Measure association

In matie: Measuring Association and Testing Independence Efficiently

## Description

A non-parametric measure of association between variables. The association score A ranges from 0 (when the variables are independent) to 1 (when they are perfectly associated). A is a kind of R^2 estimate and can be thought of as the proportion of variance in one variable explained by another, or by several other variables, since A works for multivariate associations as well.

## Usage

```r
ma(d,partition,ht,hp,hs,ufp)
```

## Arguments

| Argument | Description |
|---|---|
| `d` | the `n x m` data frame containing `n` observations of `m` variables for which the maximal joint/marginal likelihood ratio score is required. |
| `partition` | a list of column indices specifying variable groupings. Defaults to `list(c(m),c(1:m-1))` where `m = ncol(d)`, which indicates explaining the last variable by means of all the other variables in the data set. |
| `ht` | tangent for the hyperbolic correction, default `ht = 43.6978644`. |
| `hp` | power for the hyperbolic correction, default `hp = 0.8120818`. |
| `hs` | scale for the hyperbolic correction, default `hs = 6.0049711`. |
| `ufp` | for debugging purposes, default `FALSE`. |
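The default `partition` expression can be inspected in base R without calling `ma` at all. The sketch below uses a made-up toy data frame (the column names `x1`, `x2`, `x3`, `y` are hypothetical) and only illustrates how R evaluates the documented default; how `ma` consumes the indices internally is not shown here and is not assumed.

```r
# Toy data frame with m = 4 columns (made-up values, for illustration only).
d <- data.frame(x1 = 1:3, x2 = 4:6, x3 = 7:9, y = 10:12)
m <- ncol(d)

# The documented default, written exactly as in the Arguments section.
default_partition <- list(c(m), c(1:m-1))

# `:` binds tighter than `-`, so 1:m-1 is (1:m) - 1, i.e. 0, 1, ..., m-1.
print(default_partition[[2]])

# Standard R indexing drops a zero index, so the same names are selected
# as with the explicit form 1:(m-1).
response     <- names(d)[default_partition[[1]]]
explained_by <- names(d)[default_partition[[2]]]
print(response)
print(explained_by)
```

When supplying `partition` explicitly, writing the second group as `1:(m-1)` avoids the stray zero index.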

## Details

An estimate of association (possibly nonlinear) is computed using a ratio of maximum likelihoods for the marginal distribution and maximum weighted likelihoods for the joint distribution.

Before the computation is carried out, the data are ranked using the `rwt` function from the `matie` package. The resulting estimate is usually conservative (i.e. low), so a small-sample hyperbolic correction is applied by adding an offset, `os`, to the joint likelihood, given by:

`os = ( 1 - 1 / (1 + A * ht) ) * ( n ^ (hp) / hs ) `

before the likelihood ratio is re-computed.
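To get a feel for how the offset behaves, the formula can be transcribed directly into base R. This is a minimal sketch, not the package's internal code: the helper name `hyperbolic_offset` is hypothetical, and the defaults are simply the documented values of `ht`, `hp` and `hs`.

```r
# Hyperbolic offset, transcribed from the formula in this section.
# `hyperbolic_offset` is a hypothetical helper, not part of matie.
hyperbolic_offset <- function(A, n,
                              ht = 43.6978644,
                              hp = 0.8120818,
                              hs = 6.0049711) {
  (1 - 1 / (1 + A * ht)) * (n^hp / hs)
}

# With A = 0 the first factor vanishes, so no offset is added.
zero_offset <- hyperbolic_offset(A = 0, n = 1000)

# The offset grows with A and is bounded above by n^hp / hs.
offsets <- sapply(c(0.1, 0.5, 0.9), hyperbolic_offset, n = 1000)
upper_bound <- 1000^0.8120818 / 6.0049711

print(zero_offset)
print(offsets)
print(upper_bound)
```

Because the first factor saturates quickly as A grows, the size of the offset is driven mainly by the sample size `n` once A is clearly above zero.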

As the dimension of the data set increases, so does the under-estimation of A, even with the hyperbolic correction.

## Value

Returns a list with the following components:

| Component | Description |
|---|---|
| `A` | a score (including hyperbolic correction) estimating association for the data |
| `rawA` | the association score before hyperbolic correction |
| `jointKW` | the optimal kernel width for the joint distribution |
| `altLL` | the optimal weighted log likelihood for the alternate distribution |
| `nullLL` | the optimal log likelihood for the marginal distribution |
| `marginalKW` | the optimal kernel width for the marginal distribution |
| `weight` | the optimal weight used for the mixture |
| `LRstat` | the `LR` statistic, required for computing `p` values |
| `nRows` | n, the number of complete samples in the data set |
| `mCols` | m, the number of variables in the data set |
| `partition` | user supplied partition for the variables in the data set |
| `ufp` | user supplied debugging flag |

## Note

The data set can be of any dimension.

## Author(s)

Ben Murrell, Dan Murrell & Hugh Murrell.

## References

Discovering general multidimensional associations, http://arxiv.org/abs/1303.1828

## See Also

`rwt` `pd` `sbd` `shpd` `std`

## Examples

```r
library(matie)

# bivariate association
d <- shpd(n=1000,m=2,Rsq=0.9)
ma(d)$A

# multivariate association (the proportion of variance in "Salary"
# explained by "Hits" and "Years")
data(baseballData)
ma(baseballData,partition=list(11,c(2,7)))$A
```