ma: Measure association
In matie: Measuring Association and Testing Independence Efficiently


A non-parametric measure of association between variables. The association score A ranges from 0 (when the variables are independent) to 1 (when they are perfectly associated). A is a kind of R^2 estimate and can be thought of as the proportion of variance in one variable explained by another, or by several other variables, since A also works for multivariate associations.

    ma(d, partition, ht, hp, hs, ufp)

`d`	the `n x m` data frame containing `n` observations of `m` variables for which the maximal joint/marginal likelihood ratio score is required.
`partition`	a list of column indices specifying variable groupings. Defaults to `list(c(m),c(1:m-1))`, where `m = ncol(d)`, which means the last variable is explained by all the other variables in the data set.
`ht`	tangent for the hyperbolic correction, default `ht = 43.6978644`.
`hp`	power for the hyperbolic correction, default `hp = 0.8120818`.
`hs`	scale for the hyperbolic correction, default `hs = 6.0049711`.
`ufp`	for debugging purposes, default `FALSE`.
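The default `partition` is easy to misread because of R's operator precedence. The following base-R sketch builds the documented default for a toy data frame and shows which columns it selects. The data frame, its column names, and `custom_partition` are illustrative assumptions; how `ma` resolves the indices internally is not shown here.

```r
# Toy data frame; the column names are illustrative only.
d <- data.frame(x = rnorm(5), y = rnorm(5), z = rnorm(5))
m <- ncol(d)

# The documented default: explain the last column by all the others.
default_partition <- list(c(m), c(1:m - 1))

# `:` binds tighter than `-`, so 1:m - 1 is (1:m) - 1, i.e. 0, 1, ..., m - 1.
default_partition[[2]]

# A zero index selects nothing, so the explanatory columns are 1..(m - 1).
names(d)[default_partition[[1]]]   # the explained variable
names(d)[default_partition[[2]]]   # the explanatory variables

# An explicit grouping in the same form: explain column 3 by column 1 only.
custom_partition <- list(3, c(1))
```

The first list element names the variable (or group) to be explained and the second names the explanatory group, matching the form used in the multivariate example on this page.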

An estimate of association (possibly nonlinear) is computed using a ratio of maximum likelihoods for the marginal distribution and maximum weighted likelihoods for the joint distribution.

Before the computation is carried out, the data are ranked using the rwt function from the matie package. The estimate is usually conservative (i.e. low), so a small-sample hyperbolic correction is applied by adding an offset, os, to the joint likelihood, given by:

os = ( 1 - 1 / (1 + A * ht) ) * ( n ^ (hp) / hs )

before the likelihood ratio is re-computed.
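To get a feel for the size of the correction, the offset formula can be evaluated directly. The sketch below is a plain transcription of the formula with the documented defaults for `ht`, `hp` and `hs`; `hyperbolic_offset` is a hypothetical helper, not a function exported by matie, and it assumes that A in the formula is the score before correction.

```r
# Hypothetical helper (not part of matie): evaluates the offset formula above.
hyperbolic_offset <- function(A, n, ht = 43.6978644, hp = 0.8120818, hs = 6.0049711) {
  (1 - 1 / (1 + A * ht)) * (n^hp / hs)
}

# The offset vanishes when the score is 0 and grows with A and with n.
hyperbolic_offset(A = 0, n = 1000)
hyperbolic_offset(A = c(0.1, 0.5, 0.9), n = 1000)
hyperbolic_offset(A = 0.5, n = c(100, 1000))
```

Because the first factor is 0 at A = 0 and approaches 1 as A * ht grows, the offset leaves scores near independence almost unchanged and saturates at n^hp / hs for strong associations.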

As the dimension of the data set increases, so does the under-estimation of A, even with the hyperbolic correction.

Returns a list with the following components:

`A`	a score (including hyperbolic correction) estimating association for the data
`rawA`	the association score before hyperbolic correction
`jointKW`	the optimal kernel width for the joint distribution
`altLL`	the optimal weighted log likelihood for the alternate distribution
`nullLL`	the optimal log likelihood for the marginal distribution
`marginalKW`	the optimal kernel width for the marginal distribution
`weight`	the optimal weight used for the mixture
`LRstat`	the `LR` statistic, required for computing `p` values.
`nRows`	n, the number of complete samples in the data set
`mCols`	m, the number of variables in the data set
`partition`	user supplied partition for the variables in the data set
`ufp`	user supplied debugging flag
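The components listed above can be pulled out of the returned list by name. The sketch below gathers a few of them into a named vector; `ma_summary` is a hypothetical helper introduced only for illustration, and the call to `ma` is guarded so that the snippet still runs when matie is not installed.

```r
# Hypothetical helper (not part of matie): collects a few documented fields.
ma_summary <- function(res) {
  c(A = res$A, rawA = res$rawA, LRstat = res$LRstat,
    nRows = res$nRows, mCols = res$mCols)
}

# Run only if matie is installed; shpd() is used as in the package examples.
if (requireNamespace("matie", quietly = TRUE)) {
  d <- matie::shpd(n = 1000, m = 2, Rsq = 0.9)
  print(ma_summary(matie::ma(d)))
}
```

Comparing `A` with `rawA` in the output shows how much the hyperbolic correction changed the score for a given data set.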

The data set can be of any dimension.

Ben Murrell, Dan Murrell & Hugh Murrell.

Discovering general multidimensional associations, http://arxiv.org/abs/1303.1828

rwt pd sbd shpd std

    library(matie)

    # bivariate association
    d <- shpd(n=1000,m=2,Rsq=0.9)
    ma(d)$A

    # multivariate association (the proportion of variance in "Salary"
    # explained by "Hits" and "Years")
    data(baseballData)
    ma(baseballData,partition=list(11,c(2,7)))$A