ma: Measure association


Description

A non-parametric measure of association between variables. The association score A ranges from 0 (when the variables are independent) to 1 (when they are perfectly associated). A is a kind of R^2 estimate and can be thought of as the proportion of variance in one variable explained by another, or by several other variables, since A also works for multivariate associations.

Usage

ma(d,partition,ht,hp,hs,ufp)

Arguments

d

the n x m data frame containing n observations of m variables for which the maximal joint/marginal likelihood ratio score is required.

partition

a list of column indices specifying variable groupings.

Defaults to list(c(m),c(1:m-1)), where m = ncol(d), meaning that the last variable is explained by all the other variables in the data set.
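The default partition can be inspected in plain R without calling ma. The sketch below uses a made-up four-column data frame and assumes that the indices are used for ordinary column subsetting; how ma applies them internally is not shown here.

```r
# Toy data frame with m = 4 columns (illustrative only, not from matie)
d <- data.frame(a = 1:3, b = 4:6, c = 7:9, y = 10:12)
m <- ncol(d)

# ':' binds tighter than '-', so 1:m-1 is (1:m) - 1, i.e. 0, 1, ..., m - 1
default_partition <- list(c(m), c(1:m-1))
default_partition[[2]]

# A zero index selects nothing, so both forms pick columns 1 to m - 1
names(d)[default_partition[[2]]]
names(d)[1:(m - 1)]
```

The first group holds the index of the variable to be explained and the second group holds the indices of the explanatory variables.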

ht

tangent for the hyperbolic correction, default ht = 43.6978644.

hp

power for the hyperbolic correction, default hp = 0.8120818.

hs

scale for the hyperbolic correction, default hs = 6.0049711.

ufp

for debugging purposes, default FALSE.

Details

An estimate of association (possibly nonlinear) is computed using a ratio of maximum likelihoods for the marginal distribution and maximum weighted likelihoods for the joint distribution.

Before the computation is carried out, the data is ranked using the rwt function from the matie package. The resulting estimate is usually conservative (i.e. low), so a small-sample hyperbolic correction is applied by adding an offset, os, to the joint likelihood, given by:

os = ( 1 - 1 / (1 + A * ht) ) * ( n ^ (hp) / hs )

before the likelihood ratio is re-computed.
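To get a feel for the size of this offset, the formula can be evaluated directly in R. The helper name hyperbolic_offset is hypothetical and not part of matie; the default constants are those listed under Arguments. The sketch only evaluates the offset and does not re-compute the likelihood ratio.

```r
# Hypothetical helper (not part of matie): evaluates the offset formula above.
hyperbolic_offset <- function(A, n, ht = 43.6978644, hp = 0.8120818, hs = 6.0049711) {
  (1 - 1 / (1 + A * ht)) * (n^hp / hs)
}

# Offset for a few raw scores at n = 100 observations
sapply(c(0, 0.1, 0.5, 1), hyperbolic_offset, n = 100)
```

The offset is zero when A is 0, grows with A, and stays below n^hp / hs.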

As the dimension of the data set increases, so does the under-estimation of A, even with the hyperbolic correction.

Value

Returns a list with the following components:

A

a score (including hyperbolic correction) estimating association for the data

rawA

the association score before hyperbolic correction

jointKW

the optimal kernel width for the joint distribution

altLL

the optimal weighted log likelihood for the alternate distribution

nullLL

the optimal log likelihood for the marginal distribution

marginalKW

the optimal kernel width for the marginal distribution

weight

the optimal weight used for the mixture

LRstat

the LR statistic, required for computing p values.

nRows

n, the number of complete samples in the data set

mCols

m, the number of variables in the data set

partition

user-supplied partition for the variables in the data set

ufp

user-supplied debugging flag

Note

The data set can be of any dimension.

Author(s)

Ben Murrell, Dan Murrell & Hugh Murrell.

References

Discovering general multidimensional associations, http://arxiv.org/abs/1303.1828

See Also

rwt, pd, sbd, shpd, std

Examples

    library(matie)

    # bivariate association
    d <- shpd(n=1000,m=2,Rsq=0.9)
    ma(d)$A

    # multivariate association (the proportion of variance in "Salary"
    # explained by "Hits" and "Years")
    data(baseballData)
    ma(baseballData,partition=list(11,c(2,7)))$A

matie documentation built on May 2, 2019, 3:52 a.m.