A non-parametric measure of association between variables. The association score A ranges from 0 (when the variables are independent) to 1 (when they are perfectly associated). A is a kind of R^2 estimate and can be thought of as the proportion of variance in one variable explained by another, or by several other variables, since A also works for multivariate associations.
ma(d, partition, ht, hp, hs, ufp)
d | the
partition | a list of column indices specifying variable groupings. Defaults to
ht | tangent for the hyperbolic correction, default
hp | power for the hyperbolic correction, default
hs | scale for the hyperbolic correction, default
ufp | for debugging purposes, default
An estimate of association (possibly nonlinear) is computed using a ratio of maximum likelihoods for the marginal distribution and maximum weighted likelihoods for the joint distribution.
Before the computation is carried out, the data is ranked using the rwt function from the matie package.
This estimate is usually conservative (i.e. low), so a small-samples hyperbolic correction is applied by adding an offset, os, to the joint likelihood, given by:
os = ( 1 - 1 / (1 + A * ht) ) * ( n ^ (hp) / hs )
before the likelihood ratio is re-computed.
As the dimension of the data set increases, so does the under-estimation of A, even with the hyperbolic correction.
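To make the offset formula above concrete, here is a minimal base R sketch that simply evaluates it. The helper name `hyperbolic_offset` is hypothetical and not part of matie, and the parameter values are chosen only to keep the arithmetic easy to follow; they are not the package defaults.

```r
# Hypothetical helper (not a matie function): evaluates
#   os = (1 - 1 / (1 + A * ht)) * (n^hp / hs)
# exactly as the formula is written in the text above.
hyperbolic_offset <- function(A, n, ht, hp, hs) {
  (1 - 1 / (1 + A * ht)) * (n^hp / hs)
}

# Illustrative values only; these are NOT the package defaults.
ht <- 2
hp <- 0.5
hs <- 10

# When A is 0 the first factor vanishes, so no offset is added.
os_zero <- hyperbolic_offset(A = 0, n = 100, ht = ht, hp = hp, hs = hs)

# With A = 0.5: (1 - 1/2) * (sqrt(100) / 10) = 0.5
os_half <- hyperbolic_offset(A = 0.5, n = 100, ht = ht, hp = hp, hs = hs)

print(c(os_zero, os_half))
```

The first factor grows from 0 towards 1 as A increases, so under this reading the correction is largest for strongly associated data and absent for independent data.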
Returns a list with the following components:
A | a score (including hyperbolic correction) estimating association for the data
rawA | the association score before hyperbolic correction
jointKW | the optimal kernel width for the joint distribution
altLL | the optimal weighted log likelihood for the alternate distribution
nullLL | the optimal log likelihood for the marginal distribution
marginalKW | the optimal kernel width for the marginal distribution
weight | the optimal weight used for the mixture
LRstat | the
nRows | n, the number of complete samples in the data set
mCols | m, the number of variables in the data set
partition | user supplied partition for the variables in the data set
ufp | user supplied debugging flag
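As a minimal sketch, assuming the matie package is installed, the components listed above can be read off the returned list by name. The simulated data reuse the shpd call from the package's own bivariate example; the variable name `res` is only illustrative.

```r
library(matie)

# Simulate a bivariate data set, as in the package example.
d <- shpd(n = 1000, m = 2, Rsq = 0.9)

# Keep the whole result instead of extracting only $A.
res <- ma(d)

res$A       # association score, including the hyperbolic correction
res$rawA    # association score before the correction
res$nRows   # number of complete samples used
res$mCols   # number of variables in the data set
```

Keeping the full list makes it easy to compare A with rawA and see how much the small-samples correction contributed for a given data set.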
The data set can be of any dimension.
Ben Murrell, Dan Murrell & Hugh Murrell.
Discovering general multidimensional associations, http://arxiv.org/abs/1303.1828
library(matie)

# bivariate association
d <- shpd(n = 1000, m = 2, Rsq = 0.9)
ma(d)$A
#
# multivariate association (the proportion of variance in "Salary"
# explained by "Hits" and "Years")
data(baseballData)
ma(baseballData, partition = list(11, c(2, 7)))$A