
Computes the kernel distance covariance statistic between quantitative data *X* and categorical labels *Y*, using a kernel with standard deviation `sigma`, and returns it as a measure of dependence.

`KdCov(x, y, sigma)`

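| Argument | Description |
|---|---|
| `x` | data: a numeric matrix with one observation per row |
| `y` | labels of the observations, or a univariate response variable |
| `sigma` | standard deviation (bandwidth) of the kernel *κ_X* applied to `x` |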

`KdCov` computes the kernel distance covariance statistic. The sample size (number of rows) of `x` must equal the length of the label vector `y`, and the samples must not contain missing values. Arguments `x` and `y` are treated as data and labels, respectively.
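The input requirements above can be checked before any statistic is computed. The sketch below uses a hypothetical helper, `check_kdcov_inputs`, which is not part of the package; it only illustrates the two stated conditions (matching sample size and no missing values).

```r
# Hypothetical helper, not part of the package: checks the input
# requirements stated above before any statistic is computed.
check_kdcov_inputs <- function(x, y) {
  x <- as.matrix(x)  # a numeric vector becomes a one-column matrix
  if (nrow(x) != length(y)) {
    stop("the number of rows of x must equal the length of y")
  }
  if (anyNA(x) || anyNA(y)) {
    stop("x and y must not contain missing values")
  }
  invisible(TRUE)
}

x <- matrix(rnorm(20), nrow = 10)  # 10 observations, 2 features
y <- rep(c("a", "b"), each = 5)    # one label per observation
check_kdcov_inputs(x, y)           # returns TRUE invisibly
```

Dropping one label (`y[-1]`) or inserting an `NA` into `x` makes the helper stop with an error.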

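Distance covariance was introduced in (Szekely07) as a dependence measure between random vectors *X \in \mathbb{R}^p* and *Y \in \mathbb{R}^q*. If *X* and *Y* are embedded into the reproducing kernel Hilbert spaces (RKHSs) induced by kernels *κ_X* and *κ_Y*, respectively, and *d_κ* denotes the distance induced by a kernel *κ*, the generalized distance covariance of *X* and *Y* is given by the following, where *(X^{\prime},Y^{\prime})* is an independent copy of *(X,Y)* (Sejdinovic13):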

*\begin{array}{c}
\mathrm{dCov}_{\kappa_X,\kappa_Y}(X,Y) = \mathbb{E}\,d_{\kappa_X}(X,X^{\prime})\,d_{\kappa_Y}(Y,Y^{\prime}) + \mathbb{E}\,d_{\kappa_X}(X,X^{\prime})\,\mathbb{E}\,d_{\kappa_Y}(Y,Y^{\prime}) \\
- 2\,\mathbb{E}\left[\mathbb{E}_{X^{\prime}}d_{\kappa_X}(X,X^{\prime})\,\mathbb{E}_{Y^{\prime}}d_{\kappa_Y}(Y,Y^{\prime})\right].
\end{array}*
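Replacing every expectation in this formula by an average over the sample gives a plug-in (V-statistic) estimate. The sketch below assumes the pairwise distances have already been computed; the function name `dcov_from_dist` is hypothetical and this is not presented as the package's implementation.

```r
# Plug-in (V-statistic) estimate of the three-term formula above, given
# pairwise distance matrices A[i, j] = d_X(x_i, x_j) and B[i, j] = d_Y(y_i, y_j).
# Every expectation is replaced by an average over the sample.
dcov_from_dist <- function(A, B) {
  stopifnot(nrow(A) == ncol(A), all(dim(A) == dim(B)))
  term1 <- mean(A * B)                      # E d_X(X, X') d_Y(Y, Y')
  term2 <- mean(A) * mean(B)                # E d_X(X, X') E d_Y(Y, Y')
  term3 <- mean(rowMeans(A) * rowMeans(B))  # E[E_X' d_X(X, X') E_Y' d_Y(Y, Y')]
  term1 + term2 - 2 * term3
}

set.seed(1)
x <- rnorm(50)
y <- x^2 + rnorm(50, sd = 0.1)  # y depends on x, but not linearly
A <- as.matrix(dist(x))         # plain Euclidean distances for illustration
B <- as.matrix(dist(y))
dcov_from_dist(A, B)
```

For symmetric distance matrices the same value equals the mean of the elementwise product of the double-centered matrices, which is a convenient cross-check.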

When *Y* is categorical, one may embed it using the set-difference kernel *κ_Y*,

*\kappa_Y(y,y^{\prime}) = \left\{ \begin{array}{ll}
\frac{1}{2} & \text{if } y = y^{\prime},\\ 0 & \text{otherwise.}
\end{array} \right.*

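This is equivalent to embedding *Y* as the vertices of a simplex with edges of unit length (Lyons13): the *k*-th category *L_k* (*k = 1, \ldots, K*) is represented by a *K*-dimensional vector whose entries are all zero except the *k*-th, which equals *\frac{\sqrt{2}}{2}*. The inner product of two such vectors is *\frac{1}{2}* when they represent the same category and *0* otherwise, which reproduces *κ_Y*, and any two distinct vertices are at Euclidean distance *\sqrt{\frac{1}{2} + \frac{1}{2}} = 1*.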
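The simplex embedding can be checked numerically. In the sketch below, `simplex_embed` is a hypothetical helper written only for this illustration: it maps each label to its scaled basis vector, then compares the inner products with the set-difference kernel and the Euclidean distances with the 0/1 set distance.

```r
# Illustration of the simplex embedding (hypothetical helper):
# category k is mapped to (sqrt(2)/2) * e_k, where e_k is the k-th
# standard basis vector of R^K.
simplex_embed <- function(y) {
  lev <- sort(unique(y))                  # the K categories
  E <- (sqrt(2) / 2) * diag(length(lev))  # row k embeds category k
  E[match(y, lev), , drop = FALSE]        # one row per observation
}

y <- c("a", "b", "a", "c")
Z <- simplex_embed(y)

K_y <- 0.5 * outer(y, y, `==`)  # set-difference kernel: 1/2 if labels match
tcrossprod(Z)                   # inner products of the embedded labels
as.matrix(dist(Z))              # Euclidean distances: 0 or 1
```

Up to floating-point rounding, `tcrossprod(Z)` matches `K_y`, and the distance matrix is 0 on pairs with equal labels and 1 elsewhere.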
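The distance induced by *κ_Y* is called the set distance: *d_{κ_Y}(y,y^{\prime})=0* if *y=y^{\prime}* and *1* otherwise. Write *p_k = P(Y = L_k)* for the probability of the *k*-th category, and let *X_k* denote *X* conditional on *Y = L_k*, with *X_k^{\prime}* an independent copy of *X_k*. Using the set distance, the generalized distance covariance between a numerical random variable *X* and a categorical random variable *Y* reduces to: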

*\mathrm{dCov}_{\kappa_X,\kappa_Y}(X,Y) := \mathrm{dCov}_{\kappa_X}(X,Y) = \sum_{k=1}^{K} p_k^2 \left[2\,\mathbb{E}\,d_{\kappa_X}(X_k,X) - \mathbb{E}\,d_{\kappa_X}(X_k,X_k^{\prime}) - \mathbb{E}\,d_{\kappa_X}(X,X^{\prime})\right].*
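A sample version of this formula replaces each expectation by an average and *p_k* by the observed class proportion. The sketch below is not the package's code: it assumes *κ_X* is the Gaussian kernel *\exp(-\|x - x^{\prime}\|^2 / (2\sigma^2))* with induced distance *\sqrt{\kappa_X(x,x) + \kappa_X(x^{\prime},x^{\prime}) - 2\kappa_X(x,x^{\prime})}*, and the names `gauss_kernel_dist` and `kdcov_categorical` are hypothetical. The package may use a different kernel normalization.

```r
# Sample version of the categorical formula: every expectation is
# replaced by an average over the sample and p_k by the class proportion.
# Assumption (not taken from the package): kappa_X is the Gaussian kernel
# exp(-||x - x'||^2 / (2 sigma^2)), whose induced distance is
# sqrt(2 - 2 exp(-||x - x'||^2 / (2 sigma^2))).
gauss_kernel_dist <- function(x, sigma) {
  sq <- as.matrix(dist(x))^2               # squared Euclidean distances
  sqrt(2 - 2 * exp(-sq / (2 * sigma^2)))
}

kdcov_categorical <- function(x, y, sigma) {
  D <- gauss_kernel_dist(as.matrix(x), sigma)
  n <- length(y)
  total <- mean(D)                         # E d(X, X')
  stat <- 0
  for (k in unique(y)) {
    idx <- which(y == k)
    p_k <- length(idx) / n                 # estimate of p_k
    between <- mean(D[idx, ])              # E d(X_k, X)
    within <- mean(D[idx, idx])            # E d(X_k, X_k')
    stat <- stat + p_k^2 * (2 * between - within - total)
  }
  stat
}

set.seed(42)
x <- matrix(rnorm(60), ncol = 2)           # 30 observations, 2 features
y <- sample(c("a", "b", "c"), 30, replace = TRUE)
kdcov_categorical(x, y, sigma = 1)
```

With these empirical averages the reduction is exact: plugging the same Gaussian-kernel distances and the 0/1 set distance into the general three-term formula gives the same value.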

`KdCov` returns the sample kernel distance covariance between `x` and `y`.

Sejdinovic, D., Sriperumbudur, B., Gretton, A. and Fukumizu, K. (2013). Equivalence of distance-based and RKHS-based statistics in hypothesis testing. *The Annals of Statistics*, 41(5), 2263-2291.

Zhang, S., Dang, X., Nguyen, D. and Chen, Y. (2019). Estimating feature-label dependence using Gini distance statistics. *IEEE Transactions on Pattern Analysis and Machine Intelligence (submitted)*.

