# dCor: Distance Covariance and Correlation Statistics

In GiniDistance: A New Gini Correlation Between Quantitative and Qualitative Variables

## Description

Computes distance covariance and correlation statistics between quantitative data X and a categorical variable Y, and returns these measures of dependence.

## Usage

    dCor(x, y, alpha)

## Arguments

| Argument | Description |
|----------|-------------|
| `x` | data |
| `y` | label of data or univariate response variable |
| `alpha` | exponent on Euclidean distance, in (0,2] |

## Details

The sample size (number of rows) of the data must equal the length of the label vector, and neither may contain missing values. The arguments x and y are treated as data and labels, respectively.

dCor calls the dcor function from the energy package, which computes the distance correlation between X and Y when both are numerical variables. If Y is categorical, the set difference metric on the support of Y is used, that is, $d(y, y^\prime) = |y - y^\prime| := I(y \neq y^\prime)$, where $I(\cdot)$ is the indicator function. The sample distance correlation between data and labels is then computed as follows.
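As a minimal illustration of the set difference metric, the sketch below builds the pairwise label distance matrix in base R. The toy labels and the variable names `y_toy` and `b_toy` are assumptions made for this example and are not part of the package.

```r
# Toy labels for five observations (illustrative data only)
y_toy <- c("a", "b", "a", "c", "b")

# b_toy[i, j] = I(y_i != y_j): 1 when the two labels differ, 0 when they match
b_toy <- outer(y_toy, y_toy, FUN = "!=") * 1

print(b_toy)
```

The resulting matrix is symmetric with a zero diagonal, as a distance matrix should be.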

Let $A = (A_{ij})$ be the symmetric $n \times n$ centered distance matrix of the sample $\mathbf x_1, \ldots, \mathbf x_n$. Its $(i,j)$-th entry is

$$A_{ij} = \begin{cases} a_{ij} - \frac{1}{n-2} a_{i\cdot} - \frac{1}{n-2} a_{\cdot j} + \frac{1}{(n-1)(n-2)} a_{\cdot\cdot}, & i \neq j, \\ 0, & i = j, \end{cases}$$

where $a_{ij} = \|\mathbf x_i - \mathbf x_j\|^{\alpha}$, $a_{i\cdot} = \sum_{j=1}^n a_{ij}$, $a_{\cdot j} = \sum_{i=1}^n a_{ij}$, and $a_{\cdot\cdot} = \sum_{i,j=1}^n a_{ij}$. Similarly, using the set difference metric $b_{ij} = I(y_i \neq y_j)$, a symmetric $n \times n$ centered distance matrix is calculated for the samples $y_1, \ldots, y_n$ and denoted by $B = (B_{ij})$. Unbiased estimators of $\mbox{dCov}(\mathbf X, Y; \alpha)$, $\mbox{dCov}(\mathbf X, \mathbf X; \alpha)$ and $\mbox{dCov}(Y, Y; \alpha)$ are given respectively by

$$\frac{1}{n(n-3)} \sum_{i \neq j} A_{ij} B_{ij}, \qquad \frac{1}{n(n-3)} \sum_{i \neq j} A_{ij}^2 \qquad \text{and} \qquad \frac{1}{n(n-3)} \sum_{i \neq j} B_{ij}^2.$$

Then the distance correlation is
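The centering step described above can be sketched in base R as follows. The helper name `u_center` and the toy points are assumptions made for illustration; this is not the package's internal code.

```r
# Center a symmetric distance matrix d (zero diagonal) entry by entry:
# d_ij - d_i./(n-2) - d_.j/(n-2) + d_../((n-1)(n-2)) off the diagonal, 0 on it.
u_center <- function(d) {
  n <- nrow(d)
  stopifnot(ncol(d) == n, n > 2)
  row_tot <- matrix(rowSums(d), n, n)                # entry [i, j] holds d_i.
  col_tot <- matrix(colSums(d), n, n, byrow = TRUE)  # entry [i, j] holds d_.j
  out <- d - row_tot / (n - 2) - col_tot / (n - 2) +
    sum(d) / ((n - 1) * (n - 2))
  diag(out) <- 0
  out
}

# Five toy points on the real line, with alpha = 1 (illustrative data only)
x_pts <- c(0, 1, 3, 6, 10)
alpha <- 1
a_raw <- as.matrix(dist(x_pts))^alpha
A_cen <- u_center(a_raw)

# Each row of the centered matrix sums to zero, up to rounding error
print(rowSums(A_cen))
```

The zero row sums follow algebraically from the centering formula when the input matrix is symmetric with a zero diagonal, which makes them a convenient sanity check.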

$$\mbox{dCor}(\mathbf{X}, Y; \alpha) = \frac{\mbox{dCov}(\mathbf{X}, Y; \alpha)}{\sqrt{\mbox{dCov}(\mathbf{X}, \mathbf{X}; \alpha)} \, \sqrt{\mbox{dCov}(Y, Y; \alpha)}}.$$
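To make the formulas in this section concrete, here is a rough base-R sketch that assembles the three estimators and their ratio. The function name `dcor_sketch` and the names of the returned elements are hypothetical. The sketch follows the formulas as written here and is not guaranteed to reproduce the numerical output of `dCor` or of the energy package.

```r
dcor_sketch <- function(x, y, alpha = 1) {
  x <- as.matrix(x)
  n <- nrow(x)
  stopifnot(n > 3, length(y) == n, !anyNA(x), !anyNA(y),
            alpha > 0, alpha <= 2)

  a <- as.matrix(dist(x))^alpha     # a_ij = ||x_i - x_j||^alpha
  b <- outer(y, y, FUN = "!=") * 1  # b_ij = I(y_i != y_j)

  # Centering as in the Details section; valid for symmetric d,
  # where row totals and column totals coincide.
  center <- function(d) {
    r <- rowSums(d)
    out <- d - outer(r, r, "+") / (n - 2) + sum(d) / ((n - 1) * (n - 2))
    diag(out) <- 0
    out
  }
  A <- center(a)
  B <- center(b)

  # Diagonals are zero, so summing all entries equals summing over i != j
  dcov_xy <- sum(A * B) / (n * (n - 3))
  dvar_x  <- sum(A * A) / (n * (n - 3))
  dvar_y  <- sum(B * B) / (n * (n - 3))

  c(dVarX = dvar_x, dVarY = dvar_y, dCov = dcov_xy,
    dCor = dcov_xy / (sqrt(dvar_x) * sqrt(dvar_y)))
}

res <- dcor_sketch(iris[, 1:4], unclass(iris[, 5]), alpha = 1)
print(res)
```

Because the covariance estimator is unbiased rather than a V-statistic, the numerator can be negative in small or weakly dependent samples, while the two variance estimators are sums of squares and stay non-negative.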

## Value

dCor returns the sample distance variance of x, distance variance of y, distance covariance of x and y and distance correlation of x, y.

## References

Lyons, R. (2013). Distance covariance in metric spaces. The Annals of Probability, 41 (5), 3284-3305.

Szekely, G. J., Rizzo, M. L. and Bakirov, N. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, 35 (6), 2769-2794.

Rizzo, M. L. and Szekely, G. J. (2017). energy: E-Statistics: Multivariate Inference via the Energy of Data. R package version 1.7-0.

## See Also

dCov, KdCov, KdCor

## Examples

    library(GiniDistance)

    x <- iris[, 1:4]          # four quantitative measurements
    y <- unclass(iris[, 5])   # species labels as integer codes
    dCor(x, y, alpha = 1)

### Example output

 0.87862


GiniDistance documentation built on June 28, 2019, 5:03 p.m.