dCor | R Documentation |
Computes distance covariance and distance correlation statistics between quantitative data X and categorical labels Y, and returns these measures of dependence.
dCor(x, y, alpha)
x | data |
y | label of data or univariate response variable |
alpha | exponent on Euclidean distance, in (0,2] |
The sample size (number of rows) of the data must agree with the length of the label vector, and samples must not contain missing values. Arguments x, y are treated as data and labels.
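The two input requirements above can be checked before calling dCor. The following is a minimal sketch in base R; the helper name check_dcor_input is hypothetical and is not part of the package.

```r
# Hypothetical helper (not part of the package): validate inputs for dCor.
check_dcor_input <- function(x, y) {
  x <- as.matrix(x)
  # The number of rows of the data must equal the length of the label vector.
  if (nrow(x) != length(y)) {
    stop("number of rows of x must equal the length of y")
  }
  # Neither the data nor the labels may contain missing values.
  if (anyNA(x) || anyNA(y)) {
    stop("x and y must not contain missing values")
  }
  invisible(TRUE)
}

ok <- check_dcor_input(iris[, 1:4], iris[, 5])
```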
dCor calls the dcor function from the energy package, which computes the distance correlation between X and Y when both are numerical variables. If Y is categorical, the set difference metric on the support of Y is used. That is, d(y, y^\prime) = |y - y^\prime| := I(y \neq y^\prime), where I(\cdot) is the indicator function. The sample distance correlation between data and labels is then computed as follows.
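As an illustration of the set difference metric, the pairwise distance matrix of a label vector can be built from the indicator I(y \neq y^\prime). This is a minimal base R sketch on a made-up label vector; it does not show how the package computes the matrix internally.

```r
# Toy labels, invented for illustration only.
y <- c("a", "b", "a", "c")

# b[i, j] = 1 if y[i] != y[j], and 0 otherwise (set difference metric).
b <- 1 * outer(y, y, FUN = "!=")
```

The diagonal is zero and the matrix is symmetric, as required of a distance matrix.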
Let A = (A_{ij}) be the symmetric, n \times n, centered distance matrix of the sample \mathbf x_1, \cdots, \mathbf x_n. The (i,j)-th entry of A is
A_{ij} = a_{ij} - \frac{1}{n-2} a_{i\cdot} - \frac{1}{n-2} a_{\cdot j} + \frac{1}{(n-1)(n-2)} a_{\cdot \cdot} if i \neq j, and A_{ij} = 0 if i = j,
where a_{ij} = \|\mathbf x_i - \mathbf x_j\|^{\alpha}, a_{i\cdot} = \sum_{j=1}^n a_{ij}, a_{\cdot j} = \sum_{i=1}^n a_{ij}, and a_{\cdot \cdot} = \sum_{i,j=1}^n a_{ij}. Similarly, using the set difference metric, a symmetric, n \times n, centered distance matrix is calculated for the samples y_1, \cdots, y_n and denoted by B = (B_{ij}). Unbiased estimators of \mbox{dCov}(\mathbf X, Y; \alpha), \mbox{dCov}(\mathbf X, \mathbf X; \alpha) and \mbox{dCov}(Y, Y; \alpha) are given respectively by
\frac{1}{n(n-3)} \sum_{i \neq j} A_{ij} B_{ij}, \frac{1}{n(n-3)} \sum_{i \neq j} A_{ij}^2 and \frac{1}{n(n-3)} \sum_{i \neq j} B_{ij}^2.
Then the distance correlation is
\mbox{dCor}(\mathbf X, Y; \alpha) = \frac{\mbox{dCov}(\mathbf X, Y; \alpha)}{\sqrt{\mbox{dCov}(\mathbf X, \mathbf X; \alpha)} \sqrt{\mbox{dCov}(Y, Y; \alpha)}}.
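The centering step and the estimators described above can be written out directly in base R. The sketch below is only meant to make the formulas concrete: the function names u_center, u_product and dcor_sketch are hypothetical, and the code is not the package's implementation (which delegates to the energy package), so its output is not guaranteed to match dCor.

```r
# Centre a symmetric pairwise distance matrix d as in the formula for A_{ij}.
u_center <- function(d) {
  n <- nrow(d)
  r <- rowSums(d)  # a_{i.}; equal to the column sums because d is symmetric
  A <- d - outer(r, r, FUN = "+") / (n - 2) + sum(d) / ((n - 1) * (n - 2))
  diag(A) <- 0     # entries with i = j are defined to be 0
  A
}

# (1 / (n (n - 3))) * sum over i != j of A_ij * B_ij.
# The diagonals are zero, so summing all entries gives the same value.
u_product <- function(A, B) {
  n <- nrow(A)
  sum(A * B) / (n * (n - 3))
}

dcor_sketch <- function(x, y, alpha = 1) {
  a <- as.matrix(dist(x))^alpha       # a_ij = ||x_i - x_j||^alpha
  yc <- as.character(y)
  b <- 1 * outer(yc, yc, FUN = "!=")  # set difference metric on the labels
  A <- u_center(a)
  B <- u_center(b)
  dcov_xy <- u_product(A, B)
  dvar_x <- u_product(A, A)
  dvar_y <- u_product(B, B)
  c(dVarX = dvar_x, dVarY = dvar_y, dCov = dcov_xy,
    dCor = dcov_xy / (sqrt(dvar_x) * sqrt(dvar_y)))
}

res <- dcor_sketch(iris[, 1:4], iris[, 5], alpha = 1)
```

The sketch requires n > 3, since the estimators divide by n(n-3).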
dCor returns the sample distance variance of x, distance variance of y, distance covariance of x and y, and distance correlation of x and y.
Lyons, R. (2013). Distance covariance in metric spaces. The Annals of Probability, 41 (5), 3284-3305.
Szekely, G. J., Rizzo, M. L. and Bakirov, N. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, 35 (6), 2769-2794.
Rizzo, M.L. and Szekely, G.J., (2017). Energy: E-Statistics: Multivariate Inference via the Energy of Data (R Package), Version 1.7-0.
dCov
KdCov
KdCor
x <- iris[, 1:4]
y <- unclass(iris[, 5])
dCor(x, y, alpha = 1)