gCov (R Documentation)
Computes the Gini distance covariance statistic between a quantitative (possibly multivariate) variable X and a categorical variable Y, using alpha as the exponent on the Euclidean distance, and returns this measure of dependence.
gCov(x, y, alpha)
x | data; each row is one observation
y | labels of the data, i.e. a univariate categorical response variable
alpha | exponent on the Euclidean distance, in (0,2]
gCov computes the Gini distance covariance statistic. It is a self-contained R function returning a measure of dependence.
The sample size (number of rows) of the data must agree with the length of the label vector, and samples must not contain missing values. Arguments x and y are treated as the data and the labels, respectively. If alpha is missing it defaults to 1; otherwise it is the exponent on the Euclidean distance.
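The input rules above can be sketched as a small base-R check. This is only an illustration under stated assumptions: check_gcov_inputs is a hypothetical helper, not part of the package, and we do not know how gCov itself reports these errors. The (0,2] range for alpha is taken from the argument description.

```r
# Hypothetical helper mirroring the documented input rules; not the
# package's own validation. Returns alpha, defaulting to 1 when missing.
check_gcov_inputs <- function(x, y, alpha) {
  x <- as.matrix(x)
  # Number of rows of the data must match the length of the label vector
  if (nrow(x) != length(y)) {
    stop("number of rows of x must equal the length of y")
  }
  # Samples must not contain missing values
  if (anyNA(x) || anyNA(y)) {
    stop("x and y must not contain missing values")
  }
  # alpha defaults to 1 when not supplied
  if (missing(alpha)) alpha <- 1
  if (!is.numeric(alpha) || length(alpha) != 1 || alpha <= 0 || alpha > 2) {
    stop("alpha must be a single number in (0, 2]")
  }
  alpha
}

check_gcov_inputs(matrix(1:6, nrow = 3), 1:3)        # alpha defaults to 1
check_gcov_inputs(matrix(1:6, nrow = 3), 1:3, 0.5)
```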
Gini distance covariance is a new measure of dependence between random vectors and their labels. For all distributions with finite first moments, Gini distance covariance gCov has the following fundamental properties:
(1) gCov(X,Y) is defined for a quantitative variable X of arbitrary dimension and a univariate categorical variable Y.
(2) gCov(X,Y)=0 characterizes independence of X and Y.
Gini distance covariance satisfies 0 ≤ gCov(X,Y), and gCov(X,Y) = 0 if and only if X and Y are independent. Gini distance covariance therefore provides a new approach to testing the independence of a random vector X and a categorical variable Y. The formal definition of the population quantity gCov is given in (DNCZ 2018). The empirical Gini distance covariance gCov_n(X,Y; alpha) is computed as follows; note that, unlike its population counterpart, this sample estimate can take small negative values in finite samples.
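The "if" direction of property (2), that independence forces gCov to vanish, can be checked in one line. This assumes population quantities analogous to the sample formulas given later: Δ(α) = E‖X − X′‖^α for independent copies X, X′; Δ_k(α) = E‖X_k − X_k′‖^α for independent copies drawn from the conditional distribution of X given Y = L_k; p_k = P(Y = L_k); and gCov(X,Y) = Δ(α) − Σ_k p_k Δ_k(α). These are our reading of the population definitions; see (DNCZ 2018) for the formal ones.

```latex
X \perp Y
\;\Longrightarrow\; X_k \overset{d}{=} X \ \text{for every } k
\;\Longrightarrow\; \Delta_k(\alpha) = \Delta(\alpha)
\;\Longrightarrow\; \mathrm{gCov}(X, Y)
  = \Delta(\alpha) - \sum_{k=1}^{K} p_k \, \Delta(\alpha)
  = \Delta(\alpha) \Bigl(1 - \sum_{k=1}^{K} p_k\Bigr) = 0 .
```

The converse, that gCov(X,Y) = 0 implies independence, is the harder direction and rests on the result in (DNCZ 2018); it is not shown here.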
Suppose a sample $\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$ is available, where each $y_i$ takes one of the $K$ category values $L_1, \dots, L_K$. The sample counterparts of the population quantities are easily computed. Let $\mathcal{I}_k$ be the index set of sample points with $y_i = L_k$; then $p_k$ is estimated by the sample proportion of that category, $\hat{p}_k = n_k / n$, where $n_k$ is the number of elements in $\mathcal{I}_k$. For a given $\alpha \in (0, 2)$, a point estimator of the Gini distance covariance is computed as follows.
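$$\hat{\Delta}_k(\alpha) = \binom{n_k}{2}^{-1} \sum_{i < j,\; i, j \in \mathcal{I}_k} \|\mathbf{x}_i - \mathbf{x}_j\|^{\alpha}, \qquad k = 1, \dots, K,$$
$$\hat{\Delta}(\alpha) = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} \|\mathbf{x}_i - \mathbf{x}_j\|^{\alpha},$$
$$\mathrm{gCov}_n(X, Y; \alpha) = \hat{\Delta}(\alpha) - \sum_{k=1}^{K} \hat{p}_k \, \hat{\Delta}_k(\alpha).$$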
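The estimator above can be sketched directly in base R. This is a minimal illustration assuming the formulas exactly as written; gcov_sketch is a hypothetical name, it is not the package's gCov, and we have not verified that the two agree numerically. Returning 0 for a category with a single observation is our own assumption, since the pairwise average is undefined there.

```r
# Sketch of the sample Gini distance covariance from the formulas above.
# gcov_sketch is a hypothetical name; it is not the package's gCov.
gcov_sketch <- function(x, y, alpha = 1) {
  x <- as.matrix(x)   # rows are the observations x_i
  y <- as.vector(y)   # labels y_i (drops factor / levels attributes)
  stopifnot(nrow(x) == length(y), !anyNA(x), !anyNA(y))

  # Average of ||x_i - x_j||^alpha over all pairs i < j, i.e. the
  # binom(m, 2)^{-1} * sum form. Returning 0 for fewer than 2 rows is an
  # assumption: the formula is undefined for a single-point category.
  mean_pow_dist <- function(m) {
    if (nrow(m) < 2) return(0)
    mean(as.vector(dist(m))^alpha)   # dist() is Euclidean by default
  }

  n <- nrow(x)
  delta_hat <- mean_pow_dist(x)        # overall Delta-hat(alpha)
  within <- 0
  for (lab in unique(y)) {
    idx <- which(y == lab)             # index set I_k
    p_hat <- length(idx) / n           # p_k estimated by n_k / n
    within <- within + p_hat * mean_pow_dist(x[idx, , drop = FALSE])
  }
  delta_hat - within
}

# Toy check: x = (0, 1, 3, 4) with labels (1, 1, 2, 2)
gcov_sketch(c(0, 1, 3, 4), c(1, 1, 2, 2), alpha = 1)
```

By hand, the six pairwise distances of (0, 1, 3, 4) sum to 14, so the overall average is 14/6, while each category contains a single pair at distance 1, so both within-category averages equal 1 and the estimate is 14/6 - 1 = 4/3. Calling dist() on all n points stores n(n-1)/2 distances, which keeps the sketch readable but is quadratic in memory; this is a presentation choice, not a statement about how gCov is implemented.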
gCov returns the sample Gini distance covariance gCov_n(X,Y; alpha).
Dang, X., Nguyen, D., Chen, Y. and Zhang, J. (2019). A new Gini correlation between quantitative and qualitative variables. Journal of the American Statistical Association (submitted). https://arxiv.org/pdf/1809.09793.pdf
See also: gCor, gmd, KgCov, KgCor
# x: the four numeric iris measurements (150 x 4)
x <- iris[, 1:4]
# y: species labels as integer codes 1, 2, 3
y <- unclass(iris[, 5])
gCov(x, y, alpha = 1)
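Because gCov vanishes under independence, one generic way to turn the statistic into a test is a label-permutation test. The sketch below is self-contained: it re-implements the sample statistic in base R instead of calling gCov, since the package is not named on this page. gcov_stat, gcov_perm_test and n_perm are hypothetical names, and permutation testing is our illustrative choice, not necessarily the procedure the authors propose.

```r
# Sample Gini distance covariance, re-implemented so the block runs alone.
# gcov_stat is a hypothetical name, not the package's gCov.
gcov_stat <- function(x, y, alpha = 1) {
  # mean of ||x_i - x_j||^alpha over pairs i < j (0 if fewer than 2 rows)
  mpd <- function(m) if (nrow(m) < 2) 0 else mean(as.vector(dist(m))^alpha)
  within <- sum(vapply(unique(y), function(lab) {
    idx <- which(y == lab)
    length(idx) / length(y) * mpd(x[idx, , drop = FALSE])
  }, numeric(1)))
  mpd(x) - within
}

# Hypothetical permutation test of independence between x and labels y.
gcov_perm_test <- function(x, y, alpha = 1, n_perm = 199) {
  x <- as.matrix(x)
  y <- as.vector(y)
  observed <- gcov_stat(x, y, alpha)
  # Shuffling y breaks any association with x but keeps category sizes
  perm <- replicate(n_perm, gcov_stat(x, y[sample.int(length(y))], alpha))
  # Add-one correction so the p-value is never exactly zero
  p_value <- (1 + sum(perm >= observed)) / (n_perm + 1)
  list(statistic = observed, p_value = p_value)
}

set.seed(1)
res <- gcov_perm_test(iris[, 1:4], unclass(iris[, 5]), alpha = 1, n_perm = 99)
res$p_value
```

With the strongly species-dependent iris measurements, we would expect the observed statistic to exceed every permuted value, giving the smallest attainable p-value of 1/(n_perm + 1).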