dhsic: d-variable Hilbert Schmidt independence criterion - dHSIC


dhsic {dHSIC}    R Documentation


Description

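Computes the empirical estimator of the d-variable Hilbert Schmidt independence criterion (dHSIC), a kernel-based measure of joint dependence between an arbitrary number of variables.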

Usage

dhsic(X, Y, K, kernel = "gaussian", bandwidth = 1,
  matrix.input = FALSE)

Arguments

X

either a list of at least two numeric matrices or a single numeric matrix. The rows of a matrix correspond to the observations of a variable. All variables must always have the same number of observations (i.e. all matrices must have the same number of rows). If X is a single numeric matrix, then one has to either specify the second variable as Y or set matrix.input to TRUE. See below for more details.

Y

a numeric matrix with the same number of rows as X. Required if X is a single numeric matrix (unless matrix.input is TRUE) and omitted if X is a list.

K

a list of the Gram matrices corresponding to each variable. If K is specified, the other inputs have no effect on the computations.
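To illustrate what K expects, here is a minimal sketch that builds a Gram matrix by evaluating a pairwise kernel function on every pair of observations (rows). The helper gram_matrix and the linear kernel are hypothetical and not part of dHSIC; the kernel follows the same two-observation signature as the sigmoid kernel in the Examples.

```r
# Hypothetical helper (not part of dHSIC): builds the n x n Gram matrix of one
# variable, whose rows are observations, from a pairwise kernel k(x_i, x_j).
gram_matrix <- function(x, k) {
  x <- as.matrix(x)
  n <- nrow(x)
  G <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in i:n) {
      # evaluate the kernel once per pair; Gram matrices are symmetric
      G[i, j] <- k(x[i, ], x[j, ])
      G[j, i] <- G[i, j]
    }
  }
  G
}

# Hypothetical example kernel with the same signature as `sigmoid` in Examples
linear <- function(x_1, x_2) sum(x_1 * x_2)

set.seed(0)
x <- matrix(rnorm(20), ncol = 2)  # 10 observations of a 2-dimensional variable
y <- matrix(rnorm(10), ncol = 1)  # 10 observations of a 1-dimensional variable
K <- list(gram_matrix(x, linear), gram_matrix(y, linear))
```

A list built this way could then be passed as dhsic(K = K); as described above, X, Y, kernel and bandwidth then have no effect. The double loop needs O(n^2) kernel evaluations and is written for clarity rather than speed.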

kernel

a vector of character strings specifying the kernels for each variable. There exist three pre-defined kernels: "gaussian" (Gaussian kernel with the median heuristic as bandwidth), "gaussian.fixed" (Gaussian kernel with the fixed bandwidth given by bandwidth) and "discrete" (discrete kernel). User-defined kernels can also be used by passing the function name as a string, which will then be matched using match.fun. If the length of kernel is smaller than the number of variables, the kernel specified in kernel[1] will be used for all variables.

bandwidth

a numeric value specifying the size of the bandwidth used for the Gaussian kernel. Only used if kernel="gaussian.fixed".
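The sketch below shows how Gram matrices for the pre-defined kernels could be computed. It assumes the Gaussian kernel exp(-||x_i - x_j||^2 / (2 h^2)) and a median heuristic h = sqrt(median of squared pairwise distances / 2); the exact formula and the fallback for degenerate data are assumptions and may differ from what dhsic does internally. The helpers median_bandwidth, gaussian_gram and discrete_gram are hypothetical.

```r
# Assumption: median heuristic h = sqrt(0.5 * median of squared pairwise
# Euclidean distances); the package's exact rule may differ.
median_bandwidth <- function(x) {
  d2 <- as.vector(dist(as.matrix(x)))^2
  h <- sqrt(0.5 * median(d2))
  if (h == 0) h <- 0.001  # assumed fallback when most observations coincide
  h
}

# Gaussian Gram matrix: K[i, j] = exp(-||x_i - x_j||^2 / (2 * bandwidth^2))
gaussian_gram <- function(x, bandwidth = median_bandwidth(x)) {
  d2 <- unname(as.matrix(dist(as.matrix(x))))^2
  exp(-d2 / (2 * bandwidth^2))
}

# Discrete Gram matrix: K[i, j] = 1 if observations i and j are identical, else 0
discrete_gram <- function(x) {
  keys <- apply(as.matrix(x), 1, paste, collapse = "\t")
  unname(outer(keys, keys, "==") * 1)
}

set.seed(0)
x <- matrix(rnorm(200), ncol = 2)
y <- matrix(rbinom(100, 30, 0.1), ncol = 1)
Kx <- gaussian_gram(x)                       # intended to mirror kernel = "gaussian"
Kx_fixed <- gaussian_gram(x, bandwidth = 1)  # intended to mirror "gaussian.fixed", bandwidth = 1
Ky <- discrete_gram(y)                       # intended to mirror kernel = "discrete"
```

The median heuristic adapts the bandwidth to the scale of each variable, whereas a fixed bandwidth makes results comparable across data sets of different scales at the cost of possibly poor sensitivity.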

matrix.input

a logical. If matrix.input is TRUE, the input X is assumed to be a matrix in which the columns correspond to the variables.
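With matrix.input = TRUE, each column is treated as its own variable. Here is a minimal sketch of that split, assuming it amounts to turning every column into a one-column matrix; split_columns is a hypothetical helper. Under this assumption, dhsic(M, matrix.input = TRUE) should agree with dhsic(split_columns(M)) when the same kernels are used.

```r
# Hypothetical helper: split a matrix into a list of one-column matrices,
# one per variable (column); drop = FALSE keeps each piece a matrix.
split_columns <- function(M) {
  lapply(seq_len(ncol(M)), function(j) M[, j, drop = FALSE])
}

set.seed(0)
M <- cbind(matrix(rnorm(200), ncol = 2), rbinom(100, 30, 0.1))  # 100 x 3
vars <- split_columns(M)  # list of three 100 x 1 matrices
```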

Details

The d-variable Hilbert Schmidt independence criterion (dHSIC) is a non-parametric measure of dependence between an arbitrary number of variables. In the large sample limit, the value of dHSIC is 0 if the variables are jointly independent and positive if there is any dependence, provided the kernels are characteristic (as the Gaussian kernel is). It is therefore able to detect any type of dependence given a sufficient amount of data.
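To make the estimator concrete, the sketch below computes the empirical dHSIC (the V-statistic of Pfister et al., 2018) from a list of Gram matrices, with the formula in the comments. dhsic_from_gram and gauss are hypothetical helpers written for illustration, and the fixed bandwidth of 1 is an assumption; the package's own implementation may be organized differently. For two variables the estimator reduces to the biased HSIC statistic of Gretton et al. (2007).

```r
# Hypothetical helper: empirical dHSIC from a list of d Gram matrices (n x n):
#   term1 = (1/n^2)     * sum_{i,j} prod_k K_k[i,j]
#   term2 = (1/n^(2d))  * prod_k sum_{i,j} K_k[i,j]
#   term3 = (2/n^(d+1)) * sum_i prod_k sum_j K_k[i,j]
#   dHSIC = term1 + term2 - term3
dhsic_from_gram <- function(K) {
  term1 <- mean(Reduce(`*`, K))                        # mean of elementwise product
  term2 <- prod(vapply(K, mean, numeric(1)))           # product of grand means
  term3 <- 2 * mean(Reduce(`*`, lapply(K, rowMeans)))  # 2 * mean_i prod_k rowMean_k[i]
  term1 + term2 - term3
}

# Assumed Gaussian kernel with fixed bandwidth 1
gauss <- function(v) exp(-as.matrix(dist(v))^2 / 2)

set.seed(0)
x <- matrix(rnorm(100), ncol = 1)
z <- matrix(rnorm(100), ncol = 1)              # independent of x
y <- x^2 + 0.1 * matrix(rnorm(100), ncol = 1)  # depends on x
dhsic_from_gram(list(gauss(x), gauss(z)))
dhsic_from_gram(list(gauss(x), gauss(y)))
```

The value for the independent pair should typically be close to 0 and the value for the dependent pair larger, in line with the large-sample behaviour described above; the exact numbers depend on the sample and the assumed bandwidth.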

Value

A list containing the following components:

dHSIC

the value of the empirical estimator of dHSIC

time

numeric vector containing computation times. time[1] is the time to compute the Gram matrices and time[2] is the time to compute dHSIC.

bandwidth

the bandwidth used during the computations. Only relevant if a Gaussian kernel was used.

Author(s)

Niklas Pfister and Jonas Peters

References

Gretton, A., K. Fukumizu, C. H. Teo, L. Song, B. Schölkopf and A. J. Smola (2007). A kernel statistical test of independence. In Advances in Neural Information Processing Systems (pp. 585-592).

Pfister, N., P. Bühlmann, B. Schölkopf and J. Peters (2018). Kernel-based Tests for Joint Independence. Journal of the Royal Statistical Society, Series B.

See Also

To perform hypothesis tests based on dHSIC, use the function dhsic.test.

Examples


### Three different input methods
set.seed(0)
x <- matrix(rnorm(200),ncol=2)
y <- matrix(rbinom(100,30,0.1),ncol=1)
# compute dHSIC of x and y (x is taken as a single variable)
dhsic(list(x,y),kernel=c("gaussian","discrete"))$dHSIC
dhsic(x,y,kernel=c("gaussian","discrete"))$dHSIC
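# compute dHSIC of x[,1], x[,2] and y (one kernel per column; a shorter
# kernel vector would apply kernel[1] to all three variables)
dhsic(cbind(x,y),kernel=c("gaussian","gaussian","discrete"), matrix.input=TRUE)$dHSIC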

### Using a user-defined kernel (here: sigmoid kernel)
set.seed(0)
x <- matrix(rnorm(500),ncol=1)
y <- x^2+0.02*matrix(rnorm(500),ncol=1)
sigmoid <- function(x_1,x_2){
  return(tanh(sum(x_1*x_2)))
}
dhsic(x,y,kernel="sigmoid")$dHSIC

dHSIC documentation built on May 3, 2026, 1:07 a.m.