mmd2test: Kernel Two-sample Test with Maximum Mean Discrepancy
In maotai: Tools for Matrix Algebra, Optimization and Inference

mmd2test

R Documentation

Kernel Two-sample Test with Maximum Mean Discrepancy

Description

Maximum Mean Discrepancy (MMD) as a measure of discrepancy between samples is employed as a test statistic for two-sample hypothesis test of equal distributions. Kernel matrix K is a symmetric square matrix that is positive semidefinite.

Usage

mmd2test(K, label, method = c("b", "u"), mc.iter = 999)

Arguments

`K`	kernel matrix or an object of `kernelMatrix` class from kernlab package.
`label`	label vector of class indices.
`method`	type of estimator to be used. `"b"` for biased and `"u"` for unbiased estimator of MMD.
`mc.iter`	the number of Monte Carlo resampling iterations.

Value

a (list) object of S3 class htest containing:

statistic: a test statistic.
p.value: p-value under H_0.
alternative: alternative hypothesis.
method: name of the test.
data.name: name(s) of provided kernel matrix.

References

\insertRef

gretton_kernel_2012maotai

Examples

## small test for CRAN submission
dat1 <- matrix(rnorm(60, mean= 1), ncol=2) # group 1 : 30 obs of mean  1
dat2 <- matrix(rnorm(50, mean=-1), ncol=2) # group 2 : 25 obs of mean -1

dmat <- as.matrix(dist(rbind(dat1, dat2)))  # Euclidean distance matrix
kmat <- exp(-(dmat^2))                      # build a gaussian kernel matrix
lab  <- c(rep(1,30), rep(2,25))             # corresponding label

mmd2test(kmat, lab)                         # run the code !

## Not run: 
## WARNING: computationally heavy. 
#  Let's compute empirical Type 1 error at alpha=0.05
niter = 496  
pvals1 = rep(0,niter)
pvals2 = rep(0,niter)
for (i in 1:niter){
  dat = matrix(rnorm(200),ncol=2)
  lab = c(rep(1,50), rep(2,50))
  lbd = 0.1
  kmat = exp(-lbd*(as.matrix(dist(dat))^2))
  pvals1[i] = mmd2test(kmat, lab, method="b")$p.value
  pvals2[i] = mmd2test(kmat, lab, method="u")$p.value
  print(paste("iteration ",i," complete..",sep=""))
}

#  Visualize the above at multiple significance levels
alphas = seq(from=0.001, to=0.999, length.out=100)
errors1 = rep(0,100)
errors2 = rep(0,100)
for (i in 1:100){
   errors1[i] = sum(pvals1<=alphas[i])/niter
   errors2[i] = sum(pvals2<=alphas[i])/niter
}

opar <- par(no.readonly=TRUE)
par(mfrow=c(1,2), pty="s")
plot(alphas, errors1, "b", main="Biased Estimator Error", 
     xlab="alpha", ylab="error", cex=0.5)
abline(a=0,b=1, lwd=1.5, col="red")
plot(alphas, errors2, "b", main="Unbiased Estimator Error", 
     xlab="alpha", ylab="error", cex=0.5)
abline(a=0,b=1, lwd=1.5, col="blue")
par(opar)

## End(Not run)

maotai documentation built on April 12, 2025, 2:10 a.m.