mmd2test: Kernel Two-sample Test with Maximum Mean Discrepancy

View source: R/mmd2test.R

mmd2testR Documentation

Kernel Two-sample Test with Maximum Mean Discrepancy

Description

Maximum Mean Discrepancy (MMD) as a measure of discrepancy between samples is employed as a test statistic for two-sample hypothesis test of equal distributions. Kernel matrix K is a symmetric square matrix that is positive semidefinite.

Usage

mmd2test(K, label, method = c("b", "u"), mc.iter = 999)

Arguments

K

kernel matrix or an object of kernelMatrix class from kernlab package.

label

label vector of class indices.

method

type of estimator to be used. "b" for biased and "u" for unbiased estimator of MMD.

mc.iter

the number of Monte Carlo resampling iterations.

Value

a (list) object of S3 class htest containing:

statistic

a test statistic.

p.value

p-value under H_0.

alternative

alternative hypothesis.

method

name of the test.

data.name

name(s) of provided kernel matrix.

References

\insertRef

gretton_kernel_2012maotai

Examples

## small test for CRAN submission
dat1 <- matrix(rnorm(60, mean= 1), ncol=2) # group 1 : 30 obs of mean  1
dat2 <- matrix(rnorm(50, mean=-1), ncol=2) # group 2 : 25 obs of mean -1

dmat <- as.matrix(dist(rbind(dat1, dat2)))  # Euclidean distance matrix
kmat <- exp(-(dmat^2))                      # build a gaussian kernel matrix
lab  <- c(rep(1,30), rep(2,25))             # corresponding label

mmd2test(kmat, lab)                         # run the code !

## Not run: 
## WARNING: computationally heavy. 
#  Let's compute empirical Type 1 error at alpha=0.05
niter = 496  
pvals1 = rep(0,niter)
pvals2 = rep(0,niter)
for (i in 1:niter){
  dat = matrix(rnorm(200),ncol=2)
  lab = c(rep(1,50), rep(2,50))
  lbd = 0.1
  kmat = exp(-lbd*(as.matrix(dist(dat))^2))
  pvals1[i] = mmd2test(kmat, lab, method="b")$p.value
  pvals2[i] = mmd2test(kmat, lab, method="u")$p.value
  print(paste("iteration ",i," complete..",sep=""))
}

#  Visualize the above at multiple significance levels
alphas = seq(from=0.001, to=0.999, length.out=100)
errors1 = rep(0,100)
errors2 = rep(0,100)
for (i in 1:100){
   errors1[i] = sum(pvals1<=alphas[i])/niter
   errors2[i] = sum(pvals2<=alphas[i])/niter
}

opar <- par(no.readonly=TRUE)
par(mfrow=c(1,2), pty="s")
plot(alphas, errors1, "b", main="Biased Estimator Error", 
     xlab="alpha", ylab="error", cex=0.5)
abline(a=0,b=1, lwd=1.5, col="red")
plot(alphas, errors2, "b", main="Unbiased Estimator Error", 
     xlab="alpha", ylab="error", cex=0.5)
abline(a=0,b=1, lwd=1.5, col="blue")
par(opar)

## End(Not run)


maotai documentation built on March 31, 2023, 6:48 p.m.