# mmd2test: Kernel Two-sample Test with Maximum Mean Discrepancy In maotai: Tools for Matrix Algebra, Optimization and Inference

## Description

Maximum Mean Discrepancy (MMD) as a measure of discrepancy between samples is employed as a test statistic for two-sample hypothesis test of equal distributions. Kernel matrix K is a symmetric square matrix that is positive semidefinite.

## Usage

 `1` ```mmd2test(K, label, method = c("b", "u"), mc.iter = 999) ```

## Arguments

 `K` kernel matrix or an object of `kernelMatrix` class from kernlab package. `label` label vector of class indices. `method` type of estimator to be used. `"b"` for biased and `"u"` for unbiased estimator of MMD. `mc.iter` the number of Monte Carlo resampling iterations.

## Value

a (list) object of `S3` class `htest` containing:

statistic

a test statistic.

p.value

p-value under H_0.

alternative

alternative hypothesis.

method

name of the test.

data.name

name(s) of provided kernel matrix.

## References

\insertRef

gretton_kernel_2012maotai

## Examples

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46``` ```## small test for CRAN submission dat1 <- matrix(rnorm(60, mean= 1), ncol=2) # group 1 : 30 obs of mean 1 dat2 <- matrix(rnorm(50, mean=-1), ncol=2) # group 2 : 25 obs of mean -1 dmat <- as.matrix(dist(rbind(dat1, dat2))) # Euclidean distance matrix kmat <- exp(-(dmat^2)) # build a gaussian kernel matrix lab <- c(rep(1,30), rep(2,25)) # corresponding label mmd2test(kmat, lab) # run the code ! ## Not run: ## WARNING: computationally heavy. # Let's compute empirical Type 1 error at alpha=0.05 niter = 496 pvals1 = rep(0,niter) pvals2 = rep(0,niter) for (i in 1:niter){ dat = matrix(rnorm(200),ncol=2) lab = c(rep(1,50), rep(2,50)) lbd = 0.1 kmat = exp(-lbd*(as.matrix(dist(dat))^2)) pvals1[i] = mmd2test(kmat, lab, method="b")\$p.value pvals2[i] = mmd2test(kmat, lab, method="u")\$p.value print(paste("iteration ",i," complete..",sep="")) } # Visualize the above at multiple significance levels alphas = seq(from=0.001, to=0.999, length.out=100) errors1 = rep(0,100) errors2 = rep(0,100) for (i in 1:100){ errors1[i] = sum(pvals1<=alphas[i])/niter errors2[i] = sum(pvals2<=alphas[i])/niter } opar <- par(no.readonly=TRUE) par(mfrow=c(1,2), pty="s") plot(alphas, errors1, "b", main="Biased Estimator Error", xlab="alpha", ylab="error", cex=0.5) abline(a=0,b=1, lwd=1.5, col="red") plot(alphas, errors2, "b", main="Unbiased Estimator Error", xlab="alpha", ylab="error", cex=0.5) abline(a=0,b=1, lwd=1.5, col="blue") par(opar) ## End(Not run) ```

maotai documentation built on Oct. 25, 2021, 9:06 a.m.