Nothing
#require(boot)
# Generated by using Rcpp::compileAttributes() -> do not edit by hand
# Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393
#' Cramer-Test for uni- and multivariate two-sample-problem
#'
#' @author Carsten Franz <carsten.franz@gmail.com>
#' @useDynLib cramer
#' @importFrom Rcpp evalCpp
#' @import boot
#' @importFrom stats fft
#' @aliases cramer-package
"_PACKAGE"
#' Perform Cramer-Test for uni- and multivariate two-sample-problem
#'
#' @description
#' Perform Cramér-test for two-sample-problem.
#'
#' Both univariate and multivariate data is possible. For calculation of the critical value
#' Monte-Carlo-bootstrap-methods and eigenvalue-methods are available. For the bootstrap
#' access ordinary and permutation methods can be chosen as well as the number
#' of bootstrap-replicates taken.
#' @details
#' The Cramér-statistic is given by
#' \deqn{T_{m,n} = \frac{mn}{m+n}\biggl(\frac{2}{mn}\sum_{i,j}^{m,n}\phi(\|\vec{X}_i-\vec{Y}_j\|^2)-\frac{1}{m^2}\sum_{i,j=1}^m\phi(\|\vec{X}_{i}-\vec{X}_{j}\|^2)}{T=mn/(m+n) ( 2/(mn) Sum[i=1..m,j=1..n] phi(||X_i-Y_j||^2) - 1/(m^2) Sum[i=1..m,j=1..m] phi(||X_i-X_j||^2) - 1/(n^2) Sum[i=1..n,j=1..n] phi(||Y_i-Y_j||^2) )}
#' \deqn{-\frac{1}{n^2}\sum_{i,j=1}^n\phi(\|\vec{Y}_{i}-\vec{Y}_{j}\|^2)\biggr),}{\code{}}
#' The function \eqn{\phi}{phi} is the kernel function mentioned in the Parameters section.
#' The proof that the Monte-Carlo-Bootstrap and eigenvalue methods work is given in the reference given below. Other build-in kernel functions are
#' \deqn{\phi_{Cramer}(z)=\sqrt{z}/2}{phiCramer(z)=z^(1/2)/2} (recommended for location alternatives),
#' \deqn{\phi_{Bahr}(z)=1-\exp(-z/2)}{phiBahr(z)=1-exp(-z/2)} (recommended for dispersion as well as location alternatives),
#' \deqn{\phi_{log}(z)=\log(1+z)}{phiLog(z)=log(1+z)} (preferrably for location alternatives),
#' \deqn{\phi_{FracA}(z)=1-\frac{1}{1+z}}{phiFracA(z)=1-1/(1+z)} (preferrably for dispersion alternatives) and
#' \deqn{\phi_{FracB}(z)=1-\frac{1}{(1+z)^2}.}{phiFracA(B)=1-1/(1+z)^2.} (also for dispersion alternatives).
#' Test performance was investigated in the below referenced 2010 publication. The idea of using this statistic is
#' due to L. Baringhaus, University of Hanover.
#' @param x First set of observations. Either in vector form (univariate) or in a matrix with
#' one observation per row (multivariate).
#' @param y Second set of observations. Same dimension as \code{x}.
#' @param conf.level Confidence level of test. The default is \code{conf.level=0.95}.
#' @param sim Type of Monte-Carlo-bootstrap method or eigenvalue method. Possible values are
#' \code{"ordinary"} (default) for normal Monte-Carlo-bootstrap, \code{"permutation"}
#' for a permutation Monte-Carlo-bootstrap or \code{"eigenvalue"} for bootstrapping
#' the limit distribution, evaluating the (approximate) eigenvalues being the weights
#' of the limiting chisquared-distribution and using the critical value of this
#' approximation (calculated via fast fourier transform). This method is especially good
#' if the dataset is too large to perform Monte-Carlo-bootstrapping (although it must not
#' be too large so the matrix eigenvalue problem can still be solved).
#' @param replicates Number of bootstrap-replicates taken to obtain critical value. The default
#' is \code{replicates=1000}. When using the eigenvalue method, this variable is unused.
#' @param maxM Gives the maximum number of points used for the fast fourier transform. When using
#' Monte-Carlo-bootstrap methods, this variable is unused.
#' @param K Gives the upper value up to which the integral for the calculation of the
#' distribution function out of the characteristic function (Gurlands formula) is
#' evaluated. The default ist 160. Careful: When increasing \code{K} it is necessary to
#' increase \code{maxM} as well since the resolution of the points where the distribution
#' function is calculated is \deqn{\frac{2\pi}{K}.}{2 PI/K.}
#' Thus, if just \code{K} is increased the maximum value, where the distribution function
#' is calculated is lower.
#' When using Monte-Carlo-bootstrap methods, this variable is unused.
#' @param just.statistic Boolean variable. If \code{TRUE} just the value of the Cram\eqn{\mbox{\'e}}{e}r-statistic
#' is calculated and no bootstrap-replicates are produced.
#' @param kernel Character-string giving the name of the kernel function. The default is \code{"phiCramer"} which
#' is the Cram\eqn{\mbox{\'e}}{e}r-test included
#' in earlier versions of this package and which is used in the paper of Baringhaus and the author mentioned
#' below. It is possible to use user-defined kernel functions here. The functions needs to be able to deal with
#' matrix arguments. Kernel functions need to be defined on
#' the positive real line with value 0 at 0 and have a non-constant completely monotone first
#' derivative. An example is show in the Examples section below. Build-in functions are \code{"phiCramer"},
#' \code{"phiBahr"}, \code{"phiLog"}, \code{"phiFracA"} and \code{"phiFracB"}.
#' @returns The returned value is an object of class \code{"cramertest"}, containing the following components:
#' \item{method}{Describing the test in words.}
#' \item{d}{Dimension of the observations.}
#' \item{m}{Number of \code{x} observations.}
#' \item{n}{Number of \code{y} observations.}
#' \item{statistic}{Value of the Cram\eqn{\mbox{\'e}}{e}r-statistic for the given observations.}
#' \item{conf.level}{Confidence level for the test.}
#' \item{crit.value}{Critical value calculated by bootstrap method, eigenvalue method, respectively. When using
#' the eigenvalue method, the distribution under the hypothesis will be interpolated linearly.}
#' \item{p.value}{Estimated p-value of the test.}
#' \item{result}{Contains \code{1} if the hypothesis of equal distributions should not be accepted and \code{0} otherwise.}
#' \item{sim}{Method used for obtaining the critical value.}
#' \item{replicates}{Number of bootstrap-replicates taken.}
#' \item{ev}{Contains eigenvalues and eigenfunctions when using the eigenvalue-method to obtain the critical value}
#' \item{hypdist}{Contains the via fft reconstructed distribution function under the hypothesis. \code{$x}
#' contains the x-values and \code{$Fx} the values of the distribution function at the positions.}
#' @section References:
#' The test and its properties is described in:
#' \itemize{
#' \item{Baringhaus, L. and Franz, C. (2004) \emph{On a new multivariate two-sample test}, Journal of Multivariate Analysis, 88, p. 190-206}
#' \item{Baringhaus, L. and Franz, C. (2010) \emph{Rigid motion invariant two-sample tests}, Statistica Sinica 20, 1333-1361}
#' }
#' The test of Bahr is also discussed in:
#' \itemize{
#' \item{Bahr, R. (1996) \emph{Ein neuer Test fuer das mehrdimensionale Zwei-Stichproben-Problem bei allgemeiner Alternative}, German, Ph.D. thesis, University of Hanover}
#' }
#' @examples
#' # comparison of two univariate normal distributions
#' x<-rnorm(20,mean=0,sd=1)
#' y<-rnorm(50,mean=0.5,sd=1)
#' cramer.test(x,y)
#'
#' # comparison of two multivariate normal distributions with permutation test:
#' # library "MASS" for multivariate routines (included in package "VR")
#'
#' # library(MASS)
#' # x<-mvrnorm(n=20,mu=c(0,0),Sigma=diag(c(1,1)))
#' # y<-mvrnorm(n=50,mu=c(0.3,0),Sigma=diag(c(1,1)))
#' # cramer.test(x,y,sim="permutation")
#'
#' # comparison of two univariate normal distributions with Bahrs Kernel
#' phiBahr<-function(x) return(1-exp(-x/2))
#' x<-rnorm(20,mean=0,sd=1)
#' y<-rnorm(50,mean=0,sd=2)
#' cramer.test(x,y,sim="eigenvalue",kernel="phiBahr")
#'
#' @export
cramer.test<-function(x,y,conf.level=0.95,replicates=1000,sim="ordinary",just.statistic=FALSE,kernel="phiCramer",maxM=2^14,K=160) {
RVAL <- list(method = paste("nonparametric Cramer-Test with kernel",kernel,"\n(on equality of two distributions)"),
d = 0,
m = 0,
n = 0,
statistic = 0,
conf.level = conf.level,
crit.value = 0,
p.value = 0,
result = 0,
sim = sim,
replicates = replicates,
hypdist = 0,
ev = 0)
if ((is.vector(x))&&(is.vector(y))) RVAL$d<-1
if ((is.matrix(x))&&(is.matrix(y))) if (ncol(x)==ncol(y)) RVAL$d<-ncol(x)
if (RVAL$d==0) stop("types of x and y incompatible or inappropriate.")
if (RVAL$d==1) {
RVAL$m<-length(x)
RVAL$n<-length(y)
daten<-matrix(c(x,y),ncol=1,byrow=TRUE)
} else {
RVAL$m<-nrow(x)
RVAL$n<-nrow(y)
daten<-matrix(c(t(x),t(y)),ncol=ncol(x),byrow=TRUE)
}
lookup<-matrix(rep(0,(RVAL$m+RVAL$n)^2),ncol=(RVAL$m+RVAL$n))
# for (i in 2:(RVAL$m+RVAL$n))
# for (j in 1:(i-1)) {
# lookup[i,j]<-sum((daten[i,]-daten[j,])^2)
# lookup[j,i]<-lookup[i,j]
# }
lookup<-calculateLookupMatrix(RVAL$m+RVAL$n,daten,lookup) # proposed by Markus Schaffer to speed up, thanks!
lookup<-eval(call(kernel,lookup))
#print(date()) #!!!
if (just.statistic) {
RVAL$statistic<-.cramer.statistic(daten,1:(RVAL$m+RVAL$n),RVAL$m,RVAL$n,lookup)
} else if (sim!="eigenvalue") {
b<-boot(data=daten,statistic=.cramer.statistic,mm=RVAL$m,nn=RVAL$n,lookup=lookup,sim=RVAL$sim,stype="i",R=RVAL$replicates)
RVAL$statistic<-b$t0
RVAL$p.value<-1-rank(c(b$t0,b$t))[1]/(replicates+1)
RVAL$crit.value<-sort(b$t)[round(RVAL$conf.level*RVAL$replicates)]
if (RVAL$statistic>RVAL$crit.value) RVAL$result<-1
RVAL$hypdist.x<-sort(b$t)
RVAL$hypdist.Fx<-(1:RVAL$replicates)/RVAL$replicates
} else {
RVAL$statistic<-.cramer.statistic(daten,1:(RVAL$m+RVAL$n),RVAL$m,RVAL$n,lookup)
N<-RVAL$m+RVAL$n
# alternative Berechnung von B (DEUTLICH schneller)
C1<-rep(0,N)
for (i in 1:N)
for (j in 1:N) C1[i]<-C1[i]+lookup[i,j]
C1<-C1/N
C2<-0
for (i in 1:N)
for (j in 1:N) C2<-C2+lookup[i,j]
C2<-C2/N^2
B<-matrix(rep(0,N^2),ncol=N)
for (i in 1:N)
for (j in 1:N) B[i,j]<-C1[i]+C1[j]-C2-lookup[i,j]
B<-B/N
RVAL$ev<-eigen(B,FALSE) # <--only.values TRUE)
kritwert<-.cramer.kritwertfft(Re((RVAL$ev)$values),conf.level,maxM,K)
RVAL$p.value<-1-(kritwert$hypdist.Fx)[which.min(abs((kritwert$hypdist.x)[1:(3*length(kritwert$hypdist.x)/4)]-RVAL$statistic))];
RVAL$crit.value<-kritwert$quantile
RVAL$hypdist.x<-kritwert$hypdist.x
RVAL$hypdist.Fx<-kritwert$hypdist.Fx
if (RVAL$statistic>RVAL$crit.value) RVAL$result<-1
}
class(RVAL) <- "cramertest"
#print(date()) #!!!
return(RVAL)
}
#' @keywords internal
# .cramer.statistic calculates the value of the test statistic T
# it is build so it can be called by "boot"
# daten is the original data set (is just given since boot wants it like that))
# indexe is the bootstrapped vector of indices (first m are observations of X, the next n of Y)
# m is the number of observations of X
# n is the number of observations of Y
# lookup is a matrix containing the distances of points i und j
.cramer.statistic<-function(daten,indexe,mm,nn,lookup) {
xind<-indexe[1:mm]
yind<-indexe[(mm+1):(mm+nn)]
mm*nn/(mm+nn)*(2*sum(lookup[xind,yind])/(mm*nn)-sum(lookup[xind,xind])/(mm^2)-sum(lookup[yind,yind])/(nn^2))
}
#' @keywords internal
.cramer.characteristicfunction<-function(lambdasquare,t) { z<--0.5*log(1-2i*lambdasquare%*%t(t));return(exp(complex(length(t),rowsum(Re(z),rep(1,length(lambdasquare))),rowsum(Im(z),rep(1,length(lambdasquare)))))) }
#' @keywords internal
.cramer.kritwertfft<-function(lambdasquare,conf.level,maxM,K) {
M<-2^11
while (150*pi*M/K^2<(2*sum(lambdasquare)+lambdasquare[1])) M<-M*2
M<-min(c(M,maxM))
goodlimit<-150*pi*M/K^2
a<-0
t<-0:(M-1)*K/M
x<-0:(M-1)*2*pi/K
#Verteilungsfunktion
t[1]<-1
h<-.cramer.characteristicfunction(lambdasquare,t)/t*exp(-a*1i*t);
h[1]<-complex(1,0,1)*sum(lambdasquare)
Fx<-1/2-Im(K/(M*pi)*fft(h,inverse=FALSE))+K/(2*M*pi)*(sum(lambdasquare)+x+a)
#quantile
xindex<-which.min(abs(Fx[1:(M/2)]-conf.level))
#linear interpolation between two evaluated points of the distribution function (much better than just taking x[xindex]
if (Fx[xindex]>conf.level) xindex<-xindex-1
if (xindex<1) xindex<-1
quantile<-x[xindex]+(conf.level-Fx[xindex])*(x[xindex+1]-x[xindex])/(Fx[xindex+1]-Fx[xindex])
#warnings?
if (Fx[M/2]<conf.level) warning("Quantile calculation discrepance. Try to increase K!")
if (quantile>goodlimit) warning("Quantile beyond good approximation limit. Try to increase maxM or decrease K!")
kritwert <- list(quantile=quantile,hypdist.x=x,hypdist.Fx=Fx)
return(kritwert)
}
#' @export
print.cramertest<-function(x,...) {
cat("\n",x$d,"-dimensional ",x$method,"\n\n")
cat("\tx-sample: ",x$m," values ")
cat("y-sample: ",x$n," values\n\n")
if (x$crit.value>0) {
cat("critical value for confidence level ",format(100 * x$conf.level),"% : ",x$crit.value,"\n")
cat("observed statistic ",x$statistic,", so that\n\t hypothesis (\"x is distributed as y\") is ")
cat(ifelse(x$result==0," ACCEPTED"," REJECTED"),".\n")
cat("estimated p-value = ",x$p.value,"\n\n")
if (x$sim!="eigenvalue") {
cat("\t[result based on ",x$replicates," ",x$sim," bootstrap-replicates]\n\n");
} else {
cat("\t[result based on eigenvalue decomposition and inverse fft]\n\n");
}
} else {
cat("observed statistic ",x$statistic,"\n\n")
}
invisible(x)
}
phiCramer<-function(x) return(sqrt(x)/2)
phiBahr<-function(x) return(1-exp(-x/2))
phiLog<-function(x) return(log(1+x))
phiFracA<-function(x) return(1-1/(1+x))
phiFracB<-function(x) return(1-1/(1+x)^2)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.