Cramer-Test for uni- and multivariate two-sample-problem


Description

Performs the Cramer test for the two-sample problem. Both univariate and multivariate data are supported. The critical value can be obtained either by Monte-Carlo-bootstrap methods or by an eigenvalue method. For the bootstrap approach, ordinary and permutation resampling are available, and the number of bootstrap replicates can be chosen.
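The bootstrap route for the critical value can be sketched in R as follows. This is a minimal illustration under stated assumptions, not the code used inside cramer.test: the helper boot_cramer is hypothetical, the "ordinary" branch assumes that ordinary resampling means drawing with replacement from the pooled sample, and the p-value convention (share of replicates at least as large as the observed statistic) is one common choice. The statistic follows the formula in the Details section, with the phiCramer kernel sqrt(z)/2 as default.

```r
# Hypothetical sketch (not the package's internal code): Monte-Carlo
# bootstrap critical value and p-value for the Cramer statistic.
boot_cramer <- function(x, y, conf.level = 0.95, replicates = 1000,
                        sim = c("ordinary", "permutation"),
                        phi = function(z) sqrt(z) / 2) {
  sim <- match.arg(sim)
  x <- as.matrix(x); y <- as.matrix(y)   # vectors become one-column matrices
  m <- nrow(x); n <- nrow(y)
  # kernel values phi(||Z_i - Z_j||^2) for the pooled sample, computed once
  k <- phi(as.matrix(dist(rbind(x, y)))^2)
  # statistic for a split of the pooled sample into index sets ix and iy
  stat <- function(ix, iy) {
    m * n / (m + n) * (2 / (m * n) * sum(k[ix, iy]) -
                       sum(k[ix, ix]) / m^2 - sum(k[iy, iy]) / n^2)
  }
  t_obs <- stat(seq_len(m), m + seq_len(n))
  t_star <- replicate(replicates, {
    # "permutation": reshuffle the pooled sample without replacement
    # "ordinary": draw pooled indices with replacement (ASSUMPTION)
    idx <- sample.int(m + n, replace = (sim == "ordinary"))
    stat(idx[seq_len(m)], idx[m + seq_len(n)])
  })
  crit <- unname(quantile(t_star, conf.level))
  list(statistic = t_obs, crit.value = crit,
       p.value = mean(t_star >= t_obs),
       result = as.integer(t_obs > crit))
}
```

For example, boot_cramer(rnorm(20), rnorm(50, mean = 0.5), sim = "permutation") returns a list whose fields mirror statistic, crit.value, p.value and result of the real function. Computing the kernel matrix once and only re-indexing it per replicate keeps each replicate cheap.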

Usage

cramer.test(x,y,conf.level=0.95,replicates=1000,
            sim="ordinary",just.statistic=FALSE,
            kernel="phiCramer", maxM=2^14, K=160) 

Arguments

x

First set of observations. Either in vector form (univariate) or in a matrix with one observation per row (multivariate).

y

Second set of observations. Must have the same dimension as x (for matrices, the same number of columns).

conf.level

Confidence level of the test. The default is conf.level=0.95.

sim

Type of Monte-Carlo-bootstrap method or eigenvalue method. Possible values are "ordinary" (default) for the ordinary Monte-Carlo bootstrap, "permutation" for a permutation Monte-Carlo bootstrap, or "eigenvalue" for bootstrapping the limit distribution: the (approximate) eigenvalues, which are the weights of the limiting chi-squared distribution, are evaluated, and the critical value of this approximation is calculated via the fast Fourier transform. This method is especially useful when the dataset is too large for Monte-Carlo bootstrapping, provided it is still small enough for the matrix eigenvalue problem to be solved.

replicates

Number of bootstrap-replicates taken to obtain critical value. The default is replicates=1000. When using the eigenvalue method, this variable is unused.

maxM

Gives the maximum number of points used for the fast Fourier transform. The default is maxM=2^14. When using Monte-Carlo-bootstrap methods, this variable is unused.

K

Gives the upper limit up to which the integral for calculating the distribution function from the characteristic function (Gurland's formula) is evaluated. The default is 160. Careful: when increasing K it is necessary to increase maxM as well, since the resolution of the points at which the distribution function is calculated is 2*pi/K.

Thus, if only K is increased, the maximum value at which the distribution function is calculated becomes lower. When using Monte-Carlo-bootstrap methods, this variable is unused.

just.statistic

Logical. If TRUE, only the value of the Cramer statistic is calculated and no bootstrap replicates are produced.

kernel

Character string giving the name of the kernel function. The default is "phiCramer", which is the Cramer test included in earlier versions of this package and used in the paper by Baringhaus and the author cited below. User-defined kernel functions can also be used; such a function must be able to handle matrix arguments. Kernel functions must be defined on the positive real line, take the value 0 at 0, and have a nonconstant, completely monotone first derivative. An example is shown in the Examples section below. The built-in functions are "phiCramer", "phiBahr", "phiLog", "phiFracA" and "phiFracB".
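The interplay between K and maxM described above is simple arithmetic. The sketch below assumes, as our reading of the K entry rather than a documented fact, that the distribution function is evaluated on maxM grid points spaced 2*pi/K apart, so the largest evaluated value is about maxM*2*pi/K. The helper fft_grid is hypothetical.

```r
# Hypothetical helper. ASSUMPTION: maxM grid points spaced 2*pi/K apart.
fft_grid <- function(K = 160, maxM = 2^14) {
  step <- 2 * pi / K                  # resolution of the evaluation grid
  c(step = step, upper = maxM * step) # largest value covered by the grid
}

fft_grid()                       # defaults: step about 0.0393, upper about 643
fft_grid(K = 320)                # finer grid, but the upper end is halved
fft_grid(K = 320, maxM = 2^15)   # doubling maxM as well restores the upper end
```

Under this assumption, doubling K halves the covered range unless maxM is doubled too, which is why the two arguments should be increased together.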

Details

The Cramer-statistic is given by

T=mn/(m+n) ( 2/(mn) Sum[i=1..m,j=1..n] phi(||X_i-Y_j||^2) - 1/(m^2) Sum[i=1..m,j=1..m] phi(||X_i-X_j||^2) - 1/(n^2) Sum[i=1..n,j=1..n] phi(||Y_i-Y_j||^2) )
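To make the formula concrete, here is a direct evaluation of T in R. The helper cramer_stat is hypothetical and is not part of the package; phi defaults to the phiCramer kernel sqrt(z)/2, so phi(||X_i-Y_j||^2) is half the Euclidean distance. Vectors are treated as one-column matrices, so the same code covers the univariate and multivariate cases.

```r
# Hypothetical helper: direct evaluation of the Cramer statistic T.
cramer_stat <- function(x, y, phi = function(z) sqrt(z) / 2) {
  x <- as.matrix(x); y <- as.matrix(y)   # one observation per row
  m <- nrow(x); n <- nrow(y)
  # squared Euclidean distances between all rows of the pooled sample
  d2 <- as.matrix(dist(rbind(x, y)))^2
  k <- phi(d2)
  ix <- seq_len(m)        # rows belonging to x
  iy <- m + seq_len(n)    # rows belonging to y
  m * n / (m + n) * (2 / (m * n) * sum(k[ix, iy]) -
                     sum(k[ix, ix]) / m^2 -
                     sum(k[iy, iy]) / n^2)
}

cramer_stat(c(0, 1), 3)
```

For x = c(0, 1) and y = 3, the cross term is 2.5, the x term is 1/4 and the y term is 0, so T = (2/3) * (2.5 - 1/4) = 1.5. Identical samples give T = 0.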


The function phi is the kernel function described in the Arguments section. A proof that the Monte-Carlo-bootstrap and eigenvalue methods are valid is given in the reference listed below. The built-in kernel functions are

phiCramer(z)=z^(1/2)/2

(recommended for location alternatives),

phiBahr(z)=1-exp(-z/2)

(recommended for dispersion as well as location alternatives),

phiLog(z)=log(1+z)

(preferably for location alternatives),

phiFracA(z)=1-1/(1+z)

(preferably for dispersion alternatives) and

phiFracB(z)=1-1/(1+z)^2

(also for dispersion alternatives). A more detailed analysis of the test's performance with these kernels will be included in a future publication. The idea of using this statistic is due to L. Baringhaus, University of Hanover.
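The kernels listed above translate directly into vectorised R functions, which also satisfies the requirement that kernels accept matrix arguments. The check_kernel helper below is hypothetical and only a crude numerical check: it tests phi(0) = 0 and that the first derivative is positive and decreasing on a grid, which is necessary for, but much weaker than, complete monotonicity.

```r
# The built-in kernels as plain R functions (formulas from the list above);
# vectorised arithmetic means they also accept matrices.
phiCramer <- function(z) sqrt(z) / 2
phiBahr   <- function(z) 1 - exp(-z / 2)
phiLog    <- function(z) log(1 + z)
phiFracA  <- function(z) 1 - 1 / (1 + z)
phiFracB  <- function(z) 1 - 1 / (1 + z)^2

kernels <- list(phiCramer = phiCramer, phiBahr = phiBahr, phiLog = phiLog,
                phiFracA = phiFracA, phiFracB = phiFracB)

# Hypothetical, partial check: phi(0) == 0 and a positive, decreasing
# first derivative (forward differences) on a grid of z values.
check_kernel <- function(phi, z = seq(0.01, 10, by = 0.01), h = 1e-5) {
  d1 <- (phi(z + h) - phi(z)) / h
  phi(0) == 0 && all(d1 > 0) && all(diff(d1) < 0)
}

sapply(kernels, check_kernel)   # one TRUE/FALSE per kernel
phiLog(matrix(0:3, 2))          # matrix in, matrix out
```

A user-defined kernel written in the same style, say a hypothetical phiMyKernel, would be passed to cramer.test by name, as kernel="phiMyKernel".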

Value

The returned value is an object of class "cramertest", containing the following components:

method

Describing the test in words.

d

Dimension of the observations.

m

Number of x observations.

n

Number of y observations.

statistic

Value of the Cramer-statistic for the given observations.

conf.level

Confidence level for the test.

crit.value

Critical value calculated by the bootstrap method or the eigenvalue method, respectively. When using the eigenvalue method, the distribution under the hypothesis is interpolated linearly.

p.value

Estimated p-value of the test.

result

1 if the hypothesis of equal distributions is rejected, 0 otherwise.

sim

Method used for obtaining the critical value.

replicates

Number of bootstrap-replicates taken.

ev

Eigenvalues and eigenfunctions, when the eigenvalue method is used to obtain the critical value.

hypdist

The distribution function under the hypothesis, reconstructed via FFT. $x contains the x-values and $Fx the values of the distribution function at these positions.
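With the eigenvalue method, the critical value is read from a tabulated distribution function (hypdist, with $x and $Fx) by linear interpolation, and a p-value can be read the same way. The sketch below imitates that last step under two assumptions: the limiting distribution is taken to be sum(lambda_k * Z_k^2) with Z_k independent standard normal and lambda_k the eigenvalues, and, unlike cramer.test, which uses Gurland's formula and an FFT, the table is built by plain Monte-Carlo simulation. All helper names are hypothetical.

```r
# Hypothetical helpers illustrating the eigenvalue route.
# ASSUMPTION: limit law is sum(lambda * Z^2) with Z iid N(0, 1).
# Monte-Carlo tabulation stands in for Gurland's formula + FFT.
tabulate_limit_cdf <- function(lambda, nsim = 1e5,
                               grid = seq(0, 20, by = 0.01)) {
  z <- matrix(rnorm(nsim * length(lambda)), nrow = nsim)
  sims <- drop(z^2 %*% lambda)           # one draw of the weighted sum per row
  list(x = grid, Fx = ecdf(sims)(grid))  # same layout as hypdist: $x and $Fx
}

# Critical value: invert the tabulated CDF by linear interpolation
crit_from_table <- function(tab, conf.level = 0.95) {
  approx(tab$Fx, tab$x, xout = conf.level, ties = mean)$y
}

# p-value: 1 - F(t), with F linearly interpolated (clamped outside the grid)
pval_from_table <- function(tab, t) {
  1 - approx(tab$x, tab$Fx, xout = t, rule = 2)$y
}

set.seed(1)
tab <- tabulate_limit_cdf(lambda = c(0.5, 0.3, 0.2))
crit_from_table(tab)
pval_from_table(tab, t = 2)
```

With a single eigenvalue of 1 the table approximates a chi-squared distribution with one degree of freedom, so the interpolated critical value should be close to qchisq(0.95, 1), about 3.84.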

References

The test and its properties are described in:

Baringhaus, L. and Franz, C. (2004) On a new multivariate two-sample test, Journal of Multivariate Analysis, 88, pp. 190-206.

Franz, C. (2000) Ein statistischer Test fuer das mehrdimensionale Zweistichproben-Problem [A statistical test for the multivariate two-sample problem], in German, Diploma thesis, University of Hanover.

The test of Bahr has so far only been described in:

Bahr, R. (1996) Ein neuer Test fuer das mehrdimensionale Zwei-Stichproben-Problem bei allgemeiner Alternative [A new test for the multivariate two-sample problem under general alternatives], in German, Ph.D. thesis, University of Hanover.

The eigenvalue method will be described in a forthcoming article.

Examples

# comparison of two univariate normal distributions
x<-rnorm(20,mean=0,sd=1)
y<-rnorm(50,mean=0.5,sd=1)
cramer.test(x,y)

# comparison of two multivariate normal distributions with permutation test:
# package "MASS" provides mvrnorm() for multivariate normal samples
 
# library(MASS)
# x<-mvrnorm(n=20,mu=c(0,0),Sigma=diag(c(1,1)))
# y<-mvrnorm(n=50,mu=c(0.3,0),Sigma=diag(c(1,1)))
# cramer.test(x,y,sim="permutation")

# comparison of two univariate normal distributions with a user-defined kernel (Bahr's kernel)
phiBahr<-function(x) return(1-exp(-x/2))
x<-rnorm(20,mean=0,sd=1)
y<-rnorm(50,mean=0,sd=2)
cramer.test(x,y,sim="eigenvalue",kernel="phiBahr")
