KS.test: Kolmogorov-Smirnov Test
In snpar: Supplementary Non-parametric Statistics Methods

Performs a one-sample or two-sample Kolmogorov-Smirnov test using a kernel estimate of the cumulative distribution function.

KS.test(x, y, ..., kernel = c("epan", "unif", "tria", 
        "quar", "triw", "tric", "gaus", "cos"), hx, hy, 
        alternative = c("two.sided", "less", "greater"))

`x`	a numeric vector of data values.
`y`	either a numeric vector of data values, a character string naming a cumulative distribution function (CDF) such as `"pnorm"`, or an actual CDF such as `pnorm`. Only continuous CDFs are valid.
`...`	parameters of the distribution specified (as a character string) by `y`.
`kernel`	a character string specifying the smoothing kernel function. This must be one of `"unif"` (uniform), `"tria"` (triangular), `"epan"` (Epanechnikov), `"quar"` (quartic), `"triw"` (triweight), `"tric"` (tricube), `"gaus"` (Gaussian) or `"cos"` (cosine). The default is `"epan"`.
`hx`	the smoothing bandwidth for `x`. See 'Details' for the default bandwidth.
`hy`	the smoothing bandwidth for `y`. See 'Details' for the default bandwidth.
`alternative`	indicates the alternative hypothesis and must be one of `"two.sided"` (default), `"less"`, or `"greater"`.
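For reference, the eight kernel choices can be written out as R functions. This is a sketch assuming the standard textbook forms, each supported on [-1, 1] except the Gaussian; snpar's internal definitions are not shown on this page and may be scaled differently. The check confirms that each kernel is a density, i.e. integrates to 1.

```r
# Standard textbook forms of the kernels named by the `kernel` argument.
# Assumption: snpar may define or scale these differently internally.
kernels <- list(
  unif = function(u) ifelse(abs(u) <= 1, 1 / 2, 0),
  tria = function(u) ifelse(abs(u) <= 1, 1 - abs(u), 0),
  epan = function(u) ifelse(abs(u) <= 1, 3 / 4 * (1 - u^2), 0),
  quar = function(u) ifelse(abs(u) <= 1, 15 / 16 * (1 - u^2)^2, 0),
  triw = function(u) ifelse(abs(u) <= 1, 35 / 32 * (1 - u^2)^3, 0),
  tric = function(u) ifelse(abs(u) <= 1, 70 / 81 * (1 - abs(u)^3)^3, 0),
  gaus = function(u) dnorm(u),
  cos  = function(u) ifelse(abs(u) <= 1, pi / 4 * cos(pi * u / 2), 0)
)

# Integrate each kernel over its support: [-1, 1], or the whole line for "gaus".
areas <- sapply(names(kernels), function(nm) {
  lim <- if (nm == "gaus") Inf else 1
  integrate(kernels[[nm]], -lim, lim)$value
})
print(round(areas, 6))
```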

The traditional Kolmogorov-Smirnov test is based on the empirical cumulative distribution function (CDF), which is a step function and may estimate the true CDF poorly. A kernel estimate of the CDF is continuous and is generally closer to the true CDF than the empirical CDF, so using the kernel CDF in the Kolmogorov-Smirnov test is more reasonable. The test statistic is the maximum difference between the two CDFs being compared, and its exact form depends on the alternative hypothesis. When the sample size is large, the suitably scaled test statistic has the Kolmogorov distribution function

$$K(x) = \sum_{j=-\infty}^{\infty} (-1)^{j} \exp\left(-2 j^{2} x^{2}\right), \quad x \ge 0,$$

and K(x) = 0 for x < 0. See Conover (1999) for more details. The default smoothing bandwidth is the plug-in optimal bandwidth used in Wang, Cheng and Yang (2013). Missing values are removed before the test is computed.
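The following base-R sketch illustrates the two-sided one-sample version of the procedure described above. It assumes the Epanechnikov kernel, an illustrative bandwidth of `sd(x) * n^(-1/3)` rather than the plug-in rule of Wang, Cheng and Yang (2013), and a grid approximation of the supremum. The helper names `G_epan`, `kernel_cdf`, `pkolm` and `kernel_ks_sketch` are hypothetical and are not part of snpar. The p-value is computed as 1 - K(√n D) from a truncated version of the series for K(x).

```r
# Sketch of a kernel-smoothed one-sample KS test (two-sided only).
# Assumptions: Epanechnikov kernel, illustrative bandwidth, grid-based supremum.
# None of these helpers are part of snpar.

# Integrated Epanechnikov kernel: G(u) = integral of 0.75 * (1 - s^2) over [-1, u].
G_epan <- function(u) {
  u <- pmin(pmax(u, -1), 1)  # G is 0 below -1 and 1 above 1
  0.5 + 0.75 * u - 0.25 * u^3
}

# Kernel CDF estimate: F_h(p) = mean(G((p - X_i) / h)) at each point p.
kernel_cdf <- function(pts, x, h) {
  vapply(pts, function(p) mean(G_epan((p - x) / h)), numeric(1))
}

# Kolmogorov distribution K(q), with the series truncated at |j| <= terms.
# The truncation is accurate unless q is very close to 0.
pkolm <- function(q, terms = 100) {
  j <- -terms:terms
  vapply(q, function(qi) {
    if (qi <= 0) return(0)
    sum((-1)^j * exp(-2 * j^2 * qi^2))
  }, numeric(1))
}

kernel_ks_sketch <- function(x, cdf, h, ...) {
  x <- x[!is.na(x)]  # drop missing values
  n <- length(x)
  # Outside [min(x) - h, max(x) + h] the estimate is exactly 0 or 1, so the
  # largest difference from a continuous CDF is attained on this range.
  grid <- seq(min(x) - h, max(x) + h, length.out = 2000)
  D <- max(abs(kernel_cdf(grid, x, h) - cdf(grid, ...)))
  list(statistic = D, p.value = 1 - pkolm(sqrt(n) * D))
}

set.seed(1)
x <- rnorm(100, 2, 3)
h <- sd(x) * length(x)^(-1/3)  # illustrative choice, not the plug-in rule
res <- kernel_ks_sketch(x, pnorm, h, mean = 2, sd = 3)
print(res)
```

As a sanity check on the series, K(1.3581) is approximately 0.95, the familiar 5% critical value of the Kolmogorov distribution.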

A list with class "htest" containing the following components:

`data.name`	a character string giving the name(s) of the data.
`statistic`	the value of the test statistic.
`p.value`	the p-value of the test.
`method`	a character string indicating what type of test was performed.
`alternative`	a character string describing the alternative hypothesis.

The choice of smoothing bandwidth is always a critical issue in non-parametric statistics. The default bandwidth, based on Wang, Cheng and Yang (2013), may not perform well in some cases and should be regarded only as an initial choice. Users are advised to supply a bandwidth obtained by other methods.

This function computes only the large-sample (asymptotic) p-value. For small samples, `ks.test` can be used to compute the exact p-value.
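For small samples, base R's `ks.test` can report both the exact and the asymptotic p-value of the classical (empirical-CDF) statistic, which shows how far the large-sample approximation can be from the exact result. This sketch uses simulated data only and does not call `KS.test`.

```r
# Compare exact and asymptotic p-values of the classical one-sample test
# in base R; the kernel-based KS.test is not called here.
set.seed(2)
x_small <- rnorm(20, 2, 3)

# Exact p-value: available for small samples of continuous data without ties.
exact_p <- ks.test(x_small, "pnorm", 2, 3, exact = TRUE)$p.value
# Large-sample p-value from the limiting Kolmogorov distribution.
asymp_p <- ks.test(x_small, "pnorm", 2, 3, exact = FALSE)$p.value

print(c(exact = exact_p, asymptotic = asymp_p))
```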

Debin Qiu <debinqiu@uga.edu>

Conover, W. J. (1999). Practical Nonparametric Statistics (3rd ed.). Wiley. pp. 396-406.

Wang, J., Cheng, F. and Yang, L. (2013). Smooth simultaneous confidence bands for cumulative distribution functions. Journal of Nonparametric Statistics. 25, 395-407.

ks.test

library(snpar)
set.seed(123)

# one-sample Kolmogorov-Smirnov test
x <- rnorm(100, 2, 3)
KS.test(x, "pnorm", 2, 3)

# two-sample Kolmogorov-Smirnov test
y <- rgamma(100, 1, 6)
KS.test(x, y)