cont_ks_test    R Documentation

Computes the p-value for a one-sample two-sided Kolmogorov-Smirnov test when the cdf under the null hypothesis is continuous

Description

Computes the p-value P(D_{n} \ge d_{n}) \equiv P(D_{n} > d_{n}), where d_{n} is the value of the KS test statistic computed from a data sample \{x_{1}, ..., x_{n}\}, when the cdf F(x) under the null hypothesis is continuous.

Usage

cont_ks_test(x, y, ...)

Arguments

x

a numeric vector of data sample values \{x_{1}, ..., x_{n}\}.

y

a pre-specified continuous cdf, F(x), under the null hypothesis. Note that y should be a character string naming a continuous cumulative distribution function, such as "pexp" or "pnorm". Only continuous cdfs are valid!

...

values of the parameters of the cdf F(x) specified (as a character string) by y.
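The way a character string y and the extra arguments in ... can combine into a cdf evaluation may be sketched as follows. The helper eval_null_cdf is hypothetical and only illustrates the idea; it is not taken from the package source. The final call assumes that KSgeneral is installed and that named cdf parameters such as rate are forwarded through ... as described above.

```r
# Hypothetical helper (not part of KSgeneral): look up a cdf by name
# and evaluate it at x, forwarding any distribution parameters.
eval_null_cdf <- function(x, y, ...) {
  cdf <- get(y, mode = "function")  # e.g. "pexp" -> stats::pexp
  cdf(x, ...)
}

# F(1) for an exponential distribution with rate 2
f_at_1 <- eval_null_cdf(1, "pexp", rate = 2)

# The same style of call with cont_ks_test, guarded in case the
# package is not available.
set.seed(1)
x <- rexp(50, rate = 2)
if (requireNamespace("KSgeneral", quietly = TRUE)) {
  res <- KSgeneral::cont_ks_test(x, "pexp", rate = 2)
  print(res$p.value)
}
```

Omitting rate = 2 in the last call would test the sample against the default exponential distribution with rate 1 instead.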

Details

Given a random sample \{X_{1}, ..., X_{n}\} of size n with an empirical cdf F_{n}(x), the two-sided Kolmogorov-Smirnov goodness-of-fit statistic is defined as D_{n} = \sup_{x} | F_{n}(x) - F(x) | , where F(x) is the cdf of a prespecified theoretical distribution under the null hypothesis H_{0}, that \{X_{1}, ..., X_{n}\} comes from F(x).
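For a continuous F(x), the supremum in the definition of D_{n} is attained at one of the sample points, so the statistic can be computed from the order statistics alone. The following is a minimal sketch in base R; the helper name ks_statistic is hypothetical, and the result is compared against the statistic reported by stats::ks.test as a sanity check.

```r
# Hypothetical helper: two-sided KS statistic for a continuous null cdf.
ks_statistic <- function(x, cdf, ...) {
  n <- length(x)
  u <- cdf(sort(x), ...)                     # F at the order statistics
  d_plus  <- max(seq_len(n) / n - u)         # largest value of F_n - F
  d_minus <- max(u - (seq_len(n) - 1) / n)   # largest value of F - F_n
  max(d_plus, d_minus)
}

set.seed(42)
x <- abs(rnorm(100))
d_n <- ks_statistic(x, pexp)

# Reference value from stats::ks.test
d_ref <- unname(ks.test(x, "pexp")$statistic)
all.equal(d_n, d_ref)
```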

The function cont_ks_test implements the FFT-based algorithm proposed by Moscovich and Nadler (2017) to compute the p-value P(D_{n} \ge d_{n}), where d_{n} is the value of the KS test statistic computed from a user-provided data sample \{x_{1}, ..., x_{n}\}, assuming F(x) is continuous. This algorithm has a total worst-case run-time of order O(n^{2} \log(n)), which makes it more efficient and numerically stable than the algorithm proposed by Marsaglia et al. (2003). The latter is used by many existing packages that compute the cdf of D_{n}, e.g., the function ks.test in the package stats and the function ks.test in the package dgof. A limitation of these ks.test functions is that the sample size should be less than 100, and their computation time is O(n^{3}).

In contrast, the function cont_ks_test provides results with at least 10 correct digits after the decimal point for sample sizes n up to 100000, with a computation time of 16 seconds on a machine with a 2.5GHz Intel Core i5 processor and 4GB RAM, running MacOS X Yosemite. For n > 100000, results of similar accuracy can still be computed, but at a higher computation time. See Dimitrova, Kaishev, Tan (2020), Appendix C for further details and examples.
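To make the meaning of the p-value P(D_{n} \ge d_{n}) concrete, it can be approximated by simulation. The sketch below is not the FFT-based algorithm of Moscovich and Nadler (2017). It is a plain Monte Carlo check that relies on the fact that, for continuous F(x), the distribution of D_{n} under H_{0} is the same as for a uniform sample on (0, 1). The helper name ks_stat_unif, the sample size and the number of replications are illustrative choices.

```r
# KS statistic of a sample u against the uniform(0, 1) cdf.
ks_stat_unif <- function(u) {
  n <- length(u)
  u <- sort(u)
  max(seq_len(n) / n - u, u - (seq_len(n) - 1) / n)
}

set.seed(123)
n <- 50
x <- rexp(n)

# Transform the data by the null cdf, then compute d_n.
d_n <- ks_stat_unif(pexp(x))

# Simulate D_n under H_0 and estimate P(D_n >= d_n).
sims <- replicate(20000, ks_stat_unif(runif(n)))
p_mc <- mean(sims >= d_n)

# Exact p-value from stats::ks.test for comparison.
p_exact <- ks.test(x, "pexp", exact = TRUE)$p.value
c(monte_carlo = p_mc, exact = p_exact)
```

The Monte Carlo estimate is only accurate to about two decimal places at this number of replications, which is one reason an exact algorithm is preferable.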

Value

A list with class "htest" containing the following components:

statistic

the value of the statistic.

p.value

the p-value of the test.

alternative

"two-sided".

data.name

a character string giving the name of the data.

Source

Based on the C++ code available at https://github.com/mosco/crossing-probability developed by Moscovich and Nadler (2017). See also Dimitrova, Kaishev, Tan (2020) for more details.

References

Dimitrova D. S., Kaishev V. K., Tan S. (2020). "Computing the Kolmogorov-Smirnov Distribution When the Underlying CDF is Purely Discrete, Mixed or Continuous". Journal of Statistical Software, 95(10), 1-42. doi:10.18637/jss.v095.i10.

Moscovich A., Nadler B. (2017). "Fast Calculation of Boundary Crossing Probabilities for Poisson Processes". Statistics and Probability Letters, 123, 177-182.

Examples


## Comparing the p-values obtained by stats::ks.test
## and KSgeneral::cont_ks_test

x <- abs(rnorm(100))
p.kt <- ks.test(x, "pexp", exact = TRUE)$p.value
p.kt_fft <- KSgeneral::cont_ks_test(x, "pexp")$p.value
abs(p.kt - p.kt_fft)



KSgeneral documentation built on July 26, 2023, 5:44 p.m.