
The one-sample two-sided Kolmogorov-Smirnov (KS) statistic is one of the most popular goodness-of-fit test statistics. It measures how well the distribution of a random sample agrees with a prespecified theoretical distribution.
Given a random sample *\{X_{1}, ..., X_{n}\}* of size *n* with empirical cdf *F_{n}(x)*, the two-sided KS statistic is defined as
*D_{n} = \sup_{x} | F_{n}(x) - F(x) |*, where *F(x)* is the cdf of the prespecified theoretical distribution under the null hypothesis *H_{0}* that *\{X_{1}, ..., X_{n}\}* comes from *F(x)*.
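The definition above can be illustrated with a minimal base R sketch. It assumes a continuous *F(x)*, in which case the supremum is reached at, or just before, a jump of *F_{n}(x)*, so only the order statistics need to be checked. The helper name `ks_statistic` is hypothetical and is not part of KSgeneral.

```r
# Minimal sketch (assumes a continuous F): two-sided KS statistic
# D_n = sup_x |F_n(x) - F(x)|, evaluated at the order statistics.
# `ks_statistic` is a hypothetical helper, not a KSgeneral function.
ks_statistic <- function(x, cdf, ...) {
  n  <- length(x)
  Fx <- cdf(sort(x), ...)   # F at the order statistics x_(1) <= ... <= x_(n)
  i  <- seq_len(n)
  # F_n jumps from (i - 1)/n to i/n at x_(i); check both sides of each jump
  max(i / n - Fx, Fx - (i - 1) / n)
}

set.seed(1)
x   <- rnorm(50)
d_n <- ks_statistic(x, pnorm)
print(d_n)
```

For a continuous null distribution this should agree with the statistic reported by `stats::ks.test(x, "pnorm")`.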

The package KSgeneral implements a novel, accurate and efficient Fast Fourier Transform (FFT)-based method, referred to as the Exact-KS-FFT method, to compute the complementary cdf,
*P(D_{n} ≥ q)*, at a fixed *q\in[0, 1]* for a given (hypothesized) purely discrete, mixed or continuous underlying cdf *F(x)*, and arbitrary, possibly large sample size *n*.
A plot of the complementary cdf *P(D_{n} ≥ q)*, *0 ≤ q ≤ 1*, can also be produced.

In other words, the package computes the p-value *P(D_{n} ≥ q)* for any fixed critical level *q\in[0, 1]*.
If a data sample *\{x_{1}, ..., x_{n}\}* is supplied, KSgeneral computes the p-value *P(D_{n} ≥ d_{n})*, where *d_{n}* is the value of the KS test statistic computed from *\{x_{1}, ..., x_{n}\}*.

Remark: The descriptions of the package and its functions are primarily tailored to computing the (complementary) cdf of the two-sided KS statistic, *D_{n}*.
Note, however, that one can also compute the (complementary) cdf of the one-sided KS statistics *D_{n}^{-}* or *D_{n}^{+}* (cf. Dimitrova, Kaishev, Tan (2020)) by specifying, respectively, *A_{i} = 0* for all *i* or *B_{i} = 1* for all *i* in the function `ks_c_cdf_Rcpp`.
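As a rough illustration of the one-sided statistics mentioned in the remark, the base R sketch below computes them for a continuous *F(x)*. It assumes the usual convention *D_{n}^{+} = \sup (F_{n}(x) - F(x))* and *D_{n}^{-} = \sup (F(x) - F_{n}(x))*, which this page does not spell out. The helper `ks_one_sided` is hypothetical, and the sketch does not show how the boundaries *A_{i}* and *B_{i}* are passed to `ks_c_cdf_Rcpp`.

```r
# Minimal sketch (assumes a continuous F and the usual sign convention):
# one-sided KS statistics evaluated at the order statistics.
# `ks_one_sided` is a hypothetical helper, not a KSgeneral function.
ks_one_sided <- function(x, cdf, ...) {
  n  <- length(x)
  Fx <- cdf(sort(x), ...)
  i  <- seq_len(n)
  c(D_plus  = max(i / n - Fx),         # sup (F_n - F)
    D_minus = max(Fx - (i - 1) / n))   # sup (F - F_n)
}

set.seed(2)
x           <- rexp(40)
one_sided   <- ks_one_sided(x, pexp)
d_two_sided <- max(one_sided)          # D_n is the larger of the two
print(one_sided)
```

Under this convention the two values should match the statistics from `stats::ks.test` with `alternative = "greater"` and `alternative = "less"`, respectively.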

The Exact-KS-FFT method underlying KSgeneral is based on expressing the p-value *P(D_{n} ≥ q)* in terms of an appropriate rectangle probability with respect to the uniform order statistics, as noted by Gleser (1985) for *P(D_{n} > q)*.
The latter representation is used to express *P(D_{n} ≥ q)* via a double-boundary non-crossing probability for a homogeneous Poisson process, with intensity *n*, which is then efficiently computed using FFT, ensuring total run-time of order *O(n^{2}log(n))* (see Dimitrova, Kaishev, Tan (2020) and also Moscovich and Nadler (2017) for the special case when *F(x)* is continuous).
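The rectangle-probability representation can be checked numerically in the simplest setting. The sketch below assumes a continuous *F(x)*, so that under *H_{0}* the transformed order statistics are uniform order statistics *U_{(i)}*, and it uses the bounds *i/n - q < U_{(i)} < (i-1)/n + q*, which hold for the continuous case only. The names `lower` and `upper` are illustrative. This is a Monte Carlo check of the equivalence, not the Exact-KS-FFT algorithm itself.

```r
# Minimal sketch (continuous F only): the event {D_n >= q} coincides with the
# uniform order statistics leaving the rectangle lower_i < U_(i) < upper_i.
set.seed(3)
n     <- 20
q     <- 0.25
n_sim <- 5000
i     <- seq_len(n)
lower <- pmax(i / n - q, 0)         # illustrative lower bounds
upper <- pmin((i - 1) / n + q, 1)   # illustrative upper bounds

sim <- replicate(n_sim, {
  u <- sort(runif(n))                       # uniform order statistics
  d <- max(i / n - u, u - (i - 1) / n)      # D_n for this sample
  c(exceeds = d >= q,
    outside = !all(u > lower & u < upper))  # rectangle event fails
})

agreement <- mean(sim["exceeds", ] == sim["outside", ])
p_hat     <- mean(sim["exceeds", ])         # Monte Carlo estimate of P(D_n >= q)
print(c(agreement = agreement, p_hat = p_hat))
```

The discrete and mixed cases need the more careful boundary construction described in Dimitrova, Kaishev, Tan (2020), which this sketch does not attempt.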

KSgeneral is an R wrapper around the original C++ code due to Dimitrova, Kaishev, Tan (2020), which is based on the C++ code developed by Moscovich and Nadler (2017).
The package includes the functions `disc_ks_c_cdf`, `mixed_ks_c_cdf` and `cont_ks_c_cdf` that compute the complementary cdf *P(D_{n} ≥ q)*, for a fixed *q*, *0 ≤ q ≤ 1*, when *F(x)* is purely discrete, mixed or continuous, respectively.
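A hedged usage sketch for the continuous case follows. It assumes that `cont_ks_c_cdf` takes the arguments `q` and `n` in that order and returns a single probability, and it only calls the function if KSgeneral is installed. The Monte Carlo estimate is a rough sanity check added here for illustration and is not part of the package.

```r
# Minimal sketch: compare P(D_n >= q) from KSgeneral (if installed) with a
# crude Monte Carlo estimate under a continuous null distribution.
set.seed(4)
n <- 30
q <- 0.2
i <- seq_len(n)

d_sim <- replicate(20000, {
  u <- sort(runif(n))                  # uniform order statistics under H0
  max(i / n - u, u - (i - 1) / n)      # simulated value of D_n
})
p_mc <- mean(d_sim >= q)               # Monte Carlo estimate of P(D_n >= q)

if (requireNamespace("KSgeneral", quietly = TRUE)) {
  # Assumed call signature: cont_ks_c_cdf(q, n)
  p_fft <- KSgeneral::cont_ks_c_cdf(q, n)
  print(c(monte_carlo = p_mc, exact_ks_fft = p_fft))
} else {
  print(c(monte_carlo = p_mc))
}
```

The two numbers should agree only up to Monte Carlo error, which for 20000 replications is typically in the third decimal place.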
KSgeneral also includes the functions `disc_ks_test`, `mixed_ks_test` and `cont_ks_test` that compute the p-value *P(D_{n} ≥ d_{n})*, where *d_{n}* is the value of the KS test statistic computed from a user-provided data sample *\{x_{1}, ..., x_{n}\}*, when *F(x)* is purely discrete, mixed or continuous, respectively.

The functions `disc_ks_test` and `cont_ks_test` are accurate and fast (run time *O(n^{2}log(n))*) alternatives to the function `ks.test` from the package dgof and the function `ks.test` from the package stats, which compute the p-value *P(D_{n} ≥ d_{n})* assuming *F(x)* is purely discrete or continuous, respectively.
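The comparison with `stats::ks.test` can be sketched as follows for a continuous null distribution. It assumes that `cont_ks_test` accepts the sample and the name of a cdf in the same way as `ks.test`, and that it returns an object with a `p.value` element. The KSgeneral call is skipped if the package is not installed.

```r
# Minimal sketch: p-value P(D_n >= d_n) for a continuous null distribution,
# from stats::ks.test and, if available, from KSgeneral::cont_ks_test.
set.seed(5)
x <- rexp(100)

# Reference p-value from base R, forcing the exact computation
res_stats <- stats::ks.test(x, "pexp", exact = TRUE)

if (requireNamespace("KSgeneral", quietly = TRUE)) {
  # Assumed interface: same first two arguments as stats::ks.test
  res_ksgeneral <- KSgeneral::cont_ks_test(x, "pexp")
  print(c(stats = res_stats$p.value, KSgeneral = res_ksgeneral$p.value))
} else {
  print(c(stats = res_stats$p.value))
}
```

For moderate *n* the two p-values should be close. The differences in accuracy and run time that the text refers to are expected to show up for larger samples.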

The package also includes the function `ks_c_cdf_Rcpp`, which gives the flexibility to compute the complementary cdf (p-value) for the one-sided KS test statistics *D_{n}^{-}* or *D_{n}^{+}*.
It also allows for faster computation time and possibly higher accuracy in computing *P(D_{n} ≥ q)*.

Dimitrina S. Dimitrova <D.Dimitrova@city.ac.uk>, Vladimir K. Kaishev <Vladimir.Kaishev.1@city.ac.uk> and Senren Tan <senren.tan@cass.city.ac.uk>

Maintainer: Senren Tan <senren.tan@cass.city.ac.uk>

Dimitrova D.S., Kaishev V.K., Tan S. (2020). "Computing the Kolmogorov-Smirnov Distribution When the Underlying CDF is Purely Discrete, Mixed or Continuous". Journal of Statistical Software, **95**(10), 1-42. doi:10.18637/jss.v095.i10.

Gleser L.J. (1985). "Exact Power of Goodness-of-Fit Tests of Kolmogorov Type for Discontinuous Distributions". Journal of the American Statistical Association, **80**(392), 954-958.

Moscovich A., Nadler B. (2017). "Fast Calculation of Boundary Crossing Probabilities for Poisson Processes". Statistics and Probability Letters, **123**, 177-182.
