ksgeneral-package: Computing P-Values of the K-S Test for (Dis)Continuous Null...

Description

The one-sample two-sided Kolmogorov-Smirnov (KS) statistic is one of the most popular goodness-of-fit test statistics. It measures how well the distribution of a random sample agrees with a prespecified theoretical distribution. Given a random sample \{X_{1}, ..., X_{n}\} of size n with empirical cdf F_{n}(x), the two-sided KS statistic is defined as D_{n} = \sup_{x} | F_{n}(x) - F(x) | , where F(x) is the cdf of the prespecified theoretical distribution under the null hypothesis H_{0} that \{ X_{1}, ..., X_{n} \} comes from F(x).
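As an illustration of the definition above, the following is a minimal Python sketch that evaluates D_{n} for a sample, assuming F(x) is continuous. It is not part of KSgeneral and does not use the package's code; the names ks_statistic_continuous and uniform_cdf are hypothetical. For a discontinuous F(x) the left limits of F(x) at the sample points would also be needed, which this sketch does not handle.

```python
from typing import Callable, Sequence


def ks_statistic_continuous(sample: Sequence[float],
                            cdf: Callable[[float], float]) -> float:
    """Two-sided KS statistic D_n = sup_x |F_n(x) - F(x)| for a continuous cdf F."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # F_n jumps from (i - 1)/n to i/n at the i-th order statistic, so for a
        # continuous F the supremum is attained at or just before a sample point.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def uniform_cdf(x: float) -> float:
    """cdf of the Uniform(0, 1) distribution (illustrative null hypothesis)."""
    return min(max(x, 0.0), 1.0)


d_n = ks_statistic_continuous([0.7, 0.1, 0.4], uniform_cdf)
print(round(d_n, 6))  # 0.3
```

Here the largest deviation occurs at the largest observation, where F_{n} reaches 1 while F(0.7) = 0.7.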

The package KSgeneral implements a novel, accurate and efficient Fast Fourier Transform (FFT)-based method, referred to as the Exact-KS-FFT method, to compute the complementary cdf, P(D_{n} ≥ q), at a fixed q\in[0, 1] for a given (hypothesized) purely discrete, mixed or continuous underlying cdf F(x), and arbitrary, possibly large sample size n. A plot of the complementary cdf P(D_{n} ≥ q), 0 ≤ q ≤ 1, can also be produced.

In other words, the package computes the p-value P(D_{n} ≥ q) for any fixed critical level q\in[0, 1]. If a data sample \{x_{1}, ..., x_{n}\} is supplied, KSgeneral computes the p-value P(D_{n} ≥ d_{n}), where d_{n} is the value of the KS test statistic computed from \{x_{1}, ..., x_{n}\}.

Remark: The description of the package and its functions is primarily tailored to computing the (complementary) cdf of the two-sided KS statistic, D_{n}. Note, however, that one can also compute the (complementary) cdf of the one-sided KS statistics D_{n}^{-} or D_{n}^{+} (cf. Dimitrova, Kaishev and Tan (2020)) by specifying A_{i} = 0 for all i or B_{i} = 1 for all i, respectively, in the function ks_c_cdf_Rcpp.

Details

The Exact-KS-FFT method underlying KSgeneral is based on expressing the p-value P(D_{n} ≥ q) in terms of an appropriate rectangle probability with respect to the uniform order statistics, as noted by Gleser (1985) for P(D_{n} > q). The latter representation is used to express P(D_{n} ≥ q) via a double-boundary non-crossing probability for a homogeneous Poisson process with intensity n, which is then efficiently computed using FFT, giving a total run time of order O(n^{2}log(n)) (see Dimitrova, Kaishev and Tan (2020), and also Moscovich and Nadler (2017) for the special case when F(x) is continuous).
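To make the rectangle representation concrete, the sketch below covers only the special case of a continuous F(x). In that case D_{n} < q holds exactly when i/n - q < U_{(i)} < (i-1)/n + q for every i, where U_{(i)} = F(X_{(i)}) are uniform order statistics, so that P(D_{n} ≥ q) = 1 - P(A_{i} ≤ U_{(i)} ≤ B_{i}, 1 ≤ i ≤ n) with A_{i} = max(i/n - q, 0) and B_{i} = min((i-1)/n + q, 1). The Python code is an illustration only: it does not implement the FFT step or the package's C++ routines, the function names are hypothetical, and the bounds for discrete or mixed F(x) are defined differently in the cited paper.

```python
from typing import List, Tuple


def continuous_ks_boundaries(n: int, q: float) -> Tuple[List[float], List[float]]:
    """Rectangle bounds A_i, B_i (i = 1..n) for a continuous cdf F.

    D_n < q holds exactly when i/n - q < U_(i) < (i - 1)/n + q for every i;
    the bounds are clipped to [0, 1].
    """
    if n < 1 or not 0.0 <= q <= 1.0:
        raise ValueError("need n >= 1 and 0 <= q <= 1")
    lower = [max(i / n - q, 0.0) for i in range(1, n + 1)]
    upper = [min((i - 1) / n + q, 1.0) for i in range(1, n + 1)]
    return lower, upper


def ks_ccdf_n1(q: float) -> float:
    """P(D_1 >= q) for n = 1, where the rectangle probability is elementary."""
    lower, upper = continuous_ks_boundaries(1, q)
    # With one observation, U_(1) is Uniform(0, 1), so the probability of the
    # interval [A_1, B_1] is its length (zero when the interval is empty).
    return 1.0 - max(upper[0] - lower[0], 0.0)


lower, upper = continuous_ks_boundaries(4, 0.5)
print(lower)  # [0.0, 0.0, 0.25, 0.5]
print(upper)  # [0.5, 0.75, 1.0, 1.0]
print(ks_ccdf_n1(0.75))  # 0.5
```

Replacing every A_{i} by 0, or every B_{i} by 1, in these bounds drops one of the two constraints and leaves a one-sided rectangle, which is how the one-sided statistics arise from the same representation.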

KSgeneral is an R wrapper of the original C++ code due to Dimitrova, Kaishev and Tan (2020), which is based on the C++ code developed by Moscovich and Nadler (2017). The package includes the functions disc_ks_c_cdf, mixed_ks_c_cdf and cont_ks_c_cdf that compute the complementary cdf P(D_{n} ≥ q), for a fixed q, 0 ≤ q ≤ 1, when F(x) is purely discrete, mixed or continuous, respectively. KSgeneral also includes the functions disc_ks_test, mixed_ks_test and cont_ks_test that compute the p-value P(D_{n} ≥ d_{n}), where d_{n} is the value of the KS test statistic computed from a user-provided data sample \{x_{1}, ..., x_{n}\}, when F(x) is purely discrete, mixed or continuous, respectively.

The functions disc_ks_test and cont_ks_test are accurate and fast (run time O(n^{2}log(n))) alternatives to the function ks.test from the package dgof and the function ks.test from the package stats, which compute the p-value P(D_{n} ≥ d_{n}) assuming F(x) is purely discrete or continuous, respectively.

The package also includes the function ks_c_cdf_Rcpp which gives the flexibility to compute the complementary cdf (p-value) for the one-sided KS test statistics D_{n}^{-} or D_{n}^{+}. It also allows for faster computation time and possibly higher accuracy in computing P(D_{n} ≥ q).

Author(s)

Dimitrina S. Dimitrova <D.Dimitrova@city.ac.uk>, Vladimir K. Kaishev <Vladimir.Kaishev.1@city.ac.uk> and Senren Tan <senren.tan@cass.city.ac.uk>

Maintainer: Senren Tan <senren.tan@cass.city.ac.uk>

References

Dimitrova D.S., Kaishev V.K., Tan S. (2020). "Computing the Kolmogorov-Smirnov Distribution When the Underlying CDF is Purely Discrete, Mixed or Continuous". Journal of Statistical Software, 95(10), 1-42. doi:10.18637/jss.v095.i10.

Gleser L.J. (1985). "Exact Power of Goodness-of-Fit Tests of Kolmogorov Type for Discontinuous Distributions". Journal of the American Statistical Association, 80(392), 954-958.

Moscovich A., Nadler B. (2017). "Fast Calculation of Boundary Crossing Probabilities for Poisson Processes". Statistics and Probability Letters, 123, 177-182.


KSgeneral documentation built on Jan. 13, 2021, 1:06 p.m.