KSgeneral-package | R Documentation
The one-sample two-sided Kolmogorov-Smirnov (KS) statistic is one of the most popular goodness-of-fit test statistics; it measures how well the distribution of a random sample agrees with a prespecified theoretical distribution.
Given a random sample \{X_{1}, ..., X_{n}\} of size n with an empirical cdf F_{n}(x), the two-sided KS statistic is defined as D_{n} = \sup | F_{n}(x) - F(x) |, where F(x) is the cdf of the prespecified theoretical distribution under the null hypothesis H_{0} that \{X_{1}, ..., X_{n}\} comes from F(x).
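As a rough illustration of the definition of D_{n}, the following Python sketch evaluates the statistic for a sample under a continuous F(x). It is not part of KSgeneral: the names ks_statistic and uniform_cdf are hypothetical, and the shortcut of checking only the sample points (just before and at each jump of F_{n}) assumes that F is continuous.

```python
def ks_statistic(sample, cdf):
    """Two-sided KS statistic D_n = sup_x |F_n(x) - F(x)|.

    Assumes `cdf` is a continuous distribution function, so the supremum
    is attained at a sample point, either just before or at a jump of
    the empirical cdf F_n.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # F_n equals i/n at x and (i - 1)/n just before x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def uniform_cdf(x):
    """cdf of the Uniform(0, 1) distribution (illustrative null hypothesis)."""
    return min(max(x, 0.0), 1.0)


d = ks_statistic([0.1, 0.4, 0.7], uniform_cdf)
print(d)  # about 0.3
```

For a purely discrete or mixed F(x) this shortcut would not be enough, since the jumps of F must also be taken into account.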
The package KSgeneral implements a novel, accurate and efficient Fast Fourier Transform (FFT)-based method, referred to as the Exact-KS-FFT method, to compute the complementary cdf P(D_{n} \ge q) at a fixed q \in [0, 1] for a given (hypothesized) purely discrete, mixed or continuous underlying cdf F(x) and an arbitrary, possibly large, sample size n.
A plot of the complementary cdf P(D_{n} \ge q), 0 \le q \le 1, can also be produced.
In other words, the package computes the p-value P(D_{n} \ge q) for any fixed critical level q \in [0, 1].
If a data sample \{x_{1}, ..., x_{n}\} is supplied, KSgeneral computes the p-value P(D_{n} \ge d_{n}), where d_{n} is the value of the KS test statistic computed from \{x_{1}, ..., x_{n}\}.
Remark: The description of the package and its functions is primarily tailored to computing the (complementary) cdf of the two-sided KS statistic D_{n}.
It should be noted, however, that one can compute the (complementary) cdf of the one-sided KS statistics D_{n}^{-} or D_{n}^{+} (cf. Dimitrova, Kaishev, Tan (2020)) by specifying A_{i} = 0 for all i or B_{i} = 1 for all i, respectively, in the function ks_c_cdf_Rcpp.
The Exact-KS-FFT method underlying KSgeneral is based on expressing the p-value P(D_{n} \ge q) in terms of an appropriate rectangle probability with respect to the uniform order statistics, as noted by Gleser (1985) for P(D_{n} > q).
This representation is used to express P(D_{n} \ge q) via a double-boundary non-crossing probability for a homogeneous Poisson process with intensity n, which is then efficiently computed using FFT, ensuring a total run time of order O(n^{2}\log(n)) (see Dimitrova, Kaishev, Tan (2020), and also Moscovich and Nadler (2017) for the special case when F(x) is continuous).
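To make the rectangle-probability representation concrete, here is a small Python sketch for the special case of a continuous F(x). It is not the Exact-KS-FFT method and is unrelated to the package's C++ code. It takes the boundaries to be A_{i} = max(i/n - q, 0) and B_{i} = min((i-1)/n + q, 1) (an assumption that holds only for continuous F) and evaluates the probability that every uniform order statistic U_{(i)} lies in [A_{i}, B_{i}] with the classical determinant formula usually attributed to Steck. All function names are hypothetical, and the determinant approach is only numerically sensible for small n.

```python
from math import factorial


def ks_boundaries(n, q):
    """Boundaries A_i, B_i for continuous F (assumed form, see lead-in)."""
    lower = [max(i / n - q, 0.0) for i in range(1, n + 1)]
    upper = [min((i - 1) / n + q, 1.0) for i in range(1, n + 1)]
    return lower, upper


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    size = len(m)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return det


def rectangle_probability(lower, upper):
    """P(lower[i] <= U_(i+1) <= upper[i] for all i) via Steck's determinant.

    Assumes both boundary sequences are nondecreasing and n is small.
    """
    n = len(lower)
    mat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            k = j - i + 1
            if k >= 0:
                mat[i][j] = max(upper[i] - lower[j], 0.0) ** k / factorial(k)
    return factorial(n) * determinant(mat)


def ks_complementary_cdf(n, q):
    """P(D_n >= q) for continuous F as one minus a rectangle probability."""
    lower, upper = ks_boundaries(n, q)
    return 1.0 - rectangle_probability(lower, upper)


print(ks_complementary_cdf(2, 0.5))  # 0.5
```

Setting all lower boundaries to 0, as in the remark on one-sided statistics, gives the corresponding one-sided probability; for example, 1 - rectangle_probability([0.0], [0.75]) evaluates to 0.25 for n = 1.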
KSgeneral is an R wrapper of the original C++ code due to Dimitrova, Kaishev, Tan (2020), which is based on the C++ code developed by Moscovich and Nadler (2017).
The package includes the functions disc_ks_c_cdf, mixed_ks_c_cdf and cont_ks_c_cdf that compute the complementary cdf P(D_{n} \ge q), for a fixed q, 0 \le q \le 1, when F(x) is purely discrete, mixed or continuous, respectively.
KSgeneral also includes the functions disc_ks_test, mixed_ks_test and cont_ks_test that compute the p-value P(D_{n} \ge d_{n}), where d_{n} is the value of the KS test statistic computed from a user-provided data sample \{x_{1}, ..., x_{n}\}, when F(x) is purely discrete, mixed or continuous, respectively.
The functions disc_ks_test and cont_ks_test are accurate and fast (run time O(n^{2}\log(n))) alternatives to the function ks.test from the package dgof and the function ks.test from the package stats, which compute the p-value P(D_{n} \ge d_{n}) assuming F(x) is purely discrete or continuous, respectively.
The package also includes the function ks_c_cdf_Rcpp, which gives the flexibility to compute the complementary cdf (p-value) for the one-sided KS test statistics D_{n}^{-} or D_{n}^{+}.
It also allows for faster computation time and possibly higher accuracy in computing P(D_{n} \ge q).
Dimitrina S. Dimitrova <D.Dimitrova@city.ac.uk>, Vladimir K. Kaishev <Vladimir.Kaishev.1@city.ac.uk> and Senren Tan <raymondtsrtsr@outlook.com>
Maintainer: Senren Tan <raymondtsrtsr@outlook.com>
Dimitrova D.S., Kaishev V.K., Tan S. (2020). "Computing the Kolmogorov-Smirnov Distribution When the Underlying CDF is Purely Discrete, Mixed or Continuous". Journal of Statistical Software, 95(10), 1-42. doi:10.18637/jss.v095.i10.
Gleser L.J. (1985). "Exact Power of Goodness-of-Fit Tests of Kolmogorov Type for Discontinuous Distributions". Journal of the American Statistical Association, 80(392), 954-958.
Moscovich A., Nadler B. (2017). "Fast Calculation of Boundary Crossing Probabilities for Poisson Processes". Statistics and Probability Letters, 123, 177-182.