
Computes the p-value *P(D_{n} ≥ d_{n})*, where *d_{n}* is the value of the KS test statistic computed from a data sample *\{x_{1}, ..., x_{n}\}*, when *F(x)* is purely discrete. The computation uses the Exact-KS-FFT method, which expresses the p-value as a double-boundary non-crossing probability for a homogeneous Poisson process and then evaluates it efficiently using FFT (see Dimitrova, Kaishev, Tan (2020)).

```
disc_ks_test(x, y, ..., exact = NULL, tol = 1e-08, sim.size = 1e+06, num.sim = 10)
```

`x`
a numeric vector of data sample values

`y`
a pre-specified discrete cdf,

`...`
values of the parameters of the cdf,

`exact`
logical variable specifying whether one wants to compute the exact p-value

`tol`
the value of

`sim.size`
the required number of simulated trajectories in order to produce one Monte Carlo estimate (one MC run) of the asymptotic p-value using the algorithm of Wood and Altavela (1978). By default,

`num.sim`
the number of MC runs, each producing one estimate (based on

Given a random sample *\{X_{1}, ..., X_{n}\}* of size `n` with an empirical cdf *F_{n}(x)*, the two-sided Kolmogorov-Smirnov goodness-of-fit statistic is defined as *D_{n} = \sup_{x} | F_{n}(x) - F(x) |*, where *F(x)* is the cdf of a prespecified theoretical distribution under the null hypothesis *H_{0}* that *\{X_{1}, ..., X_{n}\}* comes from *F(x)*.
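To make the definition concrete, the following base-R sketch computes *d_{n}* for a sample from the discrete Uniform[1, 10] distribution. It is an illustration only, not the package's implementation; the seed and the variable names (`F0`, `Fn`, `pts`, `d_n`) are arbitrary choices made here. It relies on the fact that both cdfs are right-continuous step functions, so the supremum is attained at one of their jump points.

```r
# Illustrative sketch (not the package's code): KS statistic for a discrete null cdf.
set.seed(1)                              # arbitrary seed, for reproducibility only
x  <- sample(1:10, 25, replace = TRUE)   # sample from the discrete Uniform[1, 10]
F0 <- ecdf(1:10)                         # null cdf F(x), jumps of 0.1 at 1, ..., 10
Fn <- ecdf(x)                            # empirical cdf F_n(x)

# F_n and F are constant between consecutive jump points, so it is enough
# to compare them at the union of their jump points.
pts <- sort(unique(c(knots(F0), knots(Fn))))
d_n <- max(abs(Fn(pts) - F0(pts)))
d_n
```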

The function `disc_ks_test` implements the Exact-KS-FFT method expressing the p-value as a double-boundary non-crossing probability for a homogeneous Poisson process, which is then efficiently computed using FFT (see Dimitrova, Kaishev, Tan (2020)).
It represents an accurate and fast (run time *O(n^{2}log(n))*) alternative to the function `ks.test` from the package dgof, which computes a p-value *P(D_{n} ≥ d_{n})*, where *d_{n}* is the value of the KS test statistic computed based on a user-provided data sample *\{x_{1}, ..., x_{n}\}*, assuming *F(x)* is purely discrete.

In the function `ks.test`, the p-value for a one-sample two-sided KS test is calculated by combining the approaches of Gleser (1985) and Niederhausen (1981). However, the function `ks.test` due to Arnold and Emerson (2011) only provides exact p-values for `n` *≤* 30 since, as noted by the authors, numerical instabilities may occur when `n` is large. In the latter case, `ks.test` uses simulation to approximate p-values, which may be rather slow and inaccurate (see Table 6 of Dimitrova, Kaishev, Tan (2020)).

Thus, making use of the Exact-KS-FFT method, the function `disc_ks_test` provides an exact and highly computationally efficient alternative way of computing the p-value *P(D_{n} ≥ d_{n})* when *F(x)* is purely discrete.

Lastly, incorporated into the function `disc_ks_test` is the MC simulation-based method of Wood and Altavela (1978) for estimating the asymptotic p-value of *D_{n}*. The latter method is the default method behind `disc_ks_test` when the sample size `n` satisfies `n` *≥* 100000.
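The idea behind a simulation-based asymptotic p-value can be sketched in base R. The sketch below is not the package's implementation and is not claimed to reproduce the exact algorithm of Wood and Altavela (1978); it only uses the standard large-sample fact that, for a discrete *F* with cdf values *p_{1} < ... < p_{k} = 1* at its jump points, *√n D_{n}* behaves like the maximum of *|B(p_{j})|*, where *B* is a Brownian bridge. The function name `asym_pvalue_mc` and its arguments are hypothetical names introduced here, and the default `sim.size` is kept small so that the example runs quickly.

```r
# Hypothetical helper (an assumption for illustration, not part of KSgeneral):
# one Monte Carlo estimate of the asymptotic p-value P(sqrt(n) * D_n >= sqrt(n) * d_n).
asym_pvalue_mc <- function(d_n, n, p, sim.size = 1e4) {
  # p: increasing cdf values at the jump points of F, with the last value equal to 1
  k    <- length(p)
  incr <- diff(c(0, p))                       # variances of the Brownian increments
  # sim.size x k matrix of independent increments; column j has variance incr[j]
  Z <- matrix(rnorm(sim.size * k, sd = rep(sqrt(incr), each = sim.size)),
              nrow = sim.size)
  W <- t(apply(Z, 1, cumsum))                 # Brownian motion W(p_1), ..., W(p_k)
  B <- W - outer(W[, k], p)                   # Brownian bridge B(p_j) = W(p_j) - p_j * W(1)
  # proportion of simulated trajectories whose maximum exceeds sqrt(n) * d_n
  mean(apply(abs(B), 1, max) >= sqrt(n) * d_n)
}

set.seed(1)                                   # arbitrary seed
p   <- (1:10) / 10                            # cdf of the discrete Uniform[1, 10]
est <- asym_pvalue_mc(d_n = 0.05, n = 500, p = p)
est
```

Repeating such a run several times gives a rough sense of the Monte Carlo variability of the estimate, which is the role played by separate MC runs.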

A list with class "htest" containing the following components:

`statistic`
the value of the statistic.

`p.value`
the p-value of the test.

`alternative`
"two-sided".

`data.name`
a character string giving the name of the data.
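As a brief usage sketch, the components listed above can be read off the returned object by name. The call mirrors the one used in the examples of this page; the seed and the variable names are arbitrary, and the `requireNamespace()` guard is added here only so that the snippet does nothing when KSgeneral is not installed.

```r
set.seed(1)                               # arbitrary seed, for reproducibility only
x <- sample(1:10, 25, replace = TRUE)     # sample from the discrete Uniform[1, 10]

if (requireNamespace("KSgeneral", quietly = TRUE)) {
  res <- KSgeneral::disc_ks_test(x, ecdf(1:10), exact = TRUE)
  print(res$statistic)    # value of the KS statistic d_n
  print(res$p.value)      # p-value P(D_n >= d_n)
  print(res$alternative)  # "two-sided"
  print(res$data.name)    # name of the data
}
```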

Arnold T.A., Emerson J.W. (2011). "Nonparametric Goodness-of-Fit Tests for Discrete Null Distributions". The R Journal, **3**(2), 34-39.

Dimitrova D.S., Kaishev V.K., Tan S. (2020). "Computing the Kolmogorov-Smirnov Distribution When the Underlying CDF is Purely Discrete, Mixed or Continuous". Journal of Statistical Software, **95**(10), 1-42. doi:10.18637/jss.v095.i10.

Gleser L.J. (1985). "Exact Power of Goodness-of-Fit Tests of Kolmogorov Type for Discontinuous Distributions". Journal of the American Statistical Association, **80**(392), 954-958.

Niederhausen H. (1981). "Sheffer Polynomials for Computing Exact Kolmogorov-Smirnov and Renyi Type Distributions". The Annals of Statistics, 58-64.

Wood C.L., Altavela M.M. (1978). "Large-Sample Results for Kolmogorov-Smirnov Statistics for Discrete Distributions". Biometrika, **65**(1), 235-239.

```
# Comparison of results obtained from dgof::ks.test
# and KSgeneral::disc_ks_test, when F(x) follows the discrete
# Uniform[1, 10] distribution as in Example 3.5 of
# Dimitrova, Kaishev, Tan (2020).
# When the sample size is larger than 100, the
# function dgof::ks.test will be numerically
# unstable.

# Small sample (n = 25): compare the two functions
x3 <- sample(1:10, 25, replace = TRUE)
KSgeneral::disc_ks_test(x3, ecdf(1:10), exact = TRUE)
dgof::ks.test(x3, ecdf(1:10), exact = TRUE)
# Difference between the two p-values
KSgeneral::disc_ks_test(x3, ecdf(1:10), exact = TRUE)$p.value -
  dgof::ks.test(x3, ecdf(1:10), exact = TRUE)$p.value

# Larger sample (n = 500)
x4 <- sample(1:10, 500, replace = TRUE)
KSgeneral::disc_ks_test(x4, ecdf(1:10), exact = TRUE)
dgof::ks.test(x4, ecdf(1:10), exact = TRUE)
# Difference between the two p-values
KSgeneral::disc_ks_test(x4, ecdf(1:10), exact = TRUE)$p.value -
  dgof::ks.test(x4, ecdf(1:10), exact = TRUE)$p.value

# Using stepfun() to specify the same discrete distribution as defined by ecdf():
steps <- stepfun(1:10, cumsum(c(0, rep(0.1, 10))))
KSgeneral::disc_ks_test(x3, steps, exact = TRUE)
```
