cont_ks_distribution: Computes the cumulative distribution function of the...

Description

Computes the cdf P(D_{n} ≤ q) ≡ P(D_{n} < q) at a fixed q ∈ [0, 1], for the one-sample two-sided Kolmogorov-Smirnov statistic D_{n}, for a given sample size n, when the cdf F(x) under the null hypothesis is continuous.

Usage

cont_ks_cdf(q, n)

Arguments

q

numeric value between 0 and 1, at which the cdf P(D_{n} ≤ q) is computed

n

the sample size

Details

Given a random sample {X_{1}, ..., X_{n}} of size n with empirical cdf F_{n}(x), the Kolmogorov-Smirnov goodness-of-fit statistic is defined as D_{n} = sup_{x} | F_{n}(x) - F(x) |, where F(x) is the cdf of a prespecified theoretical distribution under the null hypothesis H_{0} that {X_{1}, ..., X_{n}} comes from F(x).
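As an illustration of the definition above, the following base R sketch computes D_{n} for a sample against a continuous null cdf. It relies on the fact that, for continuous F(x), the supremum is attained at the order statistics. The helper name ks_statistic is hypothetical and is not part of KSgeneral; the normal null distribution and the sample are chosen only for the example.

```r
# Hypothetical helper (not part of KSgeneral): two-sided KS statistic
# of a sample x against a continuous null cdf `cdf`.
ks_statistic <- function(x, cdf, ...) {
  n <- length(x)
  i <- seq_len(n)
  u <- cdf(sort(x), ...)           # null cdf evaluated at the order statistics
  d_plus  <- max(i / n - u)        # largest amount by which F_n exceeds F
  d_minus <- max(u - (i - 1) / n)  # largest amount by which F exceeds F_n
  max(d_plus, d_minus)
}

set.seed(1)
x <- rnorm(50)                     # illustrative sample, assumed N(0, 1) under H0
d_n <- ks_statistic(x, pnorm)

# Cross-check against the statistic reported by stats::ks.test
ks_ref <- unname(ks.test(x, "pnorm")$statistic)
isTRUE(all.equal(d_n, ks_ref))
```

The resulting d_n is the kind of value that would be passed as q to cont_ks_cdf, together with n = length(x).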

The function cont_ks_cdf implements the FFT-based algorithm proposed by Moscovich and Nadler (2017) to compute the cdf P(D_{n} ≤ q) at a value q when F(x) is continuous. This algorithm has a total worst-case run-time of order O(n^{2} log(n)), which makes it more efficient and numerically stable than the algorithm proposed by Marsaglia et al. (2003).

The latter is used by many existing packages that compute the cdf of D_{n}, e.g., the function ks.test in the package stats and the function ks.test in the package dgof. More precisely, these packages compute the exact p-value P(D_{n} ≥ q) only for q = d_{n}, where d_{n} is the value of the KS statistic computed from a user-provided sample {x_{1}, ..., x_{n}}. Further limitations of the ks.test functions are that the sample size should be less than 100 and that the computation time is O(n^{3}).

In contrast, cont_ks_cdf provides results with at least 10 correct digits after the decimal point for sample sizes n up to 100000, with a computation time of 16 seconds on a machine with a 2.5 GHz Intel Core i5 processor and 4 GB RAM running Mac OS X Yosemite. For n > 100000, results of similar accuracy can still be computed, but at a higher computation time. See Dimitrova, Kaishev, Tan (2020), Appendix B, for further details and examples.
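The link between the cdf and the exact p-value described above can be checked with base R alone. Because D_{n} has a continuous distribution when F(x) is continuous, P(D_{n} ≤ d_{n}) = 1 - P(D_{n} ≥ d_{n}). The sketch below recovers the cdf value from the exact p-value of stats::ks.test and compares it with a Monte Carlo estimate. The uniform null distribution, the sample size, the seed, and the helper sim_stat are assumptions made for illustration; cont_ks_cdf itself is not called.

```r
# Sketch using base R only; the uniform null and n = 20 are illustrative choices.
set.seed(123)
n <- 20
x <- runif(n)

# Observed statistic d_n and exact p-value P(D_n >= d_n) from stats::ks.test
kt <- ks.test(x, "punif", exact = TRUE)
d_n <- unname(kt$statistic)
cdf_from_pvalue <- 1 - kt$p.value   # P(D_n <= d_n), valid because F is continuous

# Hypothetical helper: simulate one value of D_n under H0 (uniform sample)
sim_stat <- function(n) {
  u <- sort(runif(n))
  i <- seq_len(n)
  max(i / n - u, u - (i - 1) / n)
}

# Monte Carlo estimate of P(D_n <= d_n)
d_sim <- replicate(20000, sim_stat(n))
cdf_mc <- mean(d_sim <= d_n)

c(exact = cdf_from_pvalue, monte_carlo = cdf_mc)
```

With KSgeneral installed, KSgeneral::cont_ks_cdf(d_n, n) would be expected to agree with the exact entry, and unlike ks.test it can also be evaluated at any other q in [0, 1].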

Value

Numeric value corresponding to P(D_{n} ≤ q).

Source

Based on the C++ code available at https://github.com/mosco/crossing-probability developed by Moscovich and Nadler (2017). See also Dimitrova, Kaishev, Tan (2020) for more details.

References

Dimitrova D.S., Kaishev V.K., Tan S. (2020). "Computing the Kolmogorov-Smirnov Distribution When the Underlying CDF is Purely Discrete, Mixed or Continuous". Journal of Statistical Software, 95(10), 1-42. doi:10.18637/jss.v095.i10.

Marsaglia G., Tsang WW., Wang J. (2003). "Evaluating Kolmogorov's Distribution". Journal of Statistical Software, 8(18), 1-4.

Moscovich A., Nadler B. (2017). "Fast Calculation of Boundary Crossing Probabilities for Poisson Processes". Statistics and Probability Letters, 123, 177-182.

Examples

## Compute the value for P(D_{100} <= 0.05)

KSgeneral::cont_ks_cdf(0.05, 100)


## Compute P(D_{n} <= q)
## for n = 100, q = 1/500, 2/500, ..., 500/500
## and then plot the corresponding values against q

n <- 100
q <- 1:500 / 500
plot(q, sapply(q, function(x) KSgeneral::cont_ks_cdf(x, n)), type = "l")

## Compute P(D_{n} <= q) for n = 40, nq^{2} = 0.76 as shown
## in Table 9 of Dimitrova, Kaishev, Tan (2020)

KSgeneral::cont_ks_cdf(sqrt(0.76/40), 40)

KSgeneral documentation built on Jan. 13, 2021, 1:06 p.m.