Description Usage Arguments Details Value References Examples

Function calling directly the C++ routines that compute the complementary cdf for the one-sample two-sided Kolmogorov-Smirnov statistic, given the sample size `n`

and the file "Boundary_Crossing_Time.txt" in the working directory.
The latter file contains *A_{i}* and *B_{i}*, *i = 1, ..., n*, specified in Steps 1 and 2 of the Exact-KS-FFT method (see Equation (5) in Section 2 of Dimitrova, Kaishev, Tan (2020)).
The latter values form the n-dimensional rectangular region for the uniform order statistics (see Equations (3), (5) and (6) in Dimitrova, Kaishev, Tan (2020)), namely
*P(D_{n}≥ q) = 1 - P(A_{i} ≤ U_{(i)} ≤ B_{i}, 1 ≤ i ≤ n) = 1 - P(g(t) ≤ nU_{n}(t) ≤ h(t), 0 ≤ t ≤ 1)*,
where the upper and lower boundary functions *h(t)*, *g(t)* are defined as
*h(t) = ∑_{i=1}^{n}1_{(A_{i} < t)}*, *g(t) = ∑_{i=1}^{n}1_{(B_{i} ≤ t)}*,
or equivalently, noting that *h(t)* and *g(t)* are correspondingly left and right continuous functions, we have
*\sup\{t\in[0,1]: h(t) < i \} = A_{i}* and *\inf\{t\in[0,1]: g(t) > i-1 \} = B_{i}*.

Note that on can also compute the (complementary) cdf for the one-sided KS statistics *D_{n}^{-}* or *D_{n}^{+}* (cf., Dimitrova, Kaishev, Tan (2020)) by appropriately specifying correspondingly *A_{i} = 0* for all *i* or *B_{i} = 1* for all *i*, in the function `ks_c_cdf_Rcpp`

.

1 |

`n` |
the sample size |

Note that all calculations here are done directly in C++ and output in R.
That leads to faster computation time, as well as in some cases, possibly higher accuracy (depending on the accuracy of the pre-computed values *A_{i}* and *B_{i}*, *i = 1, ..., n*, provided in the file "Boundary_Crossing_Time.txt") compared to the functions `cont_ks_c_cdf`

, `disc_ks_c_cdf`

, `mixed_ks_c_cdf`

.

Given a random sample *\{X_{1}, ..., X_{n}\}* of size `n`

with an empirical cdf *F_{n}(x)*, the two-sided Kolmogorov-Smirnov goodness-of-fit statistic is defined as *D_{n} = \sup | F_{n}(x) - F(x) | *, where *F(x)* is the cdf of a prespecified theoretical distribution under the null hypothesis *H_{0}*, that *\{X_{1}, ..., X_{n}\}* comes from *F(x)*.
The one-sided KS test statistics are correspondingly defined as *D_{n}^{-} = \sup_{x} (F(x) - F_{n}(x))* and *D_{n}^{+} = \sup_{x} (F_{n}(x) - F(x))*.

The function `ks_c_cdf_Rcpp`

implements the Exact-KS-FFT method, proposed by Dimitrova, Kaishev, Tan (2020), to compute the complementary cdf, *P(D_{n} ≥ q)* at a value *q*, when *F(x)* is arbitrary (i.e. purely discrete, mixed or continuous).
It is based on expressing the complementary cdf as
*P(D_{n} ≥ q) = 1 - P(A_{i} ≤ U_{(i)} ≤ B_{i}, 1 ≤ i ≤ n)*, where *A_{i}* and *B_{i}* are defined as in Step 1 of Dimitrova, Kaishev, Tan (2020).

The complementary cdf is then re-expressed in terms of the conditional probability that a homogeneous Poisson process, *ξ_{n}(t)* with intensity *n* will not cross an upper boundary *h(t)* and a lower boundary *g(t)*, given that *ξ_{n}(1) = n* (see Steps 2 and 3 in Section 2.1 of Dimitrova, Kaishev, Tan (2020)). This conditional probability is evaluated using FFT in Step 4 of the method in order to obtain the value of the complementary cdf *P(D_{n} ≥ q)*.
This algorithm ensures a total worst-case run-time of order *O(n^{2}log(n))* which makes it highly computationally efficient compared to other known algorithms developed for the special cases of continuous or purely discrete *F(x)*.

The values *A_{i}* and *B_{i}*, *i = 1, ..., n*, specified in Steps 1 and 2 of the Exact-KS-FFT method (see Dimitrova, Kaishev, Tan (2020), Section 2) must be pre-computed (in R or, if needed, using alternative softwares offering high accuracy, e.g. Mathematica) and saved in a file with the name "Boundary_Crossing_Time.txt" (in the current working directory).

The function `ks_c_cdf_Rcpp`

is called in R and it first reads the file "Boundary_Crossing_Time.txt" and then computes the value for the complementaty cdf
*P(D_{n}≥ q) = 1 - P(A_{i} ≤ U_{(i)} ≤ B_{i}, 1 ≤ i ≤ n) = 1 - P(g(t) ≤ nU_{n}(t) ≤ h(t), 0 ≤ t ≤ 1)* in C++ and output in R (or as noted above, as a special case, computes the value of the complementary cdf *P(D_{n}^{+} ≥ q) = 1 - P(A_{i} ≤ U_{(i)} ≤ 1, 1 ≤ i ≤ n)* or *P(D_{n}^{-} ≥ q) = 1 - P(0 ≤ U_{(i)} ≤ B_{i}, 1 ≤ i ≤ n)*).

Numeric value corresponding to *P(D_{n}≥ q) = 1 - P(A_{i} ≤ U_{(i)} ≤ B_{i}, 1 ≤ i ≤ n) = 1 - P(g(t) ≤ η_{n}(t) ≤ h(t), 0 ≤ t ≤ 1)* (or, as a special case, to *P(D_{n}^{+} ≥ q)* or *P(D_{n}^{-} ≥ q)*), given a sample size `n`

and the file "Boundary_Crossing_Time.txt" containing *A_{i}* and *B_{i}*, *i = 1, ..., n*, specified in Steps 1 and 2 of the Exact-KS-FFT method (see Dimitrova, Kaishev, Tan (2020), Section 2).

Dimitrina S. Dimitrova, Vladimir K. Kaishev, Senren Tan. (2020) "Computing the Kolmogorov-Smirnov Distribution When the Underlying CDF is Purely Discrete, Mixed or Continuous". Journal of Statistical Software, **95**(10): 1-42. doi:10.18637/jss.v095.i10.

Moscovich A., Nadler B. (2017). "Fast Calculation of Boundary Crossing Probabilities for Poisson Processes". Statistics and Probability Letters, **123**, 177-182.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | ```
## Computing the complementary cdf P(D_{n} >= q)
## for n = 10 and q = 0.1, when F(x) is continuous,
## In this case,
## B_i = (i-1)/n + q
## A_i = i/n - q
n <- 10
q <- 0.1
up_rec <- ((1:n)-1)/n + q
low_rec <- (1:n)/n - q
df <- data.frame(rbind(up_rec, low_rec))
write.table(df,"Boundary_Crossing_Time.txt", sep = ", ",
row.names = FALSE, col.names = FALSE)
ks_c_cdf_Rcpp(n)
``` |

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.