unif_test: Circular and (hyper)spherical uniformity tests
In sphunif: Uniformity Tests on the Circle, Sphere, and Hypersphere

unif_test

R Documentation

Circular and (hyper)spherical uniformity tests

Description

Implementation of several uniformity tests on the (hyper)sphere S^{p-1}:=\{{\bf x}\in R^p:||{\bf x}||=1\}, p\ge 2, with calibration either in terms of their asymptotic/exact distributions, if available, or Monte Carlo.

unif_test receives a sample of directions {\bf X}_1,\ldots,{\bf X}_n\in S^{p-1} in Cartesian coordinates, except for the circular case (p=2) in which the sample can be represented in terms of angles \Theta_1,\ldots,\Theta_n\in [0, 2\pi).

unif_test allows to perform several tests within a single call, facilitating thus the exploration of a dataset by applying several tests.

Usage

unif_test(data, type = "all", p_value = "asymp", alpha = c(0.1, 0.05,
  0.01), M = 10000, stats_MC = NULL, crit_val = NULL,
  data_sorted = FALSE, K_max = 10000, method = "I", CCF09_dirs = NULL,
  CJ12_beta = 0, CJ12_reg = 3, cov_a = 2 * pi, Cressie_t = 1/3,
  K_CCF09 = 25, Poisson_rho = 0.5, Pycke_q = 0.5, Rayleigh_m = 1,
  Riesz_s = 1, Rothman_t = 1/3, Sobolev_vk2 = c(0, 0, 1),
  Softmax_kappa = 1, Stereo_a = 0, ...)

Arguments

`data`	sample to perform the test. A matrix of size `c(n, p)` containing a sample of size `n` of directions (in Cartesian coordinates) on `S^{p-1}`. Alternatively if `p = 2`, a matrix of size `c(n, 1)` containing the `n` angles on `[0, 2\pi)` of the circular sample on `S^{1}`. Other objects accepted are an array of size `c(n, p, 1)` with directions (in Cartesian coordinates), or a vector of size `n` or an array of size `c(n, 1, 1)` with angular data. Must not contain `NA`'s.
`type`	type of test to be applied. A character vector containing any of the following types of tests, depending on the dimension `p`: Circular data: any of the names available at object `avail_cir_tests`. (Hyper)spherical data: any of the names available at object `avail_sph_tests`. If `type = "all"` (default), then `type` is set as `avail_cir_tests` or `avail_sph_tests`, depending on the value of `p`.
`p_value`	type of `p`-value computation. Either `"MC"` for employing the approximation by Monte Carlo of the exact null distribution, `"asymp"` (default) for the use of the asymptotic/exact null distribution (if available), or `"crit_val"` for approximation by means of the table of critical values `crit_val`.
`alpha`	vector with significance levels. Defaults to `c(0.10, 0.05, 0.01)`.
`M`	number of Monte Carlo replications for approximating the null distribution when `approx = "MC"`. Also, number of Monte Carlo samples for approximating the asymptotic distributions based on weighted sums of chi squared random variables. Defaults to `1e4`.
`stats_MC`	a data frame of size `c(M, length(type))`, with column names containing the character vector `type`, that results from extracting `$stats_MC` from a call to `unif_stat_MC`. If provided, the computation of Monte Carlo statistics when `approx = "MC"` is skipped. `stats_MC` is checked internally to see if it is sorted. Internally computed if `NULL` (default).
`crit_val`	table with critical values for the tests, to be used if `p_value = "crit_val"`. A data frame, with column names containing the character vector `type` and rows corresponding to the significance levels `alpha`, that results from extracting `$crit_val_MC` from a call to `unif_stat_MC`. Internally computed if `NULL` (default).
`data_sorted`	is the circular data sorted? If `TRUE`, certain statistics are faster to compute. Defaults to `FALSE`.
`K_max`	integer giving the truncation of the series that compute the asymptotic p-value of a Sobolev test. Defaults to `1e4`.
`method`	method for approximating the density, distribution, or quantile function of the weighted sum of chi squared random variables. Must be `"I"` (Imhof), `"SW"` (Satterthwaite–Welch), `"HBE"` (Hall–Buckley–Eagleson), or `"MC"` (Monte Carlo; only for distribution or quantile functions). Defaults to `"I"`.
`CCF09_dirs`	a matrix of size `c(n_proj, p)` containing `n_proj` random directions (in Cartesian coordinates) on `S^{p-1}` to perform the CCF09 test. If `NULL` (default), a sample of size `n_proj = 50` directions is computed internally.
`CJ12_beta`	`\beta` parameter in the exponential regime of CJ12 test, a positive real.
`CJ12_reg`	type of asymptotic regime for CJ12 test, either `1` (sub-exponential regime), `2` (exponential), or `3` (super-exponential; default).
`cov_a`	`a_n = a / n` parameter used in the length of the arcs of the coverage-based tests. Must be positive. Defaults to `2 * pi`.
`Cressie_t`	`t` parameter for the Cressie test, a real in `(0, 1)`. Defaults to `1 / 3`.
`K_CCF09`	integer giving the truncation of the series present in the asymptotic distribution of the Kolmogorov-Smirnov statistic. Defaults to `25`.
`Poisson_rho`	`\rho` parameter for the Poisson test, a real in `[0, 1)`. Defaults to `0.5`.
`Pycke_q`	`q` parameter for the Pycke "`q`-test", a real in `(0, 1)`. Defaults to `1 / 2`.
`Rayleigh_m`	integer `m` for the `m`-modal Rayleigh test. Defaults to `m = 1` (the standard Rayleigh test).
`Riesz_s`	`s` parameter for the `s`-Riesz test, a real in `(0, 2)`. Defaults to `1`.
`Rothman_t`	`t` parameter for the Rothman test, a real in `(0, 1)`. Defaults to `1 / 3`.
`Sobolev_vk2`	weights for the finite Sobolev test. A non-negative vector or matrix. Defaults to `c(0, 0, 1)`.
`Softmax_kappa`	`\kappa` parameter for the Softmax test, a non-negative real. Defaults to `1`.
`Stereo_a`	`a` parameter for the Stereo test, a real in `[-1, 1]`. Defaults to `0`.
`...`	If `p_value = "MC"` or `p_value = "crit_val"`, optional performance parameters to be passed to `unif_stat_MC`: `chunks`, `cores`, and `seed`. If `p_value = "MC"`, additional parameters to `unif_stat_distr`.

Details

All the tests reject for large values of the test statistic, so the critical values for the significance levels alpha correspond to the alpha-upper quantiles of the null distribution of the test statistic.

When p_value = "asymp", tests that do not have an implemented or known asymptotic are omitted, and a warning is generated.

When p_value = "MC", it is possible to have a progress bar indicating the Monte Carlo simulation progress if unif_test is wrapped with progressr::with_progress or if progressr::handlers(global = TRUE) is invoked (once) by the user. See the examples below. The progress bar is updated with the number of finished chunks.

All the statistics are continuous random variables except the Hodges–Ajne statistic ("Hodges_Ajne"), the Cressie statistic ("Cressie"), and the number of (different) uncovered spacings ("Num_uncover"). These three statistics are discrete random variables.

The Monte Carlo calibration for the CCF09 test is made conditionally on the choice of
CCF09_dirs. That is, all the Monte Carlo statistics share the same random directions.

Except for CCF09_dirs, K_CCF09, and CJ12_reg, all the test-specific parameters are vectorized.

Descriptions and references for most of the tests are available in García-Portugués and Verdebout (2018).

Value

If only a single test is performed, a list with class htest containing the following components:

statistic: the value of the test statistic.
p.value: the p-value of the test. If p_value = "crit_val", an NA.
alternative: a character string describing the alternative hypothesis.
method: a character string indicating what type of test was performed.
data.name: a character string giving the name of the data.
reject: the rejection decision for the levels of significance alpha.
crit_val: a vector with the critical values for the significance levels alpha used with p_value = "MC" or p_value = "asymp".
param: parameter(s) used in the test (if any).

If several tests are performed, a type-named list with entries for each test given by the above list.

References

García-Portugués, E. and Verdebout, T. (2018) An overview of uniformity tests on the hypersphere. arXiv:1804.00286. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.48550/arXiv.1804.00286")}.

Examples

## Asymptotic distribution

# Circular data
n <- 10
samp_cir <- r_unif_cir(n = n)

# Matrix
unif_test(data = samp_cir, type = "Ajne", p_value = "asymp")

# Vector
unif_test(data = samp_cir[, 1], type = "Ajne", p_value = "asymp")

# Array
unif_test(data = array(samp_cir, dim = c(n, 1, 1)), type = "Ajne",
          p_value = "asymp")

# Several tests
unif_test(data = samp_cir, type = avail_cir_tests, p_value = "asymp")

# Spherical data
n <- 10
samp_sph <- r_unif_sph(n = n, p = 3)

# Array
unif_test(data = samp_sph, type = "Bingham", p_value = "asymp")

# Matrix
unif_test(data = samp_sph[, , 1], type = "Bingham", p_value = "asymp")

# Several tests
unif_test(data = samp_sph, type = avail_sph_tests, p_value = "asymp")

## Monte Carlo

# Circular data
unif_test(data = samp_cir, type = "Ajne", p_value = "MC")
unif_test(data = samp_cir, type = avail_cir_tests, p_value = "MC")

# Spherical data
unif_test(data = samp_sph, type = "Bingham", p_value = "MC")
unif_test(data = samp_sph, type = avail_sph_tests, p_value = "MC")

# Caching stats_MC
stats_MC_cir <- unif_stat_MC(n = nrow(samp_cir), p = 2)$stats_MC
stats_MC_sph <- unif_stat_MC(n = nrow(samp_sph), p = 3)$stats_MC
unif_test(data = samp_cir, type = avail_cir_tests,
          p_value = "MC", stats_MC = stats_MC_cir)
unif_test(data = samp_sph, type = avail_sph_tests, p_value = "MC",
          stats_MC = stats_MC_sph)

## Critical values

# Circular data
unif_test(data = samp_cir, type = avail_cir_tests, p_value = "crit_val")

# Spherical data
unif_test(data = samp_sph, type = avail_sph_tests, p_value = "crit_val")

# Caching crit_val
crit_val_cir <- unif_stat_MC(n = n, p = 2)$crit_val_MC
crit_val_sph <- unif_stat_MC(n = n, p = 3)$crit_val_MC
unif_test(data = samp_cir, type = avail_cir_tests,
          p_value = "crit_val", crit_val = crit_val_cir)
unif_test(data = samp_sph, type = avail_sph_tests, p_value = "crit_val",
          crit_val = crit_val_sph)

## Specific arguments

# Rothman
unif_test(data = samp_cir, type = "Rothman", Rothman_t = 0.5)

# CCF09
unif_test(data = samp_sph, type = "CCF09", p_value = "MC",
          CCF09_dirs = samp_sph[1:2, , 1])
unif_test(data = samp_sph, type = "CCF09", p_value = "MC",
          CCF09_dirs = samp_sph[3:4, , 1])

## Using a progress bar when p_value = "MC"

# Define a progress bar
require(progress)
require(progressr)
handlers(handler_progress(
  format = paste("(:spin) [:bar] :percent Iter: :current/:total Rate:",
                 ":tick_rate iter/sec ETA: :eta Elapsed: :elapsedfull"),
  clear = FALSE))

# Call unif_test() within with_progress()
with_progress(
  unif_test(data = samp_sph, type = avail_sph_tests, p_value = "MC",
            chunks = 10, M = 1e3)
)

# With several cores
with_progress(
  unif_test(data = samp_sph, type = avail_sph_tests, p_value = "MC",
            cores = 2, chunks = 10, M = 1e3)
)

# Instead of using with_progress() each time, it is more practical to run
# handlers(global = TRUE)
# once to activate progress bars in your R session

sphunif documentation built on May 29, 2024, 4:19 a.m.