unif_test: Circular and (hyper)spherical uniformity tests

View source: R/unif_test.R

unif_testR Documentation

Circular and (hyper)spherical uniformity tests

Description

Implementation of several uniformity tests on the (hyper)sphere S^{p-1}:=\{{\bf x}\in R^p:||{\bf x}||=1\}, p\ge 2, with calibration either in terms of their asymptotic/exact distributions, if available, or Monte Carlo.

unif_test receives a sample of directions {\bf X}_1,\ldots,{\bf X}_n\in S^{p-1} in Cartesian coordinates, except for the circular case (p=2) in which the sample can be represented in terms of angles \Theta_1,\ldots,\Theta_n\in [0, 2\pi).

unif_test allows to perform several tests within a single call, facilitating thus the exploration of a dataset by applying several tests.

Usage

unif_test(data, type = "all", p_value = "asymp", alpha = c(0.1, 0.05,
  0.01), M = 10000, stats_MC = NULL, crit_val = NULL,
  data_sorted = FALSE, Rayleigh_m = 1, cov_a = 2 * pi, Rothman_t = 1/3,
  Cressie_t = 1/3, Pycke_q = 0.5, Riesz_s = 1, CCF09_dirs = NULL,
  K_CCF09 = 25, CJ12_reg = 3, CJ12_beta = 0, K_max = 10000, ...)

Arguments

data

sample to perform the test. A matrix of size c(n, p) containing a sample of size n of directions (in Cartesian coordinates) on S^{p-1}. Alternatively if p = 2, a matrix of size c(n, 1) containing the n angles on [0, 2\pi) of the circular sample on S^{1}. Other objects accepted are an array of size c(n, p, 1) with directions (in Cartesian coordinates), or a vector of size n or an array of size c(n, 1, 1) with angular data. Must not contain NA's.

type

type of test to be applied. A character vector containing any of the following types of tests, depending on the dimension p:

  • Circular data: any of the names available at object avail_cir_tests.

  • (Hyper)spherical data: any of the names available at object avail_sph_tests.

If type = "all" (default), then type is set as avail_cir_tests or avail_sph_tests, depending on the value of p.

p_value

type of p-value computation. Either "MC" for employing the approximation by Monte Carlo of the exact null distribution, "asymp" (default) for the use of the asymptotic/exact null distribution (if available), or "crit_val" for approximation by means of the table of critical values crit_val.

alpha

vector with significance levels. Defaults to c(0.10, 0.05, 0.01).

M

number of Monte Carlo replications for approximating the null distribution when approx = "MC". Also, number of Monte Carlo samples for approximating the asymptotic distributions based on weighted sums of chi squared random variables. Defaults to 1e4.

stats_MC

a data frame of size c(M, length(type)), with column names containing the character vector type, that results from extracting $stats_MC from a call to unif_stat_MC. If provided, the computation of Monte Carlo statistics when approx = "MC" is skipped. stats_MC is checked internally to see if it is sorted. Internally computed if NULL (default).

crit_val

table with critical values for the tests, to be used if p_value = "crit_val". A data frame, with column names containing the character vector type and rows corresponding to the significance levels alpha, that results from extracting $crit_val_MC from a call to unif_stat_MC. Internally computed if NULL (default).

data_sorted

is the circular data sorted? If TRUE, certain statistics are faster to compute. Defaults to FALSE.

Rayleigh_m

integer m for the m-modal Rayleigh test. Defaults to m = 1 (the standard Rayleigh test).

cov_a

a_n = a / n parameter used in the length of the arcs of the coverage-based tests. Must be positive. Defaults to 2 * pi.

Rothman_t

t parameter for the Rothman test, a real in (0, 1). Defaults to 1 / 3.

Cressie_t

t parameter for the Cressie test, a real in (0, 1). Defaults to 1 / 3.

Pycke_q

q parameter for the Pycke "q-test", a real in (0, 1). Defaults to 1 / 2.

Riesz_s

s parameter for the s-Riesz test, a real in (0, 2). Defaults to 1.

CCF09_dirs

a matrix of size c(n_proj, p) containing n_proj random directions (in Cartesian coordinates) on S^{p-1} to perform the CCF09 test. If NULL (default), a sample of size n_proj = 50 directions is computed internally.

K_CCF09

integer giving the truncation of the series present in the asymptotic distribution of the Kolmogorov-Smirnov statistic. Defaults to 5e2.

CJ12_reg

type of asymptotic regime for CJ12 test, either 1 (sub-exponential regime), 2 (exponential), or 3 (super-exponential; default).

CJ12_beta

\beta parameter in the exponential regime of CJ12 test, a positive real.

K_max

integer giving the truncation of the series that compute the asymptotic p-value of a Sobolev test. Defaults to 1e4.

...

If p_value = "MC" or p_value = "crit_val", optional performance parameters to be passed to unif_stat_MC: chunks, cores, and seed.

Details

All the tests reject for large values of the test statistic, so the critical values for the significance levels alpha correspond to the alpha-upper quantiles of the null distribution of the test statistic.

When p_value = "asymp", tests that do not have an implemented or known asymptotic are omitted, and a warning is generated.

When p_value = "MC", it is possible to have a progress bar indicating the Monte Carlo simulation progress if unif_test is wrapped with progressr::with_progress or if progressr::handlers(global = TRUE) is invoked (once) by the user. See the examples below. The progress bar is updated with the number of finished chunks.

All the statistics are continuous random variables except the Hodges–Ajne statistic ("Hodges_Ajne"), the Cressie statistic ("Cressie"), and the number of (different) uncovered spacings ("Num_uncover"). These three statistics are discrete random variables.

The Monte Carlo calibration for the CCF09 test is made conditionally on the choice of
CCF09_dirs. That is, all the Monte Carlo statistics share the same random directions.

Descriptions and references for most of the tests are available in García-Portugués and Verdebout (2018).

Value

If only a single test is performed, a list with class htest containing the following components:

  • statistic: the value of the test statistic.

  • p.value: the p-value of the test. If p_value = "crit_val", an NA.

  • alternative: a character string describing the alternative hypothesis.

  • method: a character string indicating what type of test was performed.

  • data.name: a character string giving the name of the data.

  • reject: the rejection decision for the levels of significance alpha.

  • crit_val: a vector with the critical values for the significance levels alpha used with p_value = "MC" or p_value = "asymp".

If several tests are performed, a type-named list with entries for each test given by the above list.

References

García-Portugués, E. and Verdebout, T. (2018) An overview of uniformity tests on the hypersphere. arXiv:1804.00286. https://arxiv.org/abs/1804.00286.

Examples

## Asymptotic distribution

# Circular data
n <- 10
samp_cir <- r_unif_cir(n = n)

# Matrix
unif_test(data = samp_cir, type = "Ajne", p_value = "asymp")

# Vector
unif_test(data = samp_cir[, 1], type = "Ajne", p_value = "asymp")

# Array
unif_test(data = array(samp_cir, dim = c(n, 1, 1)), type = "Ajne",
          p_value = "asymp")

# Several tests
unif_test(data = samp_cir, type = avail_cir_tests, p_value = "asymp")

# Spherical data
n <- 10
samp_sph <- r_unif_sph(n = n, p = 3)

# Array
unif_test(data = samp_sph, type = "Bingham", p_value = "asymp")

# Matrix
unif_test(data = samp_sph[, , 1], type = "Bingham", p_value = "asymp")

# Several tests
unif_test(data = samp_sph, type = avail_sph_tests, p_value = "asymp")

## Monte Carlo

# Circular data
unif_test(data = samp_cir, type = "Ajne", p_value = "MC")
unif_test(data = samp_cir, type = avail_cir_tests, p_value = "MC")

# Spherical data
unif_test(data = samp_sph, type = "Bingham", p_value = "MC")
unif_test(data = samp_sph, type = avail_sph_tests, p_value = "MC")

# Caching stats_MC
stats_MC_cir <- unif_stat_MC(n = nrow(samp_cir), p = 2)$stats_MC
stats_MC_sph <- unif_stat_MC(n = nrow(samp_sph), p = 3)$stats_MC
unif_test(data = samp_cir, type = avail_cir_tests,
          p_value = "MC", stats_MC = stats_MC_cir)
unif_test(data = samp_sph, type = avail_sph_tests, p_value = "MC",
          stats_MC = stats_MC_sph)

## Critical values

# Circular data
unif_test(data = samp_cir, type = avail_cir_tests, p_value = "crit_val")

# Spherical data
unif_test(data = samp_sph, type = avail_sph_tests, p_value = "crit_val")

# Caching crit_val
crit_val_cir <- unif_stat_MC(n = n, p = 2)$crit_val_MC
crit_val_sph <- unif_stat_MC(n = n, p = 3)$crit_val_MC
unif_test(data = samp_cir, type = avail_cir_tests,
          p_value = "crit_val", crit_val = crit_val_cir)
unif_test(data = samp_sph, type = avail_sph_tests, p_value = "crit_val",
          crit_val = crit_val_sph)

## Specific arguments

# Rothman
unif_test(data = samp_cir, type = "Rothman", Rothman_t = 0.5)

# CCF09
unif_test(data = samp_sph, type = "CCF09", p_value = "MC",
          CCF09_dirs = samp_sph[1:2, , 1])
unif_test(data = samp_sph, type = "CCF09", p_value = "MC",
          CCF09_dirs = samp_sph[3:4, , 1])

## Using a progress bar when p_value = "MC"

# Define a progress bar
require(progress)
require(progressr)
handlers(handler_progress(
  format = ":spin [:bar] :percent Total: :elapsedfull End \u2248 :eta",
  clear = FALSE))

# Call unif_test() within with_progress()
with_progress(
  unif_test(data = samp_sph, type = avail_sph_tests, p_value = "MC",
            chunks = 10, M = 1e3)
)

# With several cores
with_progress(
  unif_test(data = samp_sph, type = avail_sph_tests, p_value = "MC",
            cores = 2, chunks = 10, M = 1e3)
)

# Instead of using with_progress() each time, it is more practical to run
# handlers(global = TRUE)
# once to activate progress bars in your R session


sphunif documentation built on Aug. 21, 2023, 9:11 a.m.