fast.upsilon.test: Fast Upsilon Test of Association between Two Categorical...


fast.upsilon.test    R Documentation

Fast Upsilon Test of Association between Two Categorical Variables

Description

Performs a fast Upsilon test \insertCite{luo2021upsilon}{Upsilon} to evaluate association between observations from two categorical variables.

Usage

fast.upsilon.test(x, y, log.p = FALSE)

Arguments

x

a vector to specify observations of the first categorical variable. The vector can be of numeric, character, or logical type. NA values must be removed or replaced before calling the function.

y

a vector to specify observations of the second categorical variable. Must not contain NA values and must be of the same length as x.

log.p

a logical. If TRUE, the natural logarithm of the p-value is returned, computed in closed form to improve numerical precision when the p-value approaches zero. Defaults to FALSE.
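
To see why a log-scale p-value helps, the following sketch calls base R's pchisq() directly rather than fast.upsilon.test(). The statistic (2000) and the single degree of freedom are made-up values chosen only to force underflow; the package's internal computation may differ.

```r
# Hypothetical values for illustration; not output of the Upsilon package.
stat <- 2000
df <- 1

# On the ordinary scale the upper-tail probability underflows to zero.
p_plain <- pchisq(stat, df, lower.tail = FALSE)

# On the log scale the same tail probability remains finite.
log_p <- pchisq(stat, df, lower.tail = FALSE, log.p = TRUE)

print(p_plain)
print(log_p)
```

Comparing two very strong associations is only possible on the log scale, since both would otherwise be reported as a p-value of zero.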

Details

The Upsilon test is designed to promote dominant function patterns. In contrast to other tests of association, which favor all function patterns, it is unique in demoting non-dominant function patterns.

Null hypothesis (H_0): Row and column variables are statistically independent.

Null population: A discrete uniform distribution, where each entry in the table has the same probability.

Null distribution: Under the null hypothesis on the null population, the Upsilon test statistic asymptotically follows a chi-squared distribution with (r - 1)(c - 1) degrees of freedom, where r and c are the numbers of rows and columns of the contingency table formed from x and y.
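
As a small illustration of the degrees of freedom, assuming the contingency table is built from the distinct values of x and y (as in table(x, y)), the sketch below uses the data from the Examples section. It relies only on base R and does not compute the Upsilon statistic itself.

```r
weather <- c("rainy", "sunny", "rainy", "sunny", "rainy")
mood <- c("wistful", "upbeat", "upbeat", "upbeat", "wistful")

# Contingency table of the paired observations.
tab <- table(weather, mood)

# Degrees of freedom: (number of rows - 1) * (number of columns - 1).
df <- (nrow(tab) - 1) * (ncol(tab) - 1)
print(df)
```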

See \insertCite{luo2021upsilon}{Upsilon} for full details of the Upsilon test.

Value

A list with class "htest" containing the following components:

statistic

the Upsilon test statistic.

parameter

the degrees of freedom.

p.value

the p-value of the test.

estimate

the effect size derived from the Upsilon statistic.

method

a character string indicating the method used.

data.name

a character string giving the name of input data.
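
Because the return value is a standard "htest" list, its components can be read with the $ operator. The sketch below defines a hypothetical helper, summarize_htest(), which is not part of the Upsilon package, and demonstrates it on base R's chisq.test() so that it runs without the package installed. Passing the result of fast.upsilon.test() should work the same way, assuming the components listed above are present.

```r
# Hypothetical helper (not part of the Upsilon package): collect the
# main components of any "htest" object into a one-row data frame.
summarize_htest <- function(res) {
  stopifnot(inherits(res, "htest"))
  data.frame(
    method    = res$method,
    statistic = unname(res$statistic),
    df        = unname(res$parameter),
    p.value   = res$p.value
  )
}

# Demonstrated on chisq.test(), which also returns an "htest" object.
tab <- matrix(c(12, 3, 4, 11), nrow = 2)
out <- summarize_htest(chisq.test(tab))
print(out)
```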

Note

The test uses an internal hash table, instead of a matrix, to store the contingency table. Savings in both runtime and memory can be substantial if the contingency table is large and sparse. The test is implemented in C++, which gives an additional speedup over an R implementation.
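
The following base R sketch illustrates why sparse storage can matter. It is an illustration only and does not reproduce the package's C++ hash table: it simply compares the number of (x, y) pairs that actually occur with the number of cells a full matrix would need. The data are made up so that the table is diagonal.

```r
# Illustration only: the package's internal hash table is not shown here.
# Each observation pairs a category with itself, so the table is diagonal.
n <- 1000
x <- seq_len(n)
y <- seq_len(n)

# Sparse view: count only the (x, y) pairs that actually occur.
key <- paste(x, y, sep = "|")
sparse_counts <- table(key)

# Dense view: a full matrix needs one cell per combination of levels.
dense_cells <- length(unique(x)) * length(unique(y))

print(length(sparse_counts))  # number of nonzero cells stored
print(dense_cells)            # number of cells in the full matrix
```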

References

\insertRef{luo2021upsilon}{Upsilon}

Examples

library("Upsilon")

weather <- c(
  "rainy", "sunny", "rainy", "sunny", "rainy"
)
mood <- c(
  "wistful", "upbeat", "upbeat", "upbeat", "wistful"
)

fast.upsilon.test(weather, mood)

# The result is equivalent to:
upsilon.test(table(weather, mood))

Upsilon documentation built on March 7, 2026, 5:07 p.m.