fast.gtest: Fast Zero-Tolerant G-Test of Association


fast.gtest R Documentation

Fast Zero-Tolerant G-Test of Association

Description

Performs a fast zero-tolerant G-test (Woolf, 1957) to evaluate the association between observations of two categorical variables.

Usage

fast.gtest(x, y, log.p = FALSE)

Arguments

x

a vector to specify observations of the first categorical variable. The vector can be of numeric, character, or logical type. NA values must be removed or replaced before calling the function.

y

a vector to specify observations of the second categorical variable. Must not contain NA values and must be of the same length as x.

log.p

a logical. If TRUE, the natural logarithm of the p-value is computed in closed form, which improves numerical precision when the p-value approaches zero. Defaults to FALSE.
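The following sketch illustrates why a log-scale p-value can matter. It uses only base R's pchisq() and hypothetical values for the statistic and degrees of freedom; it is not the package's internal computation, which may differ.

```r
# Sketch using base R only; not the package's internal code.
stat <- 2000  # hypothetical, very large test statistic
df <- 4       # hypothetical degrees of freedom

# On the natural scale, the upper-tail probability underflows to 0.
p <- pchisq(stat, df = df, lower.tail = FALSE)

# On the log scale, the same probability stays finite and usable.
log_p <- pchisq(stat, df = df, lower.tail = FALSE, log.p = TRUE)

print(p)
print(log_p)
```

With p reported as exactly zero, two very strong associations cannot be ranked against each other; their log p-values can still be compared.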

Value

A list with class "htest" containing the following components:

statistic

the G-test statistic (Likelihood Ratio Chi-squared statistic).

parameter

the degrees of freedom.

p.value

the p-value of the test.

estimate

the mutual information between the two variables.

method

a character string indicating the method used.

data.name

a character string giving the names of the data.
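As a rough illustration of how the statistic, degrees of freedom, p-value, and mutual information relate to one another, the sketch below computes them from a dense contingency table in base R. It assumes natural logarithms (mutual information in nats) and the conventional (rows - 1) * (columns - 1) degrees of freedom; the conventions actually used by fast.gtest may differ.

```r
# Sketch in base R; fast.gtest's internals and exact conventions
# (log base of the estimate, degrees of freedom) may differ.
weather <- c("rainy", "sunny", "rainy", "sunny", "rainy")
mood <- c("wistful", "upbeat", "upbeat", "upbeat", "wistful")

observed <- table(weather, mood)
n <- sum(observed)

# Expected counts under independence: row total * column total / n.
expected <- outer(rowSums(observed), colSums(observed)) / n

# Zero cells contribute nothing to the sum, so they are skipped
# (the usual 0 * log(0) = 0 convention).
nonzero <- observed > 0
g_stat <- 2 * sum(observed[nonzero] * log(observed[nonzero] / expected[nonzero]))

# Mutual information in nats; G = 2 * n * MI under this convention.
mutual_info <- g_stat / (2 * n)

df <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(g_stat, df = df, lower.tail = FALSE)

print(c(G = g_stat, df = df, p.value = p_value, MI = mutual_info))
```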

Note

The test uses an internal hash table, instead of a matrix, to store the contingency table. Savings in both runtime and memory can be substantial if the contingency table is large and sparse. The test is implemented in C++, which gives an additional speedup over an R implementation.
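The idea of storing only the observed cells can be sketched in R, with an environment standing in for the hash table. The helper sparse_counts() is hypothetical and is not part of Upsilon; the package's C++ implementation is not shown here and likely differs in detail.

```r
# Sketch only: an R environment stands in for the C++ hash table.
# sparse_counts() is a hypothetical helper, not part of Upsilon.
sparse_counts <- function(x, y) {
  stopifnot(length(x) == length(y), !anyNA(x), !anyNA(y))
  counts <- new.env(hash = TRUE)
  for (i in seq_along(x)) {
    # Composite key identifying the (x, y) cell.
    key <- paste(x[i], y[i], sep = "\r")
    current <- counts[[key]]
    counts[[key]] <- if (is.null(current)) 1L else current + 1L
  }
  counts
}

x <- c("a", "b", "a", "c", "a")
y <- c("u", "v", "u", "w", "v")
cells <- sparse_counts(x, y)

# Only observed pairs are stored, rather than every cell of a full 3 x 3 matrix.
print(length(ls(cells)))
```

When each variable has many levels but few combinations actually occur, the number of stored entries grows with the number of distinct observed pairs rather than with the product of the level counts.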

References

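Woolf, B. (1957). The log likelihood ratio test (the G-test): methods and tables for tests of heterogeneity in contingency tables. Annals of Human Genetics, 21(4), 397-409.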

Examples

library("Upsilon")
weather <- c(
  "rainy", "sunny", "rainy", "sunny", "rainy"
)
mood <- c(
  "wistful", "upbeat", "upbeat", "upbeat", "wistful"
)

fast.gtest(weather, mood)

# The result is equivalent to: 
modified.gtest(table(weather, mood))

Upsilon documentation built on March 7, 2026, 5:07 p.m.