chisq.test2: Chi-squared goodness-of-fit with intrinsic null hypothesis
In chgigot/cgmisc: CGigot Miscellaneous


View source: R/chisq-test2.R

This version of the chi-squared goodness-of-fit test supports an intrinsic null hypothesis by letting the caller specify the degrees of freedom of the chi-squared distribution.

chisq.test2(x, p, n_est, df, rescale.p = FALSE, ...)

`x`	A numeric vector. Observed values.
`p`	A vector of probabilities of the same length as `x`.
`n_est`	Number of estimated parameters. Not yet implemented.
`df`	Degrees of freedom.
`...`	Extra parameters to be passed to `chisq.test`.

Under the usual extrinsic null hypothesis, the expected numbers are known before collecting data, and the degrees of freedom equal the number of classes minus 1. Indeed,...

However, under an intrinsic null hypothesis, one or more parameters are estimated from the collected data and then used to compute the expected numbers. The degrees of freedom must take that into account, so they equal the number of classes minus the number of estimated parameters minus 1.

If df is not given, the regular chisq.test is called with all the same parameters and rescale.p = TRUE.
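
The degrees-of-freedom adjustment described above can be illustrated with base R alone. The following is a minimal sketch, not the source of chisq.test2: it takes the Pearson statistic from the regular chisq.test and recomputes the p-value with pchisq using the reduced degrees of freedom. The simulated data and the names size, observed, p_hat, probs and df_intrinsic are assumptions made for the illustration.

```r
# Hypothetical illustration; this is not how chisq.test2 is necessarily implemented.
set.seed(1)
size <- 4                                          # binomial size (assumed)
obs <- rbinom(n = 200, size = size, prob = 0.4)    # simulated raw observations
observed <- as.numeric(table(factor(obs, levels = 0:size)))  # counts per class

# One parameter (the binomial probability) is estimated from the data.
p_hat <- mean(obs) / size
n_est <- 1
probs <- dbinom(0:size, size = size, prob = p_hat)

# Pearson statistic from the regular test, which uses classes - 1 as df.
res <- suppressWarnings(chisq.test(observed, p = probs, rescale.p = TRUE))

# Intrinsic null hypothesis: classes - estimated parameters - 1.
df_intrinsic <- length(observed) - n_est - 1

# p-value recomputed with the reduced degrees of freedom.
p_value <- pchisq(unname(res$statistic), df = df_intrinsic, lower.tail = FALSE)
```

For the same statistic, the p-value obtained with fewer degrees of freedom is never larger than the one reported by the regular test, so ignoring the estimated parameter makes the test too lenient.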

set.seed(12345)
# Here we know the expected mean number:
p <- 0.3
n <- 10
N <- 30
obs <- rbinom(n = N, size = n, prob = p)
freq <- as.data.frame(table(obs))
names(freq) <- c("category","observed")
freq <- merge(x = data.frame(category = 0:n),
              y = freq,
              by = "category", all = TRUE)
freq[is.na(freq)] <- 0
freq[] <- lapply(freq, as.numeric) # Force each column to be numeric (and only numeric!)
freq$expected <- dbinom(x = 0:n, size = n, prob = p) * N
# Test
test1 <- chisq.test2(freq$observed, p = freq$expected, rescale.p = TRUE)
test2 <- chisq.test(freq$observed, p = freq$expected, rescale.p = TRUE)
identical(test1, test2)
test1

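# If we estimate one parameter from the observed data set:
p_est <- mean(obs) / n # Estimate the binomial probability from the raw draws
n_est <- 1
# Expected numbers now come from the estimated parameter
freq$expected <- dbinom(x = 0:n, size = n, prob = p_est) * N
# Note: n_est is documented above as not yet implemented
test1 <- chisq.test2(freq$observed, p = freq$expected, n_est = n_est,
                     rescale.p = TRUE)
test2 <- chisq.test2(freq$observed, p = freq$expected,
                     df = length(freq$observed) - n_est - 1, rescale.p = TRUE)
identical(test1, test2)
test1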