multinom.neighborhood.test: Perform the neighborhood test for multinom.test

View source: R/multinomial.R

Description

Performs a two-sample test for two multinomial vectors, testing H_0: the underlying multinomial probability vectors are within some neighborhood of one another vs. H_1: they are not.

Usage

multinom.neighborhood.test(x, y = NULL, delta = NULL)

Arguments

x, y

Integer vectors (or matrices or data frames containing multiple integer vector observations as rows). x and y must be of the same type and dimension. If x and y are matrices (or data frames), the i-th row of x is tested against the i-th row of y for all i in 1..nrow(x). Alternatively, x can be a list of two vectors, matrices, or data frames to be compared; in this case, leave y as NULL (the default).

delta

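A positive number (or a vector of positive numbers) giving the size of the neighborhood. It is subtracted from the multinom.test statistic when computing p_delta (see Value).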
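To illustrate the expected input shapes, the sketch below builds two small count matrices with made-up values (the numbers are purely hypothetical). Row i of x is paired with row i of y, and delta is given as a vector. The calls to multinom.neighborhood.test follow the Usage line but are left as comments so the snippet runs without the package installed.

```r
# Hypothetical count data: 3 paired observations over 5 categories.
x <- matrix(c(10L,  3L, 0L, 7L, 1L,
               4L,  4L, 2L, 9L, 0L,
               0L, 12L, 5L, 1L, 3L),
            nrow = 3, byrow = TRUE)
y <- matrix(c( 9L,  5L, 1L, 6L, 0L,
               3L,  6L, 2L, 8L, 1L,
               2L, 10L, 4L, 2L, 3L),
            nrow = 3, byrow = TRUE)

# x and y must have the same type and dimension.
stopifnot(identical(dim(x), dim(y)), is.integer(x), is.integer(y))

# Row i of x is compared with row i of y; collect each pair for inspection.
pairs <- lapply(seq_len(nrow(x)), function(i) rbind(x = x[i, ], y = y[i, ]))

# Several neighborhood sizes can be supplied at once (values are hypothetical).
deltas <- c(1, 2, 5)

# Equivalent calls (require the hddtest package):
# multinom.neighborhood.test(x, y, delta = deltas)
# multinom.neighborhood.test(list(x, y), delta = deltas)
```

Each 2 x 5 matrix in pairs holds one row of x and its counterpart in y. With three row pairs and three deltas, the return value described under Value would be a 3 x 3 matrix of p_delta values.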

Details

When testing the equality of parameters from two populations (as in multinom.test), the null hypothesis is frequently rejected even though the estimated effect sizes are close to each other, especially when sample sizes are large. Such differences may be so small that the parameters are not considered different in practice. A neighborhood test addresses this by asking whether the parameters lie within a neighborhood of size delta of one another, rather than whether they are exactly equal.
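The large-sample effect described above can be reproduced with base R. The sketch below uses Pearson's chi-square test (chisq.test) as a stand-in for multinom.test, which uses a different statistic, on two hypothetical probability vectors that differ by only 0.01 in two cells. The counts are the exact expected counts, so the only thing that changes between the two calls is the sample size.

```r
# Two hypothetical multinomial probability vectors that differ only slightly.
p1 <- c(0.25, 0.25, 0.25, 0.25)
p2 <- c(0.26, 0.24, 0.25, 0.25)

# p-value of Pearson's chi-square test on the expected counts for sample size n
# (a stand-in for multinom.test, not the statistic that function computes).
chisq_pvalue <- function(n) {
  counts <- rbind(round(n * p1), round(n * p2))
  chisq.test(counts)$p.value
}

p_small <- chisq_pvalue(100)  # counts 25/25/25/25 vs 26/24/25/25
p_large <- chisq_pvalue(1e6)  # the same proportions, scaled up

c(small = p_small, large = p_large)
```

With 100 observations per sample the difference is not detected, while with a million per sample it is overwhelmingly significant, although the underlying probabilities are identical in both cases. A neighborhood test instead asks whether the two distributions are within delta of each other, so a difference this small need not lead to rejection.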

Value

The statistic T from multinom.test and its associated p_delta, where p_delta = 1 - pnorm(T - delta). If x and y are two-dimensional (that is, matrices or data frames with more than one row) and/or delta is a vector, a matrix is returned whose (i, j)-th entry is the p_delta associated with the i-th rows of x and y and the j-th entry of delta.
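The p_delta formula above can be evaluated directly in base R. In the sketch below, the values in T_stat are made up for illustration and stand in for statistics returned by multinom.test. outer() produces the matrix layout described above: row i corresponds to the i-th pair of rows of x and y, and column j to the j-th entry of delta.

```r
# Hypothetical statistics T for three row pairs, and three neighborhood sizes.
T_stat <- c(2.5, 8.0, 40.0)
delta  <- c(0, 2.5, 10)

# p_delta = 1 - pnorm(T - delta), evaluated for every (row, delta) combination.
p_delta <- outer(T_stat, delta, function(t, d) 1 - pnorm(t - d))
dimnames(p_delta) <- list(paste0("row", seq_along(T_stat)),
                          paste0("delta=", delta))
round(p_delta, 4)
```

For a fixed T, a larger delta gives a larger p_delta, so a wider neighborhood makes the null harder to reject. When T - delta is large, pnorm(T - delta, lower.tail = FALSE) gives the same quantity without the loss of precision in 1 - pnorm(...).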

See Also

multinom.test, vignette("multinomial-neighborhood-test-vignette")

Amanda Plunkett & Junyong Park (2018), Two-Sample Test for Sparse High Dimensional Multinomial Distributions, TEST, https://doi.org/10.1007/s11749-018-0600-8

Examples


# Load the hddtest package and the twoNewsGroups dataset

library(hddtest)
data(twoNewsGroups)

# Sample two disjoint sets of 200 documents from the sci.med newsgroup (to
# simulate the null hypothesis being TRUE). For each of the two groups, sum
# the 200 term frequency vectors together. The two resulting vectors are the
# ones we test.

set.seed(42)  # make the random sample reproducible
num_docs <- 200
vecs2test <- vector("list", 2)
row_ids <- 1:nrow(twoNewsGroups$sci.med)
group_1 <- sample(row_ids, num_docs)
group_2 <- sample(row_ids[-group_1], num_docs)

vecs2test[[1]] <- twoNewsGroups$sci.med[group_1,] |>
                    colSums() |>
                    matrix(nrow=1)
vecs2test[[2]] <- twoNewsGroups$sci.med[group_2,] |>
                    colSums() |>
                    matrix(nrow=1)

# Test the null that the two vectors come from the same distribution
# (i.e. the same news group)

vecs2test |> multinom.test()

# The above test likely produced a significant p-value, meaning that we would
# reject the null. However, the difference is too small to be of practical
# interest. Instead, test whether the two distributions are within a
# neighborhood of each other:

vecs2test |> multinom.neighborhood.test(delta=60)



AmandaRP/hddtest documentation built on March 18, 2023, 5:53 p.m.