mvbinary.test (R Documentation)
Performs a two-sample test for multivariate binary data, testing H_0: the underlying probability vectors of the two groups are the same, vs. H_1: they are different.
mvbinary.test(x, y = NULL, numPerms = 5000)
x, y: Matrices (or data frames) containing multiple binary (0/1 or logical) vector observations as rows. If y is NULL, x may instead be a list of two such matrices, as in the examples.
numPerms: Number of permutations used to calculate the p-value. Default value is 5000.
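The statistic is

$$T = \sum_{j=1}^{d} D_j^2 \, I\big(|D_j| \ge \delta(d)\big),$$

where $d$ is the dimension of the data (the number of columns) and $I(\cdot)$ is the indicator function. Additionally:

$$D_j = \frac{\hat{p}_{1j} - \hat{p}_{2j}}{\sqrt{\hat{p}_j (1 - \hat{p}_j)(1/n_1 + 1/n_2)}},$$

where $\hat{p}_{cj}$ is the estimate of $p_{cj}$ for the $c$-th group, calculated as the $j$-th column mean of that group's data; $\hat{p}_j$ is the pooled estimate for the $j$-th variable; and $n_1$, $n_2$ are the two sample sizes. The threshold is

$$\delta(d) = \sqrt{2 \log(a_d\, d)}, \qquad a_d = (\log d)^{-2}.$$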
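The following R sketch computes T directly from the definitions above for two binary matrices. The function name mvbinary_stat is hypothetical and this is not the package's implementation. It assumes d ≥ 2, takes the pooled estimate to be the column mean of the combined sample, and, because D_j is undefined when the pooled estimate is 0 or 1, sets D_j = 0 for such columns (also an assumption).

```r
# Hypothetical sketch, not the package's code: thresholded statistic T
# for two binary matrices whose rows are observations.
mvbinary_stat <- function(x, y) {
  x <- as.matrix(x) * 1                   # coerce logical / data frame to numeric 0/1
  y <- as.matrix(y) * 1
  n1 <- nrow(x); n2 <- nrow(y); d <- ncol(x)
  p1 <- colMeans(x)                       # \hat p_{1j}
  p2 <- colMeans(y)                       # \hat p_{2j}
  p  <- (n1 * p1 + n2 * p2) / (n1 + n2)   # pooled \hat p_j (combined column mean)
  se <- sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
  # Assumption: columns with p = 0 or 1 (se = 0) contribute D_j = 0.
  D <- ifelse(se > 0, (p1 - p2) / se, 0)
  a_d <- log(d)^(-2)
  delta <- sqrt(2 * log(a_d * d))         # threshold delta(d)
  sum(D^2 * (abs(D) >= delta))            # keep only |D_j| >= delta(d)
}

x <- cbind(rep(1, 10), rep(0, 10))  # group 1: variable 1 always present
y <- matrix(0, 10, 2)               # group 2: nothing present
mvbinary_stat(x, y)                 # 20, since D_1 = 1 / sqrt(0.05) exceeds delta(2), about 1.69
```

Column 2 is constant in both groups, so only column 1 contributes; a difference whose |D_j| falls below delta(d) would add nothing, which is what makes the statistic suited to sparse alternatives.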
The p-value associated with the statistic is calculated using the permutation method: the observation vectors are repeatedly shuffled between the two groups, and the statistic is recalculated after each shuffle. The resulting values form a null distribution, which is used to calculate the p-value.
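A generic permutation p-value can be sketched in R as below. The function perm_pvalue is hypothetical, not the package's code; it accepts any statistic function stat_fun(x, y) for which larger values are more extreme, such as T. The returned element names are chosen to mirror the Value description, and counting null statistics that are greater than or equal to the observed one is an assumed convention for "more extreme".

```r
# Hypothetical sketch of a permutation p-value (not the package's code).
# stat_fun(x, y) must return one number, larger = stronger evidence against H_0.
perm_pvalue <- function(x, y, stat_fun, numPerms = 5000) {
  x <- as.matrix(x); y <- as.matrix(y)
  pooled <- rbind(x, y)              # all observation vectors together
  n1 <- nrow(x); n <- nrow(pooled)
  observed <- stat_fun(x, y)
  null_stats <- numeric(numPerms)
  for (b in seq_len(numPerms)) {
    idx <- sample.int(n, n1)         # random reassignment of rows to group 1
    null_stats[b] <- stat_fun(pooled[idx, , drop = FALSE],
                              pooled[-idx, , drop = FALSE])
  }
  # Assumption: "more extreme" is counted as >= the observed statistic.
  list(statistic = observed,
       null.statistics = null_stats,
       pvalue = mean(null_stats >= observed))
}

set.seed(1)
res <- perm_pvalue(matrix(1, 5, 2), matrix(0, 5, 2),
                   function(a, b) abs(mean(a) - mean(b)), numPerms = 200)
res$pvalue   # small: few random splits separate the groups this cleanly
```

Each loop iteration recomputes the full statistic, so the run time grows linearly with numPerms.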
A list containing the computed statistic, a list of statistics (null.statistics) used to construct the null distribution (from the permutation method), and the associated pvalue. The pvalue is the proportion of null.statistics that are more extreme than the statistic computed from the original dataset.
As described in the reference below, this method may have low power when the variables are highly correlated.
Also note that run time may be long for large values of numPerms. However, larger values of numPerms produce more accurate estimates of the p-value.
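To make this trade-off concrete: if the permuted statistics are treated as independent draws, the estimated p-value is a binomial proportion, so its Monte Carlo standard error is approximately sqrt(p(1 - p)/numPerms). The helper perm_se below is hypothetical and only illustrates this approximation; it is not part of the package.

```r
# Hypothetical helper (not part of the package): approximate Monte Carlo
# standard error of a permutation p-value p estimated from numPerms
# permutations, using the binomial approximation sqrt(p * (1 - p) / numPerms).
perm_se <- function(p, numPerms) sqrt(p * (1 - p) / numPerms)

# For a p-value near 0.05: roughly 0.022, 0.0069 and 0.0031
perm_se(0.05, c(100, 1000, 5000))
```

For example, with numPerms = 100 a p-value near 0.05 has a standard error of about 0.02, while the default of 5000 reduces it to about 0.003.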
Amanda Plunkett & Junyong Park (2017), Two-sample Tests for Sparse High-Dimensional Binary Data, Communications in Statistics - Theory and Methods, 46:22, 11181-11193
# Binarize the twoNewsGroups dataset:
data(twoNewsGroups)
binData <- list(twoNewsGroups[[1]] > 0, twoNewsGroups[[2]] > 0)
names(binData) <- names(twoNewsGroups)

# Perform the test:
result <- mvbinary.test(binData, numPerms = 100)
result$pvalue

# The following are equivalent to the previous test:
result <- mvbinary.test(binData[[1]], binData[[2]], numPerms = 100)
result <- binData |> mvbinary.test(numPerms = 100)