GET.distrindep: Test of independence of two general distributions

Description

Permutation-based test of independence in a bivariate vector. The test statistic is either 1) the empirical joint cumulative distribution function, 2) the matrix of observed counts of a 2D contingency table, or 3) the smoothed Q-Q plot.

Usage

GET.distrindep(
  X,
  nsim = 999,
  statistic = c("cdf", "contingency", "qq"),
  ngrid,
  seq.x,
  seq.y,
  sigma,
  atoms.x,
  atoms.y,
  ...
)

Arguments

X

A matrix with n rows and 2 columns. Each row contains one bivariate observation.

nsim

The number of random permutations used.

statistic

Either "cdf", "contingency" or "qq" corresponding to the three test functions.

ngrid

Vector with two elements, giving the number of grid points to be used in the test statistic for each of the two marginals. The default is 20 in each marginal for "cdf" and 64 for "qq". (This is not relevant for "contingency".)

seq.x

For the first marginal, the values at which the empirical cumulative distribution function is evaluated. If NULL (the default), a sequence of quantiles, equidistant in terms of probability, is used. seq.x and seq.y are only relevant for "cdf".

seq.y

For the second marginal, the values at which the empirical cumulative distribution function is evaluated. If NULL (the default), a sequence of quantiles, equidistant in terms of probability, is used. seq.x and seq.y are only relevant for "cdf".

sigma

Standard deviation of the smoothing kernel used to smooth the Q-Q plot when computing the test statistic. If NULL, a sensible default based on the number of observations is used.
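To make the role of sigma concrete, here is a minimal base-R sketch of one way to smooth a Q-Q representation: each marginal is mapped into (0, 1) through its ranks, and a Gaussian kernel with standard deviation sigma is summed over the transformed points on a regular grid. The helper smoothed_qq() and its defaults are hypothetical; it assumes continuous marginals (no atoms) and applies no edge correction, so it illustrates the idea rather than reproducing the estimator used by GET.distrindep.

```r
# Hypothetical helper (not part of GET): smoothed Q-Q representation.
# Assumes continuous marginals; no atoms, no edge correction.
smoothed_qq <- function(X, ngrid = c(64, 64), sigma = 0.05) {
  n <- nrow(X)
  # Map each marginal into (0, 1) through its ranks
  u <- rank(X[, 1]) / (n + 1)
  v <- rank(X[, 2]) / (n + 1)
  # Regular grid of cell midpoints in (0, 1)
  gx <- (seq_len(ngrid[1]) - 0.5) / ngrid[1]
  gy <- (seq_len(ngrid[2]) - 0.5) / ngrid[2]
  # Kx[i, k] = dnorm(gx[i] - u[k], sd = sigma); likewise for Ky
  Kx <- outer(gx, u, function(a, b) dnorm(a - b, sd = sigma))
  Ky <- outer(gy, v, function(a, b) dnorm(a - b, sd = sigma))
  # Entry [i, j] sums the product kernel over all observations
  Kx %*% t(Ky)
}

set.seed(3)
X <- matrix(rnorm(200), ncol = 2)
S <- smoothed_qq(X, ngrid = c(30, 20), sigma = 0.1)
```

A smaller sigma gives a rougher surface that follows individual points; a larger one averages over broader regions of the unit square.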

atoms.x

Vector specifying atomic values in the first marginal. Only relevant for "qq". See Examples.

atoms.y

Vector specifying atomic values in the second marginal. Only relevant for "qq". See Examples.

...

Additional parameters to be passed to global_envelope_test. In particular, alpha specifies the nominal significance level of the test, and type specifies the type of the global envelope test.

Details

The function performs a permutation-based test of independence in a bivariate sample, using one of three test statistics selected by the argument statistic.

If the observed data are the pairs \{(X_1, Y_1), \ldots, (X_n, Y_n)\}, the permutations are obtained by randomly permuting the values in the second marginal, i.e. \{(X_1, Y_{\pi(1)}), \ldots, (X_n, Y_{\pi(n)})\}.
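The permutation step can be sketched in base R as follows. permute_second_marginal() is a hypothetical helper written for illustration, not a function exported by GET. It keeps the first column fixed and reorders the second, so both marginal distributions are preserved while any pairing between them is broken.

```r
# Hypothetical helper (not part of GET): one permuted sample under H0.
# X is an n x 2 matrix; only the second column is permuted.
permute_second_marginal <- function(X) {
  cbind(X[, 1], X[sample.int(nrow(X)), 2])
}

set.seed(42)
X <- cbind(1:5, c(10, 20, 30, 40, 50))
Xperm <- permute_second_marginal(X)
Xperm
```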

The first alternative, statistic = "cdf", uses the empirical joint cumulative distribution function computed on a grid of ngrid[1] times ngrid[2] arguments, with the grid points chosen according to the quantiles of the marginal distributions. The second alternative, statistic = "contingency", tests independence in a 2D contingency table, using the matrix of observed counts as the test statistic. The third alternative, statistic = "qq", is based on the Q-Q representation and an estimate of the intensity function computed on a regular grid of ngrid[1] times ngrid[2] points.
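As an illustration of the "cdf" and "contingency" statistics, the following base-R sketch evaluates the empirical joint CDF F_n(x, y) = (1/n) #{i : X_i <= x, Y_i <= y} on a grid of marginal quantiles and builds a table of observed counts. The helpers joint_ecdf() and quantile_grid() are hypothetical, and the probabilities k/(ngrid + 1) used for the grid are an assumption; the package may place its quantile grid differently.

```r
# Hypothetical helper: empirical joint CDF on every (x, y) of seq.x x seq.y
joint_ecdf <- function(X, seq.x, seq.y) {
  outer(seq.x, seq.y,
        Vectorize(function(x, y) mean(X[, 1] <= x & X[, 2] <= y)))
}

# Hypothetical helper: quantiles equidistant in probability
# (assumed probabilities k / (ngrid + 1), k = 1, ..., ngrid)
quantile_grid <- function(v, ngrid) {
  unname(quantile(v, probs = seq_len(ngrid) / (ngrid + 1)))
}

set.seed(1)
X <- matrix(rnorm(200), ncol = 2) %*% matrix(c(1, 1, 0, 1), ncol = 2)
F_obs <- joint_ecdf(X, quantile_grid(X[, 1], 20), quantile_grid(X[, 2], 15))
dim(F_obs)  # [1] 20 15

# For statistic = "contingency", the analogous statistic is the count matrix
Z <- cbind(c(1, 1, 2, 2, 3), c(1, 2, 2, 2, 1))
counts <- table(Z[, 1], Z[, 2])
```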

In each case, the test itself is performed with the global envelope test of the chosen type; see the argument type of global_envelope_test.
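The Monte Carlo logic behind the test can be illustrated with a deliberately simplified stand-in. The sketch below is not global_envelope_test: it reduces each curve to its maximum absolute deviation from the mean of all curves and reports the proportion of curves (the observed one included) that are at least as extreme as the observed curve. mad_mc_test() and cdf_stat() are hypothetical helpers. Because permutation leaves both marginals unchanged, the quantile grid computed from the observed data is the same for every permuted sample.

```r
# Hypothetical stand-in (NOT global_envelope_test): Monte Carlo test based on
# the maximum absolute deviation of each curve from the mean of all curves.
mad_mc_test <- function(stat_fun, X, nsim = 999) {
  obs <- as.vector(stat_fun(X))
  # Each column of sims is the statistic of one permuted sample
  sims <- replicate(nsim, as.vector(stat_fun(cbind(X[, 1], sample(X[, 2])))))
  curves <- cbind(obs, sims)
  center <- rowMeans(curves)
  dev <- apply(abs(curves - center), 2, max)
  # Monte Carlo p-value: (1 + #{sims at least as extreme}) / (nsim + 1)
  mean(dev >= dev[1])
}

set.seed(2)
X <- matrix(rnorm(200), ncol = 2) %*% matrix(c(1, 1, 0, 1), ncol = 2)
gx <- quantile(X[, 1], probs = 1:9 / 10)
gy <- quantile(X[, 2], probs = 1:9 / 10)
# Hypothetical statistic: empirical joint CDF on the fixed 9 x 9 grid
cdf_stat <- function(Z) {
  outer(gx, gy, Vectorize(function(x, y) mean(Z[, 1] <= x & Z[, 2] <= y)))
}
p <- mad_mc_test(cdf_stat, X, nsim = 199)
```

Unlike this single-number summary, a global envelope test also identifies which grid points lead to rejection, which is what the plots in the Examples display.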

In the case of a 2D contingency table, instead of plotting, the results can be printed as text output in the console by typing the object name. Cells in which the observed value exceeds the upper envelope are printed in red, and cells in which the observed value is lower than the lower envelope are printed in cyan. The standard output of the global envelope test is also returned and can be plotted or analyzed accordingly.
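The colour coding described above amounts to a cell-wise comparison against the envelopes. The sketch below uses made-up count and envelope matrices and a hypothetical helper flag_cells(); it does not reflect the internal structure of the object returned by GET.distrindep.

```r
# Hypothetical helper (not part of GET): label each cell by comparing the
# observed count with the lower (lo) and upper (hi) envelope values.
flag_cells <- function(obs, lo, hi) {
  out <- matrix("", nrow(obs), ncol(obs), dimnames = dimnames(obs))
  out[obs > hi] <- "above"  # the package prints such cells in red
  out[obs < lo] <- "below"  # the package prints such cells in cyan
  out
}

# Made-up 2 x 2 example (matrices are filled column by column)
obs <- matrix(c(5, 0, 3, 9), 2)
lo  <- matrix(1, 2, 2)
hi  <- matrix(4, 2, 2)
f <- flag_cells(obs, lo, hi)
f
```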

References

Dvořák, J. and Mrkvička, T. (2022). Graphical tests of independence for general distributions. Computational Statistics 37, 671–699.

Examples

library(GET)

#- Example of cdf
#----------------
# Generate sample data
data <- matrix(rnorm(n=200), ncol=2) %*% matrix(c(1,1,0,1), ncol=2)
plot(data)

# Compute the CDF test and plot the significant regions
res <- GET.distrindep(data, statistic="cdf", ngrid=c(20,15), nsim=1999)

plot(res) + ggplot2::scale_radius(range = 2 * c(1, 6))

# Extract the p-value
attr(res,"p")

#- Example of a 2D contingency table
#-----------------------------------
# Generate sample data:
data <- matrix(c(sample(4, size=100, replace=TRUE), sample(2, size=100, replace=TRUE)), ncol=2)
data[,2] <- data[,2] + data[,1]

# Observed contingency table (with row names and column names)
table(data[,1], data[,2])

# Permutation-based envelope test
res <- GET.distrindep(data, statistic="contingency", nsim=999)

res
plot(res) + ggplot2::scale_radius(range = 5 * c(1, 6))

# Extract the p-value
attr(res,"p")

#- Example of Q-Q
#--------------
# Generate sample data
data <- matrix(rnorm(n=200), ncol=2) %*% matrix(c(1,1,0,1), ncol=2)

plot(data)

# Compute the QQ test and plot the significant regions
res <- GET.distrindep(data, statistic="qq", ngrid=c(30,20), nsim=999)

plot(res)
# Extract the p-value
attr(res,"p")

# With atoms, independent
data <- cbind(rnorm(n=100), sample(4, size=100, replace=TRUE))
plot(data)
res <- GET.distrindep(data, statistic="qq", nsim=999, atoms.y=c(1,2,3,4))

plot(res)


# With atoms, dependent
data <- cbind(sort(rnorm(n=100)), sort(sample(4, size=100, replace=TRUE)))
plot(data)
res <- GET.distrindep(data, statistic="qq", nsim=999, atoms.y=c(1,2,3,4))
plot(res, sign.type="col", what=c("obs", "lo", "hi", "lo.sign", "hi.sign"))


# Atoms in both variables
data <- cbind(rnorm(n=100), rnorm(n=100)) %*% matrix(c(1,1,0,1), ncol=2)
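data[data[, 1] <= -1, 1] <- -1      # censor the first marginal at -1 (atom at -1)
data[data[, 2] <= -0.5, 2] <- -0.5  # censor the second marginal at -0.5 (atom at -0.5)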
plot(data)

# Perform the test
res <- GET.distrindep(data, statistic="qq", nsim=999, atoms.x=c(-1), atoms.y=c(-0.5))

plot(res, sign.type="col", what=c("obs", "lo", "hi", "lo.sign", "hi.sign"))

GET documentation built on Sept. 29, 2023, 5:06 p.m.