div_gof: Divergence Tests of Goodness of Fit

div_gof    R Documentation

Divergence Tests of Goodness of Fit

Description

Performs divergence-based goodness-of-fit tests for discrete data, including tests of uniformity, pairwise independence, conditional independence, and nested model comparisons.

Usage

div_gof(
  dat,
  var_uniform = NULL,
  var1 = NULL,
  var2 = NULL,
  var_cond = NULL,
  model_full = NULL,
  model_reduced = NULL,
  alpha = 0.05,
  dec = 3,
  use_approx_cv = TRUE
)

Arguments

dat

data frame with rows as observations and columns as variables. Variables must be categorical with finite range spaces.

var_uniform

character name of a variable in dat to test for uniformity.

var1

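character name of the first variable in dat for a pairwise or conditional independence test.

var2

character name of the second variable in dat for a pairwise or conditional independence test.

var_cond

optional character vector naming one or more conditioning variables in dat.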

model_full

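list with elements D (divergence) and df (degrees of freedom) for the full model in a nested model comparison.

model_reduced

list with elements D (divergence) and df (degrees of freedom) for the reduced model in a nested model comparison.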

alpha

significance level. Default is 0.05.

dec

number of decimals for rounding. Default is 3.

use_approx_cv

logical; if TRUE, uses the approximate critical value df + sqrt(8 * df), that is, the chi-square mean plus two standard deviations. If FALSE, uses the chi-square quantile. Default is TRUE.
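
To see how the two settings of use_approx_cv relate, the sketch below compares the approximate critical value df + sqrt(8 * df) with a chi-square quantile from base R's qchisq(). It assumes that the quantile in question is the 1 - alpha quantile, which is the usual convention but is not stated on this page. The helper name critical_value is hypothetical and not part of netropy.

```r
# Hypothetical helper (not part of netropy): critical value for a
# chi-square test with `df` degrees of freedom.
critical_value <- function(df, alpha = 0.05, use_approx_cv = TRUE) {
  if (use_approx_cv) {
    # Chi-square mean (df) plus two standard deviations (2 * sqrt(2 * df)).
    # As written, this formula does not depend on alpha.
    df + sqrt(8 * df)
  } else {
    # Assumption: the quantile used is the upper-alpha chi-square quantile.
    qchisq(1 - alpha, df = df)
  }
}

# Side-by-side comparison for a few degrees of freedom.
dfs <- c(1, 2, 4, 10)
cv <- data.frame(
  df     = dfs,
  approx = round(critical_value(dfs, use_approx_cv = TRUE), 3),
  exact  = round(critical_value(dfs, use_approx_cv = FALSE), 3)
)
print(cv)
```

At alpha = 0.05 the two values are close for small df, so the choice mainly matters for borderline test statistics or for other significance levels.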

Details

The function implements four types of tests:

1. Uniformity

D = \log_2 r_X - H(X),

where r_X is the number of categories of X.

2. Pairwise Independence

D = H(X) + H(Y) - H(X,Y)

3. Conditional Independence

D = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z)

where Z may also represent a vector of conditioning variables.

4. Nested Model Comparison

D = D_{reduced} - D_{full}

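The test statistic is

2nD\ln(2),

where n is the number of observations. The factor \ln(2) is needed since entropies are computed using base 2 logarithms. The statistic is compared with a chi-square critical value for the given degrees of freedom.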

Smaller divergence values indicate better model fit.
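
The divergences above can be reproduced by hand from empirical entropies. The following is a minimal sketch in base R, not the package's implementation: the helper entropy_bits and the toy data are invented for illustration, and r_X is taken to be the number of categories observed in the data, which is an assumption.

```r
# Shannon entropy in bits of the empirical joint distribution of one or
# more categorical columns (hypothetical helper, not netropy's code).
entropy_bits <- function(dat, vars) {
  counts <- table(dat[, vars, drop = FALSE])
  p <- as.vector(counts) / sum(counts)
  p <- p[p > 0]                      # 0 * log(0) is treated as 0
  -sum(p * log2(p))
}

# Toy data, invented for illustration only.
toy <- data.frame(
  x = c(0, 0, 0, 0, 1, 1, 1, 1),
  y = c(0, 0, 0, 1, 0, 1, 1, 1),
  z = c(0, 1, 0, 1, 0, 1, 0, 1)
)
n <- nrow(toy)

# 1. Uniformity of x (assumption: r_x = number of observed categories).
r_x <- length(unique(toy$x))
D_unif <- log2(r_x) - entropy_bits(toy, "x")

# 2. Pairwise independence of x and y.
D_ind <- entropy_bits(toy, "x") + entropy_bits(toy, "y") -
  entropy_bits(toy, c("x", "y"))

# 3. Conditional independence of x and y given z.
D_cond <- entropy_bits(toy, c("x", "z")) + entropy_bits(toy, c("y", "z")) -
  entropy_bits(toy, "z") - entropy_bits(toy, c("x", "y", "z"))

# Test statistic 2 * n * D * log(2); log() is the natural logarithm in R.
stat <- 2 * n * c(uniform = D_unif, indep = D_ind, cond = D_cond) * log(2)
print(round(stat, 3))
```

In this toy example x is perfectly balanced, so its uniformity divergence is 0, while the pairwise independence divergence is positive because x and y tend to agree.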

Value

Data frame with test type, divergence D, chi-square statistic, degrees of freedom, critical value, and decision.

Author(s)

Termeh Shafie

References

Frank, O., & Shafie, T. (2016). Multivariate entropy analysis of network data. Bulletin of Sociological Methodology/Bulletin de Méthodologie Sociologique, 129(1), 45-63.

See Also

joint_entropy, entropy_trivar

Examples

data(lawdata)
df_att <- lawdata[[4]]

att_var <- data.frame(
  status    = df_att$status - 1,
  gender    = df_att$gender,
  office    = df_att$office - 1,
  years     = ifelse(df_att$years <= 3, 0,
                ifelse(df_att$years <= 13, 1, 2)),
  age       = ifelse(df_att$age <= 35, 0,
                ifelse(df_att$age <= 45, 1, 2)),
  practice  = df_att$practice,
  lawschool = df_att$lawschool - 1
)

## 1. Test uniformity
div_gof(att_var, var_uniform = "gender")

## 2. Test pairwise independence
div_gof(att_var, var1 = "status", var2 = "gender")

## 3. Test conditional independence

## (a) Conditional independence given a single variable
div_gof(att_var,
        var1 = "status",
        var2 = "gender",
        var_cond = "years")

## (b) Conditional independence given multiple variables
div_gof(att_var,
        var1 = "status",
        var2 = "gender",
        var_cond = c("years", "age"))

## 4. Nested model comparison
## Compare reduced models against the saturated empirical model.
## The saturated model has divergence D = 0 and df = 0.
m_full <- list(D = 0, df = 0)

## (a) Pairwise independence model
m_reduced <- div_gof(att_var,
                    var1 = "status",
                    var2 = "gender")

div_gof(att_var,
        model_full = m_full,
        model_reduced = list(D = m_reduced$D, df = m_reduced$df))

## (b) Conditional independence model
m_reduced <- div_gof(att_var,
                    var1 = "status",
                    var2 = "gender",
                    var_cond = "years")

div_gof(att_var,
        model_full = m_full,
        model_reduced = list(D = m_reduced$D, df = m_reduced$df))

netropy documentation built on April 24, 2026, 9:06 a.m.