gof_gerbil: Goodness-of-fit testing for 'gerbil' objects
In gerbil: Generalized Efficient Regression-Based Imputation with Latent Processes

gof_gerbil

R Documentation

Description

Using a gerbil object as an input, this function performs univariate and bivariate goodness-of-fit tests to compare distributions of imputed and observed values.

Usage

gof_gerbil(
  x,
  y = NULL,
  type = 1,
  imp = 1,
  breaks = NULL,
  method = c("chi-squared", "fisher", "G"),
  ks = FALSE,
  partial = "imputed",
  ...
)

Arguments

`x`	A `gerbil` object containing the imputed data.
`y`	A vector listing the column names of the imputed data for which tests should be run. See details. By default, `y` contains all columns of the data that required imputation.
`type`	A scalar used to specify the type of tests that will be performed. Options include univariate (marginal) tests (`type = 1`) and bivariate tests (`type = 2`). See details. Defaults to `type = 1`.
`imp`	A scalar or vector indicating which of the multiply imputed datasets should be used for testing. Defaults to `imp = 1`.
`breaks`	Used to determine the cut-points for binning of continuous variables into categories. Ideally, `breaks` is a named list, where the list names are the names of the continuous variables. Each element of the list can be a vector giving the respective cutpoints or a scalar which is used to indicate the number of bins (in which case cutpoints are determined from percentiles in order to yield bins of approximately equal size). If `breaks` is a scalar or a vector (and not a list), the binning strategy indicated by `breaks` is applied to each variable in accordance with the description above. Defaults to `breaks = 4`.
`method`	The type of test that is used to compare contingency tables. Options include `'chi-squared'` for chi-squared testing (the default), `'fisher'` for Fisher's exact test, and `'G'` for a G-test.
`ks`	If `TRUE`, a Kolmogorov-Smirnov test is used for univariate comparisons of continuous variables. This functionality is not enabled for bivariate testing. Defaults to `FALSE`.
`partial`	Indicates how partially imputed pairs are handled in bivariate testing. If `'imputed'` (the default), cases with at least one missing variable in a pair are considered imputed. Otherwise (`partial = 'observed'`), only cases with both variables in the pair missing are considered imputed.
`...`	Arguments to be passed to methods.
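
The two binning strategies described for `breaks` can be illustrated in base R. This is only a sketch under stated assumptions: `bin_variable()` is a hypothetical helper written for this illustration, not a function exported by gerbil, and the package's internal binning may differ in its details (for example, in how ties or interval endpoints are handled).

```r
# Hypothetical helper (not part of gerbil): bin a continuous variable
# using either a number of bins (scalar) or explicit cut-points (vector).
bin_variable <- function(x, breaks = 4) {
  if (length(breaks) == 1) {
    # Scalar: derive cut-points from percentiles so that the bins
    # contain approximately equal numbers of cases.
    probs <- seq(0, 1, length.out = breaks + 1)
    breaks <- unique(quantile(x, probs = probs, na.rm = TRUE, names = FALSE))
  }
  # include.lowest = TRUE keeps the minimum value inside the first bin
  cut(x, breaks = breaks, include.lowest = TRUE)
}

set.seed(1)
income <- rexp(200, rate = 1 / 1000)  # simulated continuous variable

# Strategy 1: four percentile-based bins
binned_by_count <- bin_variable(income, breaks = 4)
table(binned_by_count)

# Strategy 2: explicit cut-points
binned_by_cuts <- bin_variable(income, breaks = c(0, 500, 1000, 2000, Inf))
table(binned_by_cuts)
```

Percentile-based bins keep expected cell counts reasonably large, which matters for the chi-squared approximation; explicit cut-points are useful when the categories need to be substantively meaningful.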

Details

Goodness of fit is assessed using contingency tables of counts across the categories of the corresponding variable(s). For univariate testing (`type = 1`), a one-way table is calculated for observed cases and compared to an analogous table for imputed cases; for bivariate testing (`type = 2`), two-way tables are calculated instead. Continuous variables are binned according to cut-points defined by the parameter `breaks`. Tests are performed using one of three methods, selected via the parameter `method`: 1) a chi-squared test (the default); 2) Fisher's exact test; or 3) a G-test. G-testing is implemented via the function `GTest()` from the DescTools package. For univariate testing of continuous variables, a Kolmogorov-Smirnov test may be performed instead by setting `ks = TRUE`.
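
The univariate comparison described above can be sketched with base R alone. The data below are simulated, `was_missing` is a hypothetical indicator of which cases would have been imputed, and stacking the two one-way tables into a 2 x k matrix is an assumption about how the comparison might be set up, not a statement of how `gof_gerbil()` is implemented. The G-test is omitted here because it requires the DescTools package.

```r
set.seed(42)
n <- 200

# Simulated categorical variable and a hypothetical missingness indicator
value <- factor(sample(c("low", "mid", "high"), n, replace = TRUE),
                levels = c("low", "mid", "high"))
was_missing <- runif(n) < 0.3

# One-way tables of counts for observed and imputed cases,
# stacked into a 2 x k contingency table
counts <- rbind(observed = table(value[!was_missing]),
                imputed  = table(value[was_missing]))

chisq_res  <- chisq.test(counts)   # chi-squared test
fisher_res <- fisher.test(counts)  # Fisher's exact test
chisq_res$p.value
fisher_res$p.value

# For a continuous variable, a two-sample Kolmogorov-Smirnov test
# compares the observed and imputed values directly, without binning
cont <- rnorm(n)
ks_res <- ks.test(cont[!was_missing], cont[was_missing])
ks_res$p.value
```

Because the simulated values here are independent of `was_missing`, large p-values are expected on average; with real imputations, small p-values flag variables whose imputed distribution differs from the observed one.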

The only required input is the parameter `x`, which must be a `gerbil` object.

Note that univariate differences between observed and imputed data may be explained by the missingness mechanism and are not necessarily indicative of poor imputations. Note also that most imputation methods, including gerbil, mice, and related methods, are not designed to capture complete bivariate distributions. As such, the bivariate tests are likely to return small p-values.
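
To make the bivariate comparison and the role of `partial` concrete, the following base R sketch classifies cases for a single variable pair under both settings and compares the resulting two-way tables. All names and data are hypothetical, and flattening each two-way table into a row of cell counts is an assumption made for this illustration; the package may construct the test differently.

```r
set.seed(7)
n <- 300

# Simulated pair of categorical variables with hypothetical missingness flags
a <- factor(sample(c("yes", "no"), n, replace = TRUE))
b <- factor(sample(c("urban", "rural"), n, replace = TRUE))
miss_a <- runif(n) < 0.2
miss_b <- runif(n) < 0.2

# partial = "imputed": a case is imputed if at least one variable was missing
imputed_any <- miss_a | miss_b
# partial = "observed": a case is imputed only if both variables were missing
imputed_both <- miss_a & miss_b

# Two-way tables for the pair under partial = "imputed"
tab_observed <- table(a[!imputed_any], b[!imputed_any])
tab_imputed  <- table(a[imputed_any], b[imputed_any])

# Assumed comparison: treat each cell of the two-way table as a category
counts <- rbind(observed = as.vector(tab_observed),
                imputed  = as.vector(tab_imputed))
pair_p_value <- chisq.test(counts)$p.value
pair_p_value
```

Under `partial = "observed"` the imputed group (`imputed_both`) is usually much smaller, so cell counts can become sparse and an exact test may be the safer choice.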

Value

gof_gerbil() returns an object of the class gof_gerbil that has the following slots:

Stats: A vector (when type = 1) or matrix (when type = 2) giving the value of the test statistic (or coefficient) for the corresponding variable (or variable pair).
p.values: A vector (when type = 1) or matrix (when type = 2) giving the p-value for the test applied to the corresponding variable (or variable pair).
Test: A vector (when type = 1) or matrix (when type = 2) indicating the type of test applied to the corresponding variable (or variable pair).
Breaks: A list giving the cutpoints used for binning each continuous or semi-continuous variable.

Examples


# Load the package and the India Human Development Survey-II dataset
library(gerbil)
data(ihd_mcar)

# Create a single imputed dataset
imps.gerbil <- gerbil(ihd_mcar, m = 1, mcmciter = 200,
                      ords = "education_level",
                      semi = "farm_labour_days",
                      bincat = c("sex", "marital_status", "job_field", "own_livestock"))

#Run univariate tests
tests.gerbil.uni <- gof_gerbil(imps.gerbil, imp = 1, type = 1)

#Print a summary
tests.gerbil.uni

#Run bivariate tests
tests.gerbil.bi <- gof_gerbil(imps.gerbil, imp = 1, type = 2)

#Print a summary
tests.gerbil.bi

gerbil documentation built on Jan. 12, 2023, 5:10 p.m.