dif_prep: Data prep for 'DIFreport' functions.
In knickodem/WBdif: Investigate DIF and generate reports

dif_prep

R Documentation

Data prep for `DIFreport` functions.

Description

Data pre-processing to gather information and prepare data for use in other DIFreport functions.

Usage

dif_prep(
  data,
  dif.groups,
  ref.name = NULL,
  items = names(data),
  anchors = NULL,
  poly.items = NULL,
  max.values = NULL,
  cond.groups = NULL,
  cluster = NULL,
  na.to.0 = FALSE
)

Arguments

`data`	a `data.frame` containing the item responses with subjects in rows and items in columns. Items can be dichotomous or polytomous, but the columns must be numeric. All reverse coding should be completed before using this function.
`dif.groups`	column position or name in `data` indicating the two (and only two) groups to compare for DIF.
`ref.name`	character; specifies the reference group in `dif.groups`. This is primarily used for organizing the report output. If examining standardized mean differences, the standard deviation of this group can be used rather than the pooled standard deviation by setting `pooled = FALSE` in `dif_report` or `summary_report`.
`items`	vector of column positions or names in `data` indicating the items to analyze for DIF. Default is to use all columns.
`anchors`	vector of column positions or names in `data` indicating the anchor items. If specified, must be a subset of `items`.
`poly.items`	vector of column positions or names in `data` indicating polytomous items. If `NULL` (default), polytomous items will be auto-detected as those with more than 2 response options.
`max.values`	the maximum value each item in `poly.items` can take, declared by an integer vector of the same length as `poly.items` or a single value to be used for all `poly.items`. If `NULL` (default), values will be auto-detected. This value is used for unit-scoring polytomous items for loess, MH, and logistic methods.
`cond.groups`	column name or number in `data` indicating the two (and only two) groups on which to condition the mean difference between `dif.groups`.
`cluster`	column name or number in `data` indicating the primary sampling unit in a multi-stage or clustered sampling design – used to adjust effect sizes and their standard errors.
`na.to.0`	After removing empty rows, should remaining NAs in `items` columns be converted to 0? Default is FALSE.
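
The role of `max.values` can be illustrated with a minimal base R sketch. It assumes that "unit-scoring" means dividing each polytomous item by its maximum possible value so that scores fall between 0 and 1; the data frame `poly` and its columns are hypothetical, and this is not the package's own code.

```r
# Hypothetical polytomous items: q1 scored 0-3, q2 scored 0-4
poly <- data.frame(q1 = c(0, 1, 3, 2),
                   q2 = c(0, 2, 1, 4))

# One maximum per item, in the same order as the columns
max.values <- c(3, 4)

# Assumed meaning of unit-scoring: divide each item by its maximum
unit.scored <- as.data.frame(Map(function(x, m) x / m, poly, max.values))
unit.scored
```

Supplying `max.values` explicitly matters when the highest category is never observed in the data, because an auto-detected maximum would then be too small.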

Details

This function saves the input data in a format used by other DIFreport functions and also runs a number of pre-processing steps:

Drops rows in data that have NA for every items column, or NA for dif.groups, cond.groups (if specified), or cluster (if specified).
Converts dif.groups to a factor and confirms there are only two groups. Same for cond.groups if specified.
Flags items with no variance.
Flags items with a different number of response categories across the dif.groups.
Identifies items with more than two response categories (i.e., polytomous items).
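
The checks listed above can be approximated in base R. The following is only a sketch on a hypothetical data frame `toy`, written to show what each step looks for; it is not the implementation used inside `dif_prep`, and all object names are illustrative.

```r
# Hypothetical data: one grouping column and three items
toy <- data.frame(
  group = c("A", "A", "A", "B", "B", "B", NA),
  i1 = c(0, 1, NA, 1, 0, 1, 1),  # dichotomous
  i2 = c(0, 2, NA, 1, 1, 1, 0),  # polytomous
  i3 = c(1, 1, NA, 1, 1, 1, 1)   # no variance
)
items <- c("i1", "i2", "i3")

# Drop rows that are NA on every item or NA on the grouping variable
all.na <- rowSums(!is.na(toy[items])) == 0
toy <- toy[!all.na & !is.na(toy$group), ]

# Convert the grouping variable to a factor and confirm two groups
toy$group <- factor(toy$group)
stopifnot(nlevels(toy$group) == 2)

# Helper: number of distinct observed (non-missing) response categories
count.cats <- function(x) length(unique(na.omit(x)))

# Flag items with no variance
no.var <- sapply(toy[items], function(x) count.cats(x) < 2)

# Flag items whose number of observed categories differs across groups
n.cat <- sapply(toy[items],
                function(x) sapply(split(x, toy$group), count.cats))
diff.cat <- n.cat[1, ] != n.cat[2, ]

# Identify polytomous items (more than two response categories)
poly <- sapply(toy[items], function(x) count.cats(x) > 2)
```

In this toy example `i3` is flagged for having no variance, and `i2` is both polytomous and flagged for differing category counts across the two groups.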

It is recommended, but not required, that responses for each item are coded as consecutive integers with a minimum value of 0 (e.g., 0, 1, 2).
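
One way to bring an item into this recommended coding before calling `dif_prep` is sketched below in base R. The vector `x` is hypothetical, and the approach assumes that every response category appears at least once in the data.

```r
# Hypothetical item coded 1, 3, 5 with one missing response
x <- c(1, 3, 5, 3, NA, 1)

# Map the sorted observed values to 0, 1, 2, ...; NA stays NA
recoded <- match(x, sort(unique(x))) - 1
recoded
```

Because the mapping is based on observed values only, a category that no respondent selected would be skipped; in that case an explicit recode is safer.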

Value

A named list containing the pre-processed inputs and item flags.

Examples

data("mdat")

dif.data <- dif_prep(data = mdat,
                     dif.groups = "treated",
                     ref.name = "Control",
                     items = 5:ncol(mdat),
                     cond.groups = "gender",
                     cluster = "clusterid",
                     na.to.0 = TRUE)

knickodem/WBdif documentation built on Feb. 3, 2024, 2:20 a.m.