dif_prep: Data prep for 'DIFreport' functions.

View source: R/DIF-Prep.R

Description

Data pre-processing to gather information and prepare data for use in other DIFreport functions.

Usage

dif_prep(
  data,
  dif.groups,
  ref.name = NULL,
  items = names(data),
  anchors = NULL,
  poly.items = NULL,
  max.values = NULL,
  cond.groups = NULL,
  cluster = NULL,
  na.to.0 = FALSE
)

Arguments

data

a data.frame containing the item responses with subjects in rows and items in columns. Items can be dichotomous or polytomous, but the columns must be numeric. All reverse coding should be completed before using this function.

dif.groups

column position or name in data indicating the two (and only two) groups to compare for DIF.

ref.name

character; the name of the reference group in dif.groups. This is primarily used for organizing the report output. If examining standardized mean differences, the standard deviation of this group can be used instead of the pooled standard deviation by setting pooled = FALSE in dif_report or summary_report.

items

vector of column positions or names in data indicating the items to analyze for DIF. Default is to use all columns.

anchors

vector of column positions or names in data indicating the anchor items. If specified, must be a subset of items.

poly.items

vector of column positions or names in data indicating polytomous items. If NULL (default), polytomous items will be auto-detected as those with more than two response options.

max.values

the maximum value each item in poly.items can take, given as an integer vector of the same length as poly.items or as a single value to be used for all poly.items. If NULL (default), values will be auto-detected. This value is used for unit-scoring polytomous items for the loess, MH (Mantel-Haenszel), and logistic methods.

cond.groups

column name or number in data indicating the two (and only two) groups on which to condition the mean difference between dif.groups.

cluster

column name or number in data indicating the primary sampling unit in a multi-stage or clustered sampling design – used to adjust effect sizes and their standard errors.

na.to.0

logical; after removing empty rows, should any remaining NAs in the items columns be converted to 0? Default is FALSE.
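The ref.name argument notes that a standardized mean difference can be scaled by the reference group's standard deviation instead of the pooled one. The sketch below illustrates that choice with a generic textbook formula; it is not DIFreport's implementation (which may, for example, adjust for cluster), and the function smd(), its arguments, and the focal-minus-reference sign convention are assumptions made for illustration.

```r
# Hypothetical helper, not part of DIFreport: generic standardized mean
# difference (focal mean minus reference mean) with a choice of denominator.
smd <- function(x, group, ref.name, pooled = TRUE) {
  group <- factor(group)
  # Drop cases missing either the score or the group label
  ok <- !is.na(x) & !is.na(group)
  x <- x[ok]
  group <- droplevels(group[ok])
  stopifnot(nlevels(group) == 2, ref.name %in% levels(group))

  ref <- x[group == ref.name]
  foc <- x[group != ref.name]

  if (pooled) {
    # Pooled SD from both groups' sample variances
    n1 <- length(ref)
    n2 <- length(foc)
    s <- sqrt(((n1 - 1) * var(ref) + (n2 - 1) * var(foc)) / (n1 + n2 - 2))
  } else {
    # SD of the reference group only
    s <- sd(ref)
  }
  (mean(foc) - mean(ref)) / s
}

x <- c(1, 2, 3, 2, 4, 6)
g <- c("C", "C", "C", "T", "T", "T")
smd(x, g, ref.name = "C", pooled = FALSE)
smd(x, g, ref.name = "C", pooled = TRUE)
```

With pooled = FALSE the denominator is the reference group's SD alone, so the effect size is expressed in reference-group units. This is why ref.name can matter beyond report layout.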
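The max.values argument is described as the basis for unit-scoring polytomous items. One plausible reading is that each polytomous item is divided by its maximum possible value, so every item contributes at most 1 to a total score. The sketch below assumes that reading. The helper unit_score() is hypothetical, and DIFreport may compute unit scores differently.

```r
# Hypothetical helper (not part of DIFreport): "unit scoring" is ASSUMED here
# to mean dividing each polytomous item by its maximum possible value.
unit_score <- function(items_df, poly.items, max.values = NULL) {
  if (is.null(max.values)) {
    # Auto-detect: use the largest observed value of each item
    max.values <- sapply(items_df[poly.items], max, na.rm = TRUE)
  } else if (length(max.values) == 1) {
    # A single value applies to every polytomous item
    max.values <- rep(max.values, length(poly.items))
  }
  stopifnot(length(max.values) == length(poly.items))
  for (i in seq_along(poly.items)) {
    col <- poly.items[i]
    items_df[[col]] <- items_df[[col]] / max.values[[i]]
  }
  items_df
}

df <- data.frame(a = c(0, 1, 2), b = c(0, 3, 4), c = c(0, 1, 1))
unit_score(df, c("a", "b"), max.values = c(2, 4))
```

Auto-detection relies on the observed data, so if no respondent chose an item's top category the detected maximum is too small. That is one reason to declare max.values explicitly when the full scale is known.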

Details

This function saves the input data in a format used by other DIFreport functions and also runs a number of pre-processing steps:

  • Drops rows in data that have NA on all items columns, or NA on dif.groups, cond.groups (if specified), or cluster (if specified).

  • Converts dif.groups to a factor and confirms there are only two groups. Does the same for cond.groups, if specified.

  • Flags items with no variance.

  • Flags items with a different number of response categories across the dif.groups.

  • Identifies items with more than two response categories (i.e., polytomous items).
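The pre-processing steps listed above can be approximated in base R. The sketch below is hypothetical: prep_sketch() is not a DIFreport function, it assumes every column argument is given as a name, and it returns only the flags described in the list rather than the structure of the list that dif_prep() actually returns.

```r
# Hypothetical sketch of the dif_prep() pre-processing steps; not DIFreport code.
# Assumes dif.groups, items, cond.groups, and cluster are given as column names.
prep_sketch <- function(data, dif.groups, items, cond.groups = NULL,
                        cluster = NULL, na.to.0 = FALSE) {
  # Rows with NA on every item column
  all_na <- rowSums(!is.na(data[items])) == 0
  # Rows with NA on any supplied grouping or cluster column
  key_cols <- c(dif.groups, cond.groups, cluster)
  key_na <- rowSums(is.na(data[key_cols])) > 0
  data <- data[!(all_na | key_na), , drop = FALSE]

  item_data <- data[items]
  if (na.to.0) item_data[is.na(item_data)] <- 0

  # dif.groups (and cond.groups, if given) must have exactly two levels
  group <- factor(data[[dif.groups]])
  if (nlevels(group) != 2) stop("dif.groups must contain exactly two groups")
  if (!is.null(cond.groups) && nlevels(factor(data[[cond.groups]])) != 2) {
    stop("cond.groups must contain exactly two groups")
  }

  # Number of distinct observed responses per item
  n_cats <- sapply(item_data, function(x) length(unique(x[!is.na(x)])))
  # TRUE when the number of distinct responses differs between the two groups
  differs <- sapply(item_data, function(x) {
    per_group <- tapply(x, group, function(v) length(unique(v[!is.na(v)])))
    length(unique(as.vector(per_group))) > 1
  })

  list(data = data,
       item.data = item_data,
       no.var = names(item_data)[n_cats <= 1],
       diff.cats = names(item_data)[differs],
       poly.items = names(item_data)[n_cats > 2])
}

d <- data.frame(
  g  = c("A", "A", "B", "B", "B", NA, "A"),
  i1 = c(0, 1, 0, 1, NA, 1, NA),
  i2 = c(1, 1, 1, 1, NA, 1, 1),
  i3 = c(0, 1, 0, 0, NA, 2, 2),
  stringsAsFactors = FALSE
)
res <- prep_sketch(d, dif.groups = "g", items = c("i1", "i2", "i3"), na.to.0 = TRUE)
```

In this toy data the fifth row is dropped because all items are NA and the sixth because the group is NA; i2 is flagged for having no variance, and i3 is both polytomous and has a different number of observed categories in the two groups.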

It is recommended, but not required, that responses for each item are coded as consecutive integers with a minimum value of 0 (e.g., 0, 1, 2).
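If an item was collected on a different scale (for example 1 to 5), one way to follow the recommendation above is to map the observed values onto consecutive integers starting at 0 before calling dif_prep(). The helper recode_consecutive() below is a hypothetical sketch, not a DIFreport function.

```r
# Hypothetical helper: map the observed response values of one item onto
# consecutive integers starting at 0, preserving their order (NA stays NA).
recode_consecutive <- function(x) {
  levels_seen <- sort(unique(x[!is.na(x)]))
  match(x, levels_seen) - 1L
}

recode_consecutive(c(1, 3, 5, NA, 3))
```

Because the mapping is built from observed values, a category nobody chose is collapsed out. When the full scale is known, subtracting the scale minimum (e.g., x - 1 for a 1 to 5 scale) keeps unobserved categories in place. To apply the helper to several items at once: data[items] <- lapply(data[items], recode_consecutive).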

Value

A named list containing the pre-processed inputs and item flags.

Examples

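# Example data used throughout the DIFreport documentation
data("mdat")

# Compare the groups in the "treated" column (reference group: "Control") on
# the items in columns 5 onward, conditioning on gender and adjusting for
# clustering by clusterid
dif.data <- dif_prep(data = mdat,
                     dif.groups = "treated",
                     ref.name = "Control",
                     items = 5:ncol(mdat),
                     cond.groups = "gender",
                     cluster = "clusterid",
                     na.to.0 = TRUE)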


knickodem/WBdif documentation built on Feb. 3, 2024, 2:20 a.m.