com_item_missingness: Summarize missingness columnwise (in variable)

View source: R/com_item_missingness.R

com_item_missingness    R Documentation

Summarize missingness columnwise (in variable)

Description

Item-Missingness (also referred to as item nonresponse (De Leeuw et al. 2003)) describes the missingness of single values, e.g. blanks or empty data cells in a data set. Item-Missingness occurs, for example, when a respondent does not provide information for a certain question, a question is overlooked by accident, a programming failure occurs, or a provided answer was missed while entering the data.

Usage

com_item_missingness(
  study_data,
  meta_data,
  resp_vars = NULL,
  label_col,
  show_causes = TRUE,
  cause_label_df,
  include_sysmiss = TRUE,
  threshold_value,
  suppressWarnings = FALSE,
  assume_consistent_codes = TRUE,
  expand_codes = assume_consistent_codes,
  drop_levels = TRUE,
  expected_observations = c("HIERARCHY", "ALL", "SEGMENT"),
  pretty_print = TRUE
)

Arguments

study_data

data.frame the data frame that contains the measurements

meta_data

data.frame the data frame that contains metadata attributes of study data

resp_vars

variable list the name of the measurement variables

label_col

variable attribute the name of the column in the metadata with labels of variables

show_causes

logical if TRUE, then the distribution of missing codes is shown

cause_label_df

data.frame missing code table. If missing codes have labels the respective data frame can be specified here or in the metadata as assignments, see cause_label_df

include_sysmiss

logical Optional, if TRUE system missingness (NAs) is evaluated in the summary plot

threshold_value

numeric from=0 to=100. A numerical value ranging from 0 to 100

suppressWarnings

logical if TRUE, warnings about consistency issues with missing and jump lists are suppressed

assume_consistent_codes

logical if TRUE and no labels are given and the same missing/jump code is used for more than one variable, the labels assigned for this code are treated as being the same for all variables.

expand_codes

logical if TRUE, code labels are copied from other variables, if the code is the same and the label is set somewhere

drop_levels

logical if TRUE, do not display unused missing codes in the figure legend.

expected_observations

enum HIERARCHY | ALL | SEGMENT. If ALL, all observations are expected to comprise all study segments. If SEGMENT, the PART_VAR is expected to point to a variable with values of 0 and 1, indicating whether the variable was expected to be observed for each data row. If HIERARCHY, this is also checked recursively: if a variable points to such a participation variable, and that other variable also has a PART_VAR entry pointing to a variable, the observation of the initial variable is only expected if both segment variables are 1.

pretty_print

logical If FALSE, produce a table that can easily be processed further. Otherwise, some cells feature two numbers (absolute and percentage).
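
The three expected_observations modes described above can be illustrated with a small base R sketch. It is not dataquieR's internal code: the toy data, the part_var lookup vector and the helper expected_rows() are hypothetical names introduced only for this illustration, and the sketch assumes the PART_VAR chain contains no cycles.

```r
# Toy data: PART_A and PART_B are 0/1 participation flags (assumed layout).
study <- data.frame(
  PART_A = c(1, 1, 0, 0),
  PART_B = c(1, 0, 1, 0),
  x      = c(5, NA, NA, NA)
)

# Hypothetical PART_VAR assignments: x points to PART_B, PART_B points to PART_A.
part_var <- c(x = "PART_B", PART_B = "PART_A")

# Returns, per data row, whether an observation of `var` is expected.
expected_rows <- function(var, mode = c("HIERARCHY", "ALL", "SEGMENT")) {
  mode <- match.arg(mode)
  expected <- rep(TRUE, nrow(study))
  if (mode == "ALL") return(expected)
  current <- var
  # Follow the PART_VAR chain; SEGMENT stops after the direct participation variable.
  while (current %in% names(part_var)) {
    current <- part_var[[current]]
    expected <- expected & study[[current]] == 1
    if (mode == "SEGMENT") break
  }
  expected
}

exp_all  <- expected_rows("x", "ALL")
exp_seg  <- expected_rows("x", "SEGMENT")
exp_hier <- expected_rows("x", "HIERARCHY")

# Missing values of x that count as item missingness under each mode
n_missing <- c(
  ALL       = sum(is.na(study$x) & exp_all),
  SEGMENT   = sum(is.na(study$x) & exp_seg),
  HIERARCHY = sum(is.na(study$x) & exp_hier)
)
print(n_missing)
```

In this toy example, the stricter the expectation rule, the fewer NA values of x are counted as missing, because rows in which the variable was never expected to be observed drop out of the denominator.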

Value

a list with:

  • SummaryTable: data frame about item missingness per response variable

  • SummaryPlot: ggplot2 heatmap plot, if show_causes was TRUE

  • ReportSummaryTable: data frame underlying SummaryPlot

ALGORITHM OF THIS IMPLEMENTATION:

  • Lists of missing codes and, if applicable, jump codes are selected from the metadata

  • The no. of system missings (NA) in each variable is calculated

  • The no. of used missing codes is calculated for each variable

  • The no. of used jump codes is calculated for each variable

  • Two result data frames (1: on the level of observations, 2: a summary for each variable) are generated

  • OPTIONAL: if show_causes is selected, one summary plot for all resp_vars is provided
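
The counting steps listed above can be sketched in base R as follows. This is a simplified illustration, not the package's implementation: the toy data, the "|"-separated code lists, the column names of the codes table and the helper parse_codes() are assumptions made for this example, and only the per-variable summary is built (no observation-level table and no plot).

```r
# Toy study data with system missings (NA), missing codes and a jump code.
study <- data.frame(
  v1 = c(1, 2, NA, 99980, 99990, 3),
  v2 = c(NA, NA, 99981, 4, 5, 99980)
)

# Assumed metadata layout: one row per variable, codes separated by "|".
codes <- data.frame(
  VAR_NAMES    = c("v1", "v2"),
  MISSING_LIST = c("99980 | 99981", "99980 | 99981"),
  JUMP_LIST    = c("99990", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn "99980 | 99981" into c(99980, 99981).
parse_codes <- function(x) {
  if (is.na(x)) return(numeric(0))
  as.numeric(trimws(strsplit(x, "|", fixed = TRUE)[[1]]))
}

# One summary row per variable: system missings, used missing codes, used jump codes.
summary_table <- do.call(rbind, lapply(seq_len(nrow(codes)), function(i) {
  values <- study[[codes$VAR_NAMES[i]]]
  miss   <- parse_codes(codes$MISSING_LIST[i])
  jump   <- parse_codes(codes$JUMP_LIST[i])
  data.frame(
    variable      = codes$VAR_NAMES[i],
    sysmiss       = sum(is.na(values)),
    missing_codes = sum(values %in% miss),
    jump_codes    = sum(values %in% jump),
    stringsAsFactors = FALSE
  )
}))

print(summary_table)
```

Because NA never matches a code under %in%, system missings and coded missings are counted separately here, mirroring the distinction the algorithm makes between the two.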

See Also

Online Documentation


dataquieR documentation built on July 26, 2023, 6:10 p.m.