com_item_missingness: Summarize missingness columnwise (in variable)

View source: R/com_item_missingness.R

com_item_missingness    R Documentation

Summarize missingness columnwise (in variable)

Description

Item-Missingness (also referred to as item nonresponse; De Leeuw et al. 2003) describes the missingness of single values, e.g., blanks or empty data cells in a data set. Item-Missingness occurs, for example, when a respondent does not provide information for a certain question, a question is overlooked by accident, a programming failure occurs, or a provided answer was missed while entering the data.
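As a minimal illustration of the concept (not dataquieR's own implementation), the base R sketch below counts empty cells per variable in a small made-up data frame. The data and the variable names `age` and `sbp` are hypothetical.

```r
# Hypothetical toy data: NA marks a cell with no value (a system missing).
toy <- data.frame(
  age = c(34, NA, 51, 47, NA),
  sbp = c(120, 135, NA, 128, 141)
)

# Number and percentage of missing cells per variable (columnwise).
n_missing   <- colSums(is.na(toy))
pct_missing <- 100 * n_missing / nrow(toy)

item_missingness <- data.frame(
  variable    = names(toy),
  n_missing   = as.integer(n_missing),
  pct_missing = as.numeric(pct_missing)
)
print(item_missingness)
```

Here only system missings (NA) are counted; values recorded with explicit missing codes would need the code lists from the metadata.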

Usage

com_item_missingness(
  study_data,
  meta_data,
  resp_vars = NULL,
  label_col,
  show_causes = TRUE,
  cause_label_df,
  include_sysmiss = NULL,
  threshold_value,
  suppressWarnings = FALSE
)

Arguments

study_data

data.frame the data frame that contains the measurements

meta_data

data.frame the data frame that contains metadata attributes of study data

resp_vars

variable list the names of the measurement variables

label_col

variable attribute the name of the column in the metadata with labels of variables

show_causes

logical if TRUE, then the distribution of missing codes is shown

cause_label_df

data.frame missing code table. If missing codes have labels, the respective data frame must be specified here

include_sysmiss

logical Optional, if TRUE system missingness (NAs) is evaluated in the summary plot

threshold_value

numeric from=0 to=100. A numerical value ranging from 0 to 100

suppressWarnings

logical warn about mixed missing and jump code lists

Value

a list with:

  • SummaryTable: data frame about item missingness per response variable

  • SummaryPlot: ggplot2 heatmap plot, if show_causes was TRUE

  • ReportSummaryTable: data frame underlying SummaryPlot

ALGORITHM OF THIS IMPLEMENTATION:

  • Lists of missing codes and, if applicable, jump codes are selected from the metadata

  • The number of system missings (NA) in each variable is calculated

  • The number of used missing codes is calculated for each variable

  • The number of used jump codes is calculated for each variable

  • Two result data frames (1: on the level of observations, 2: a summary for each variable) are generated

  • OPTIONAL: if show_causes is TRUE, one summary plot for all resp_vars is provided
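The counting steps above can be sketched in base R as follows. This is an illustrative approximation under stated assumptions, not the package's code: the data, the code values, the metadata column names (`VAR_NAMES`, `MISSING_LIST`, `JUMP_LIST`), and the " | " separator are assumptions made for this example, and the helpers `parse_codes` and `summarize_var` are hypothetical. Only the per-variable summary is reproduced; the observation-level result and the plot are omitted.

```r
# Hypothetical study data: 99980/99981 are missing codes, 88880 is a jump code.
study <- data.frame(
  v1 = c(1, 99980, NA, 3, 88880, 99981),
  v2 = c(NA, NA, 2, 99980, 5, 6)
)

# Hypothetical metadata: code lists stored as " | "-separated strings
# (column names and separator are assumptions for this sketch).
meta <- data.frame(
  VAR_NAMES    = c("v1", "v2"),
  MISSING_LIST = c("99980 | 99981", "99980"),
  JUMP_LIST    = c("88880", NA),
  stringsAsFactors = FALSE
)

# Select the codes: parse one code list string into a numeric vector.
parse_codes <- function(x) {
  if (is.na(x) || !nzchar(trimws(x))) return(numeric(0))
  as.numeric(trimws(strsplit(x, "|", fixed = TRUE)[[1]]))
}

# Count system missings, used missing codes and used jump codes for one variable.
summarize_var <- function(var) {
  values <- study[[var]]
  row    <- meta[meta$VAR_NAMES == var, ]
  data.frame(
    variable        = var,
    n_obs           = length(values),
    n_sysmiss       = sum(is.na(values)),
    n_missing_codes = sum(values %in% parse_codes(row$MISSING_LIST)),
    n_jump_codes    = sum(values %in% parse_codes(row$JUMP_LIST))
  )
}

# Build the summary with one row per variable.
summary_table <- do.call(rbind, lapply(meta$VAR_NAMES, summarize_var))
print(summary_table)
```

Keeping the three counts in separate columns makes it possible to report system missings, coded missings, and jumps individually or to combine them later.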

See Also

Online Documentation


dataquieR documentation built on Nov. 16, 2022, 5:10 p.m.