e_split_list_columns_into_indicator_columns: Split a (set of) item-listing columns into indicator columns

View source: R/e_split_list_columns_into_indicator_columns.R

Description

Commonly used for comorbidity or prescription lists stored in one or more columns. Takes a column in which items are separated by punctuation (,.;/| by default) and creates a separate indicator column for each item. Can treat counts of an item greater than 1 as 1 to simplify summary tables (for example, when multiple items in a row are coded as "other").
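To make the splitting step concrete, here is a minimal base-R sketch of the idea for a single column. It is not the erikmisc implementation: the helper name split_to_indicators() is hypothetical, it keeps letter case as-is (so "A" and "a" stay distinct), it does not recode rare items as "other", and it assumes the delimiter string contains no "-" because chartr() reads "x-y" as a character range.

```r
# Hypothetical helper (NOT erikmisc's implementation): split one delimited
# character column into per-row item counts using only base R.
split_to_indicators <- function(x, delims = ",.;/|", prefix = "item_") {
  x <- as.character(x)
  x[is.na(x)] <- ""                     # treat NA as "no items listed"
  d1 <- substr(delims, 1, 1)
  # Map every delimiter character onto the first one, then split literally.
  # Assumes `delims` contains no "-" (chartr() treats "a-z" as a range).
  x <- chartr(delims, strrep(d1, nchar(delims)), x)
  items <- lapply(strsplit(x, d1, fixed = TRUE), function(v) {
    v <- trimws(v)                      # drop padding such as "A, B  ,C"
    v[v != ""]                          # drop empty pieces
  })
  levs <- sort(unique(unlist(items)))
  # Cross-tabulate row number by item: one row per input, one column per item.
  row_id <- rep(seq_along(items), lengths(items))
  tab <- table(factor(row_id, levels = seq_along(items)),
               factor(unlist(items), levels = levs))
  matrix(as.integer(tab), nrow = length(items),
         dimnames = list(NULL, paste0(prefix, levs)))
}

# Row 4 lists "a" twice, so its item_a count is 2.
m <- split_to_indicators(c(NA, "", "a", "a, b ;a", "c/ b"))
m
```

The integer matrix can be appended to the original data with cbind() if a data.frame result is wanted.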

Usage

e_split_list_columns_into_indicator_columns(
  dat_this,
  var_names_items = NULL,
  item_delimiters = ",.;/|",
  code_other_below_freq = 5,
  label_other = "other",
  indicator_col_prefix = "item_",
  sw_data_or_summary = c("data", "summary")[1],
  sw_replace_GT1_with_1 = FALSE,
  sw_print_unique = TRUE
)

Arguments

dat_this

entire data.frame or tibble containing the item-listing columns; returned with additional indicator columns when sw_data_or_summary = "data"

var_names_items

character vector of names of the columns that contain lists of items

item_delimiters

character string of delimiters; each character separates items within a single column

code_other_below_freq

replace an item's name with label_other if the item's total frequency is less than this value
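A hedged sketch of the "other" recoding, assuming that "total frequency" means the number of occurrences of an item pooled across all rows (this page does not say whether repeats within a row are counted). lump_rare_items() is a hypothetical helper that works on per-row item vectors such as those produced by splitting a column.

```r
# Hypothetical helper (NOT erikmisc's implementation): relabel items whose
# total frequency across all rows is below `min_freq`.
lump_rare_items <- function(items, min_freq = 5, label_other = "other") {
  freq <- table(unlist(items))          # occurrences of each item, all rows pooled
  rare <- names(freq)[freq < min_freq]
  lapply(items, function(v) {
    v[v %in% rare] <- label_other       # every rare item collapses into one label
    v
  })
}

# "b" and "c" each occur once, so with min_freq = 2 both become "other".
items <- list(c("a", "b"), "a", "c", character(0))
lump_rare_items(items, min_freq = 2)
```

Because several rare items in one row can all become label_other, a row may list "other" more than once, which is the situation sw_replace_GT1_with_1 addresses.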

label_other

label for the "other" category

indicator_col_prefix

prefix for the new indicator columns

sw_data_or_summary

"data" to return the data with indicator columns, or "summary" to return summary tables of item frequencies

sw_replace_GT1_with_1

TRUE/FALSE; if TRUE, replace counts greater than 1 with 1, so that each indicator is interpreted as "at least 1"
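The recoding itself amounts to capping counts at 1. The sketch below applies it to a small hand-made count matrix; the column names are illustrative only.

```r
# Per-row item counts: row 2 lists "a" twice, and "other" appears 1 and 3 times.
counts <- matrix(c(0L, 2L, 1L, 3L), nrow = 2,
                 dimnames = list(NULL, c("item_a", "item_other")))

# Replace every count greater than 1 with 1, giving "at least 1" indicators.
indicators <- counts
indicators[indicators > 1] <- 1L
indicators
```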

sw_print_unique

TRUE/FALSE; if TRUE, print the list of unique items before and after rare items are replaced with label_other

Value

Depending on sw_data_or_summary, either dat_this with additional indicator columns ("data") or a list of summary tables of item frequencies ("summary").
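The exact structure of the "summary" tables is not described on this page. As one plausible illustration only, per-item frequencies can be computed from a count matrix both as total occurrences and as the number of rows that list the item at least once:

```r
# Illustrative count matrix (3 rows, 2 items); not output from erikmisc.
counts <- matrix(c(0L, 2L, 1L, 1L, 0L, 3L), nrow = 3,
                 dimnames = list(NULL, c("item_a", "item_b")))

item_summary <- data.frame(
  item      = colnames(counts),
  n_total   = colSums(counts),          # total occurrences of each item
  n_rows    = colSums(counts > 0),      # rows listing the item at least once
  row.names = NULL
)
item_summary
```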

Examples

dat_ex <-
  dplyr::tibble(
    col1 =
      c(
        NA, "", "a", "A, B  ,C", "b", "D. c ;    d"
      , "x  ;Y", "ab/0|J;1;1", 2, "other", 1
      , "a a a", "1,a a a", "2,a a a"
      )
  , col2 = LETTERS[seq_along(col1)]
  ) |>
  dplyr::mutate(
    ID = 1:dplyr::n()
  ) |>
  dplyr::select(
    ID
  , tidyselect::everything()
  )
dat_ex |> print(n = Inf)

# return data
dat_ex_out <-
  e_split_list_columns_into_indicator_columns(
    dat_this              = dat_ex
  , var_names_items       = c("col1", "col2")
  , item_delimiters       = ",.;/|"
  , code_other_below_freq = 2
  , label_other           = "other"
  , indicator_col_prefix  = "item_"
  , sw_data_or_summary    = "data"
  , sw_replace_GT1_with_1 = FALSE
  , sw_print_unique       = TRUE
  )
dat_ex_out |> print(n = Inf)

# return summary
dat_ex_sum <-
  e_split_list_columns_into_indicator_columns(
    dat_this              = dat_ex
  , var_names_items       = c("col1", "col2")
  , item_delimiters       = ",.;/|"
  , code_other_below_freq = 2
  , label_other           = "other"
  , indicator_col_prefix  = "item_"
  , sw_data_or_summary    = "summary"
  , sw_replace_GT1_with_1 = FALSE
  , sw_print_unique       = FALSE
  )
dat_ex_sum

erikerhardt/erikmisc documentation built on April 17, 2025, 10:48 a.m.