diagnose_sparese.data.frame: Diagnosis of level combinations of categorical variables
In bit2r/kodlookr: Korean Help Resources for the dlookr Package


diagnose_sparese() examines all possible combinations of the levels of the categorical variables and reports the combinations that never appear in the data.
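To make the idea of a level combination that never appears concrete, here is a minimal base R sketch on a toy data frame. It is not dlookr's implementation; the data, the `sex` and `smoking` columns, and the `counts` and `sparse` objects are hypothetical, and the count column is named `n_case` only to mirror the column used in the examples on this page.

```r
# Toy data (hypothetical): two factors, one level combination never observed.
df <- data.frame(
  sex     = factor(c("F", "F", "M", "M", "M")),
  smoking = factor(c("No", "Yes", "No", "No", "No"))
)

# Cross-tabulate every level combination, including those with zero rows.
counts <- as.data.frame(table(df), responseName = "n_case")

# Keep only the combinations that never occur in the data.
sparse <- counts[counts$n_case == 0, ]
sparse
```

Here `counts` plays the role of a `type = "all"` style result and `sparse` that of a `type = "sparse"` style result, under the assumptions stated above.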

diagnose_sparese(.data, ...)

## S3 method for class 'data.frame'
diagnose_sparese(
  .data,
  ...,
  type = c("all", "sparse")[2],
  add_character = FALSE,
  limit = 500
)

`.data`	a data.frame or a `tbl_df`.
`...`	one or more unquoted expressions separated by commas. You can treat variable names as if they were positions. Positive values select variables; negative values drop variables. If the first expression is negative, diagnose_sparese() automatically starts with all variables. These arguments are automatically quoted and evaluated in a context where column names represent column positions. They support unquoting and splicing.
`type`	a character string specifying how results are extracted. "all" returns every possible combination of levels, together with the frequency of each combination. The default, "sparse", returns only the sparse level combinations.
`add_character`	logical. Whether to include character variables in the diagnosis of categorical data. The default is FALSE, as in the method signature above, so character variables are excluded unless it is set to TRUE.
`limit`	integer. Threshold for checking sparse levels. If the number of all possible combinations exceeds the limit, the calculation ends. The default is 500.
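The `limit` condition depends on how many level combinations are possible, which is the product of the number of levels of each categorical variable. The following base R sketch shows that arithmetic on a hypothetical data frame. How dlookr counts combinations internally is an assumption here, and `n_levels`, `n_comb`, and `exceeds_limit` are illustrative names only.

```r
# Toy data (hypothetical): three factors with 3, 2, and 2 levels.
df <- data.frame(
  a = factor(c("x", "y", "z")),
  b = factor(c("p", "q", "p")),
  c = factor(c("u", "u", "v"))
)

# Number of possible level combinations = product of the level counts.
n_levels <- vapply(df, nlevels, integer(1))
n_comb   <- prod(n_levels)

# Hypothetical threshold, analogous to the `limit` argument.
limit <- 10
exceeds_limit <- n_comb > limit
```

Because the product grows quickly as variables are added, selecting fewer variables through `...` is usually a more practical way to stay under the threshold than raising `limit`.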

A data.frame object.

The information derived from the sparse levels diagnosis is as follows.

variables : levels of the categorical variables.
N : number of observations. (optional)

library(dplyr)

# Example with too many combinations
diagnose_sparese(jobchange)

# Character type is also included in the combination variable
diagnose_sparese(jobchange, add_character = TRUE)

# Combination of two variables
jobchange %>% 
  diagnose_sparese(education_level, major_discipline)

# Remove two categorical variables from combination
jobchange %>% 
  diagnose_sparese(-city, -education_level)

diagnose_sparese(heartfailure)

# Adjust the limit threshold for the calculation
diagnose_sparese(heartfailure, limit = 50)

# List all combinations, including sparse cases
diagnose_sparese(heartfailure, type = "all")

# Use together with dplyr
heartfailure %>% 
  diagnose_sparese(type = "all") %>% 
  arrange(desc(n_case)) %>% 
  mutate(percent = round(n_case / sum(n_case) * 100, 1))