diagnose_sparese | R Documentation |
diagnose_sparese() checks, among all possible combinations of the levels of categorical variables, for combinations that never appear in the data.
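To make the idea of a combination that does not appear in the data concrete, here is a minimal base R sketch. It is not the package's implementation; the toy data frame `df` is hypothetical, and the column name `n_case` is borrowed from the example further down as an assumption about how counts are labelled.

```r
# Toy data (hypothetical): two factors, one level combination never observed
df <- data.frame(
  sex    = factor(c("F", "F", "M", "M", "M")),
  smoker = factor(c("no", "yes", "no", "no", "no"))
)

# Cross-tabulate every combination of levels, including unobserved ones
counts <- as.data.frame(table(df), responseName = "n_case")

# Combinations with zero observations are the sparse cases in this sketch
sparse <- counts[counts$n_case == 0, ]
print(sparse)
```

In this toy data the only combination with no observations is sex "M" with smoker "yes".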
diagnose_sparese(.data, ...)
## S3 method for class 'data.frame'
diagnose_sparese(
.data,
...,
type = c("all", "sparse")[2],
add_character = FALSE,
limit = 500
)
.data |
a data.frame or a |
... |
one or more unquoted expressions separated by commas. You can treat variable names as if they were positions. Positive values select variables; negative values drop variables. If the first expression is negative, diagnose_sparese() automatically starts with all variables. These arguments are automatically quoted and evaluated in a context where column names represent column positions. They support unquoting and splicing. |
type |
a character string specifying how results are extracted. "all" returns every possible combination of levels, together with the frequency of each case. "sparse" (the default) returns only the sparse level combinations. |
add_character |
logical. Whether to include character variables in the diagnosis of categorical data. The default is FALSE, which excludes them; set it to TRUE to include character variables as well. |
limit |
integer. Upper bound on the number of level combinations to check. If the number of all possible combinations exceeds limit, the calculation stops. |
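As a rough illustration of the limit check, the number of all possible combinations is the product of the number of levels of each categorical variable. The sketch below uses a hypothetical helper, `n_combinations()`, which is not part of the package, and assumes that only factor columns count as categorical.

```r
# Hypothetical helper, not part of the package:
# product of the number of levels over all factor columns
n_combinations <- function(data) {
  is_cat <- vapply(data, is.factor, logical(1))
  prod(vapply(data[is_cat], nlevels, integer(1)))
}

# Toy data (hypothetical): three factors and one numeric column
df <- data.frame(
  sex    = factor(c("F", "M", "M")),
  smoker = factor(c("no", "yes", "no")),
  stage  = factor(c("I", "II", "III")),
  age    = c(41, 58, 63)
)

total   <- n_combinations(df)  # 2 * 2 * 3 level combinations
exceeds <- total > 10          # TRUE: a limit of 10 would be exceeded
```

Because the count grows multiplicatively, a single high-cardinality variable can push the total past the limit; dropping it from the selection is one way to keep the check feasible.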
A data.frame.
The information derived from the sparse levels diagnosis is as follows.
variables : levels of the categorical variables.
N : number of observations. (optional)
library(dplyr)
# Example with too many combinations
diagnose_sparese(jobchange)
# Also include character variables in the combinations
diagnose_sparese(jobchange, add_character = TRUE)
# Combination of two variables
jobchange %>%
diagnose_sparese(education_level, major_discipline)
# Remove two categorical variables from combination
jobchange %>%
diagnose_sparese(-city, -education_level)
diagnose_sparese(heartfailure)
# Adjust the limit threshold for the calculation
diagnose_sparese(heartfailure, limit = 50)
# List all combinations, including sparse cases
diagnose_sparese(heartfailure, type = "all")
# Use together with dplyr
heartfailure %>%
diagnose_sparese(type = "all") %>%
arrange(desc(n_case)) %>%
mutate(percent = round(n_case / sum(n_case) * 100, 1))