R/combination.R

#' @rdname diagnose_sparese.data.frame
#' @name diagnose_sparese.data.frame
#' @usage diagnose_sparese(.data, ...)
NULL

#' Diagnosis of level combinations of categorical variables
#'
#' @description diagnose_sparese() checks for level combinations that never
#' appear in the data, among all possible combinations of the levels of the
#' categorical variables.
#'
#' @section Information of sparse levels:
#' The information derived from the sparse levels diagnosis is as follows.
#'
#' \itemize{
#' \item variables : levels of the categorical variables.
#' \item N : number of observations. (optional)
#' }
#'
#' @param .data a data.frame or a \code{\link{tbl_df}}.
#' @param ... one or more unquoted expressions separated by commas.
#' You can treat variable names like they are positions.
#' Positive values select variables; negative values drop variables.
#' If the first expression is negative, diagnose_sparese() will automatically
#' start with all variables.
#' These arguments are automatically quoted and evaluated in a context where
#' column names represent column positions.
#' They support unquoting and splicing.
#'
#' @param type a character string specifying how results are extracted.
#' "all" returns every possible level combination together with the
#' frequency of each case.
#' The default, "sparse", returns only the sparse level combinations.
#' @param add_character logical. Whether to include character variables in the
#' diagnosis of categorical data. The default is FALSE, which excludes them;
#' set to TRUE to also include character variables.
#' @param limit integer. Upper bound on the number of level combinations to check.
#' If the number of all possible combinations exceeds the limit, the calculation stops.
#' @return a data.frame.
#' @examples
#' library(dplyr)
#' 
#' # Examples of too many combinations
#' diagnose_sparese(jobchange)
#' 
#' # Also include character variables in the combinations
#' diagnose_sparese(jobchange, add_character = TRUE)
#' 
#' # Combination of two variables
#' jobchange %>% 
#'   diagnose_sparese(education_level, major_discipline)
#'
#' # Remove two categorical variables from combination
#' jobchange %>% 
#'   diagnose_sparese(-city, -education_level)
#'
#' diagnose_sparese(heartfailure)
#' 
#' # Adjust the limit on the number of combinations to calculate
#' diagnose_sparese(heartfailure, limit = 50)
#' 
#' # List all combinations, including sparse cases
#' diagnose_sparese(heartfailure, type = "all")
#' 
#' # Combine with dplyr
#' heartfailure %>% 
#'   diagnose_sparese(type = "all") %>% 
#'   arrange(desc(n_case)) %>% 
#'   mutate(percent = round(n_case / sum(n_case) * 100, 1))
#'
#' @name diagnose_sparese.data.frame
#' @usage 
#' ## S3 method for class 'data.frame'
#' diagnose_sparese(
#'   .data,
#'   ...,
#'   type = c("all", "sparse")[2],
#'   add_character = FALSE,
#'   limit = 500
#' )
NULL
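
# Illustration only: a minimal base-R sketch of the idea documented above,
# i.e. listing the level combinations that never occur in the data.
# This is NOT the implementation of diagnose_sparese(); the helper name
# sketch_sparse_levels() and the toy data below are hypothetical, and the
# sketch assumes that only factor columns count as categorical.
sketch_sparse_levels <- function(df, limit = 500) {
  # Keep only factor columns (character columns are ignored in this sketch)
  cats <- df[vapply(df, is.factor, logical(1))]

  # Number of all possible level combinations
  n_comb <- prod(vapply(cats, nlevels, integer(1)))
  if (n_comb > limit) {
    message("Number of combinations exceeds the limit: ", n_comb)
    return(invisible(NULL))
  }

  # Cross-tabulate every combination, including those with zero frequency
  tab <- as.data.frame(table(cats), responseName = "n_case")

  # Sparse combinations are the ones that do not appear in the data
  tab[tab$n_case == 0, , drop = FALSE]
}

# Usage on a toy data set (hypothetical): four combinations are possible,
# and the rows cover three of them.
toy <- data.frame(
  sex    = factor(c("F", "F", "M")),
  smoker = factor(c("no", "yes", "no"))
)
sketch_sparse_levels(toy)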
bit2r/kodlookr documentation built on Dec. 19, 2021, 9:49 a.m.