compare_category.data.frame: Compare categorical variables
In dlookr: Tools for Data Diagnosis, Exploration, Transformation

compare_category

R Documentation

Compare categorical variables

Description

The compare_category() compute information to examine the relationship between categorical variables.

Usage

compare_category(.data, ...)

## S3 method for class 'data.frame'
compare_category(.data, ...)

Arguments

`.data`	a data.frame or a `tbl_df`.
`...`	one or more unquoted expressions separated by commas. You can treat variable names like they are positions. Positive values select variables; negative values to drop variables. These arguments are automatically quoted and evaluated in a context where column names represent column positions. They support unquoting and splicing.

Details

It is important to understand the relationship between categorical variables in EDA. compare_category() compares relations by pair combination of all categorical variables. and return compare_category class that based list object.

Value

An object of the class as compare based list. The information to examine the relationship between categorical variables is as follows each components.

var1 : factor. The level of the first variable to compare. 'var1' is the name of the first variable to be compared.
var2 : factor. The level of the second variable to compare. 'var2' is the name of the second variable to be compared.
n : integer. frequency by var1 and var2.
rate : double. relative frequency.
first_rate : double. relative frequency in first variable.
second_rate : double. relative frequency in second variable.

Attributes of return object

Attributes of compare_category class is as follows.

variables : character. List of variables selected for comparison.
combination : matrix. It consists of pairs of variables to compare.

Examples


# Generate data for the example
heartfailure2 <- heartfailure
heartfailure2[sample(seq(NROW(heartfailure2)), 5), "smoking"] <- NA

library(dplyr)

# Compare the all categorical variables
all_var <- compare_category(heartfailure2)

# Print compare_numeric class objects
all_var

# Compare the categorical variables that case of joint the death_event variable
all_var %>% 
  "["(grep("death_event", names(all_var)))

# Compare the two categorical variables
two_var <- compare_category(heartfailure2, smoking, death_event)

# Print compare_category class objects
two_var

# Filtering the case of smoking included NA 
two_var %>%
  "[["(1) %>% 
  filter(!is.na(smoking))

# Summary the all case : Return a invisible copy of an object.
stat <- summary(all_var)

# Summary by returned objects
stat

# component of table 
stat$table

# component of chi-square test 
stat$chisq

# component of chi-square test 
summary(all_var, "chisq")

# component of chi-square test (first, third case)
summary(all_var, "chisq", pos = c(1, 3))

# component of relative frequency table 
summary(all_var, "relative")

# component of table without missing values 
summary(all_var, "table", na.rm = TRUE)

# component of table include marginal value 
margin <- summary(all_var, "table", marginal = TRUE)
margin

# component of chi-square test 
summary(two_var, method = "chisq")

# verbose is FALSE 
summary(all_var, "chisq", verbose = FALSE)

#' # Using pipes & dplyr -------------------------
# If you want to use dplyr, set verbose to FALSE
summary(all_var, "chisq", verbose = FALSE) %>% 
  filter(p.value < 0.26)

# Extract component from list by index
summary(all_var, "table", na.rm = TRUE, verbose = FALSE) %>% 
  "[["(1)

# Extract component from list by name
summary(all_var, "table", na.rm = TRUE, verbose = FALSE) %>% 
  "[["("smoking vs death_event")

# plot all pair of variables
plot(all_var)

# plot a pair of variables
plot(two_var)

# plot all pair of variables by prompt
plot(all_var, prompt = TRUE)

# plot a pair of variables
plot(two_var, las = 1)

dlookr documentation built on May 29, 2024, 2 a.m.