compare_category.data.frame: Compare categorical variables

compare_category    R Documentation

Compare categorical variables

Description

compare_category() computes information for examining the relationships between categorical variables.

Usage

compare_category(.data, ...)

## S3 method for class 'data.frame'
compare_category(.data, ...)

Arguments

.data

a data.frame or a tbl_df.

...

one or more unquoted expressions separated by commas. You can treat variable names as if they were positions. Positive values select variables; negative values drop variables. These arguments are automatically quoted and evaluated in a context where column names represent column positions. They support unquoting and splicing.

Details

Understanding the relationships between categorical variables is an important part of EDA. compare_category() compares every pairwise combination of the categorical variables and returns a list-based object of class compare_category.

Value

A list-based object of class compare_category. Each component holds the following information for examining the relationship between one pair of categorical variables.

  • var1 : factor. The levels of the first variable in the pair. The column is named after that variable; 'var1' stands for its name.

  • var2 : factor. The levels of the second variable in the pair. The column is named after that variable; 'var2' stands for its name.

  • n : integer. Frequency of each combination of var1 and var2.

  • rate : double. Relative frequency.

  • first_rate : double. Relative frequency within the first variable.

  • second_rate : double. Relative frequency within the second variable.
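
The numeric columns above can be illustrated with base R alone. The sketch below is not dlookr's implementation. It assumes that rate is each cell's share of all observations, and that first_rate and second_rate are shares within each level of the first and second variable respectively. The toy vectors smoking and death_event are invented for illustration.

```r
# Toy data, invented for illustration (not the heartfailure dataset)
smoking     <- factor(c("No", "No", "Yes", "Yes", "No", "Yes"))
death_event <- factor(c("No", "Yes", "No", "Yes", "No", "No"))

# n: joint frequency of every (smoking, death_event) combination
n <- table(smoking, death_event)

# rate: share of all observations (assumed definition)
rate <- prop.table(n)

# first_rate: share within each level of the first variable (rows sum to 1)
first_rate <- prop.table(n, margin = 1)

# second_rate: share within each level of the second variable (columns sum to 1)
second_rate <- prop.table(n, margin = 2)

# Long format with one row per combination, similar in shape to the
# components described above
long <- as.data.frame(n, responseName = "n")
long$rate        <- as.vector(rate)
long$first_rate  <- as.vector(first_rate)
long$second_rate <- as.vector(second_rate)
long
```

Comparing first_rate with second_rate for the same row shows why both are reported: the same count can be a large share of one variable's level and a small share of the other's.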

Attributes of return object

The compare_category class has the following attributes.

  • variables : character. List of variables selected for comparison.

  • combination : matrix. It consists of pairs of variables to compare.
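
The pairing that the combination attribute describes can be sketched with utils::combn(). This is an illustration under assumptions, not dlookr's code: the variable names are examples only, and whether dlookr stores one pair per row, as done here, is not stated above.

```r
# Example variable names (a hypothetical selection)
variables <- c("smoking", "sex", "death_event")

# Every unordered pair of variables; transposing gives one pair per row
pairs <- t(utils::combn(variables, 2))
pairs

# k variables yield choose(k, 2) pairs
nrow(pairs) == choose(length(variables), 2)
```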

See Also

summary.compare_category, print.compare_category, plot.compare_category.

Examples


# Generate data for the example
library(dlookr)  # provides heartfailure and compare_category()
heartfailure2 <- heartfailure
heartfailure2[sample(seq(NROW(heartfailure2)), 5), "smoking"] <- NA

library(dplyr)

# Compare all categorical variables
all_var <- compare_category(heartfailure2)

# Print compare_category class objects
all_var

# Keep only the pairs that involve the death_event variable
all_var %>% 
  "["(grep("death_event", names(all_var)))

# Compare the two categorical variables
two_var <- compare_category(heartfailure2, smoking, death_event)

# Print compare_category class objects
two_var

# Filter out the cases where smoking is NA
two_var %>%
  "[["(1) %>% 
  filter(!is.na(smoking))

# Summarize all pairs: returns an invisible copy of the object
stat <- summary(all_var)

# Print the returned summary object
stat

# component of table 
stat$table

# component of chi-square test 
stat$chisq

# component of chi-square test 
summary(all_var, "chisq")

# component of chi-square test (first, third case)
summary(all_var, "chisq", pos = c(1, 3))

# component of relative frequency table 
summary(all_var, "relative")

# component of table without missing values 
summary(all_var, "table", na.rm = TRUE)

# component of table including marginal values
margin <- summary(all_var, "table", marginal = TRUE)
margin

# component of chi-square test 
summary(two_var, method = "chisq")

# verbose is FALSE 
summary(all_var, "chisq", verbose = FALSE)

# Using pipes & dplyr -------------------------
# If you want to use dplyr, set verbose to FALSE
summary(all_var, "chisq", verbose = FALSE) %>% 
  filter(p.value < 0.26)

# Extract component from list by index
summary(all_var, "table", na.rm = TRUE, verbose = FALSE) %>% 
  "[["(1)

# Extract component from list by name
summary(all_var, "table", na.rm = TRUE, verbose = FALSE) %>% 
  "[["("smoking vs death_event")

# plot all pairs of variables
plot(all_var)

# plot a pair of variables
plot(two_var)

# plot all pairs of variables, with a prompt
plot(all_var, prompt = TRUE)

# plot a pair of variables
plot(two_var, las = 1)



dlookr documentation built on May 29, 2024, 2 a.m.