compare_category    R Documentation
compare_category() computes information for examining the relationship between categorical variables.
compare_category(.data, ...)
## S3 method for class 'data.frame'
compare_category(.data, ...)
.data |
a data.frame or a |
... |
one or more unquoted expressions separated by commas. You can treat variable names as if they were positions: positive values select variables and negative values drop them. These arguments are automatically quoted and evaluated in a context where column names represent column positions. They support unquoting and splicing. |
It is important to understand the relationships between categorical variables in EDA. compare_category() compares every pairwise combination of the categorical variables and returns a list-based object of class compare_category.
A list-based object of class compare_category. Each component of the list holds the following information for examining the relationship between one pair of categorical variables.
var1 : factor. The levels of the first variable being compared. 'var1' stands for the name of that first variable.
var2 : factor. The levels of the second variable being compared. 'var2' stands for the name of that second variable.
n : integer. frequency by var1 and var2.
rate : double. relative frequency.
first_rate : double. relative frequency in first variable.
second_rate : double. relative frequency in second variable.
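The components above can be illustrated with a minimal base R sketch on a toy data frame. The data are invented, and the reading of first_rate and second_rate as shares within each level of the first and second variable is an assumption drawn from the descriptions above, not a statement about how compare_category() is implemented.

```r
# Toy data invented for illustration (not the heartfailure data)
toy <- data.frame(
  smoking     = factor(c("Yes", "Yes", "No", "No", "No", "Yes")),
  death_event = factor(c("Yes", "No", "No", "No", "Yes", "Yes"))
)

# n: joint frequency of every combination of levels
tab <- as.data.frame(table(toy$smoking, toy$death_event), responseName = "n")
names(tab)[1:2] <- c("smoking", "death_event")

# rate: share of all observations
tab$rate <- tab$n / sum(tab$n)

# first_rate: share within each level of the first variable (assumed meaning)
tab$first_rate <- tab$n / ave(tab$n, tab$smoking, FUN = sum)

# second_rate: share within each level of the second variable (assumed meaning)
tab$second_rate <- tab$n / ave(tab$n, tab$death_event, FUN = sum)

tab
```

Under this reading, rate sums to 1 over the whole table, while first_rate sums to 1 within each level of the first variable and second_rate within each level of the second.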
The attributes of the compare_category class are as follows.
variables : character. List of variables selected for comparison.
combination : matrix. It consists of pairs of variables to compare.
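As a rough illustration of what a matrix of variable pairs looks like, the sketch below builds one with utils::combn(). The variable names are a hypothetical selection, and the layout of one row per pair is an assumption; the actual layout of the combination attribute may differ.

```r
# Hypothetical selection of categorical variables
vars <- c("anaemia", "diabetes", "smoking")

# All unordered pairs, one pair per row
pairs <- t(utils::combn(vars, 2))
pairs

# Number of pairs equals choose(length(vars), 2)
nrow(pairs)
```

With k selected variables there are choose(k, 2) pairs, so the number of comparisons grows quadratically with the number of categorical variables.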
summary.compare_category, print.compare_category, plot.compare_category.
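# Load dlookr (provides compare_category() and the heartfailure data) and dplyr
library(dlookr)
library(dplyr)
# Generate data for the example: set 5 random values of smoking to NA
heartfailure2 <- heartfailure
heartfailure2[sample(seq(NROW(heartfailure2)), 5), "smoking"] <- NA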
# Compare all categorical variables
all_var <- compare_category(heartfailure2)
# Print the compare_category class object
all_var
# Keep only the comparisons that involve the death_event variable
all_var %>%
  "["(grep("death_event", names(all_var)))
# Compare two categorical variables
two_var <- compare_category(heartfailure2, smoking, death_event)
# Print the compare_category class object
two_var
# Drop the rows where smoking is NA
two_var %>%
  "[["(1) %>%
  filter(!is.na(smoking))
# Summarize all cases: returns an invisible copy of the summary object
stat <- summary(all_var)
# Summary by returned objects
stat
# component of table
stat$table
# component of chi-square test
stat$chisq
# component of chi-square test
summary(all_var, "chisq")
# component of chi-square test (first, third case)
summary(all_var, "chisq", pos = c(1, 3))
# component of relative frequency table
summary(all_var, "relative")
# component of table without missing values
summary(all_var, "table", na.rm = TRUE)
# component of table including marginal values
margin <- summary(all_var, "table", marginal = TRUE)
margin
# component of chi-square test
summary(two_var, method = "chisq")
# verbose is FALSE
summary(all_var, "chisq", verbose = FALSE)
# Using pipes & dplyr -------------------------
# If you want to use dplyr, set verbose to FALSE
summary(all_var, "chisq", verbose = FALSE) %>%
  filter(p.value < 0.26)
# Extract component from list by index
summary(all_var, "table", na.rm = TRUE, verbose = FALSE) %>%
  "[["(1)
# Extract component from list by name
summary(all_var, "table", na.rm = TRUE, verbose = FALSE) %>%
  "[["("smoking vs death_event")
# plot all pairs of variables
plot(all_var)
# plot a pair of variables
plot(two_var)
# plot all pairs of variables, prompting before each plot
plot(all_var, prompt = TRUE)
# plot a pair of variables, passing las = 1
plot(two_var, las = 1)
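For readers who want to see what a chi-square test of independence on one pair of variables involves, here is a minimal sketch using stats::chisq.test() on a hand-made contingency table. The counts are invented, and whether summary() calls chisq.test() internally is an assumption, not something this page states.

```r
# Hypothetical 2 x 2 contingency table (counts invented for illustration)
tab <- matrix(
  c(30, 10, 20, 40),
  nrow = 2,
  dimnames = list(smoking = c("No", "Yes"), death_event = c("No", "Yes"))
)

# Pearson chi-square test of independence; for 2 x 2 tables
# chisq.test() applies Yates' continuity correction by default
res <- stats::chisq.test(tab)

res$statistic  # test statistic
res$parameter  # degrees of freedom: (2 - 1) * (2 - 1)
res$p.value    # p-value of the test
```

A small p-value suggests that the two variables are not independent in the tabulated data.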