compare_target_category: Comparison of categorical variables of train set and test set

View source: R/split.R

compare_target_category    R Documentation

Description

Compare the statistics of the categorical variables in the train set and the test set contained in an object of class "split_df".

Usage

compare_target_category(.data, ..., add_character = FALSE, margin = FALSE)

Arguments

.data

an object of class "split_df", usually the result of a call to split_by().

...

one or more unquoted expressions separated by commas, selecting the categorical variables to compare. Variable names can be treated as if they were positions: positive values select variables and negative values drop them. If the first expression is negative, compare_target_category() automatically starts with all variables. These arguments are automatically quoted and evaluated in a context where column names represent column positions. They support unquoting and splicing.
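The selection rules described above read like those of dplyr::select(). As a minimal sketch, assuming compare_target_category() follows the same rules, the behaviour can be tried on a small data frame. The data frame df and the vector vars are hypothetical and exist only for this illustration.

```r
library(dplyr)

# Hypothetical toy data frame, used only to show the selection rules.
df <- data.frame(
  default = factor(c("No", "Yes", "No")),
  student = factor(c("Yes", "No", "No")),
  region  = factor(c("A", "B", "A"))
)

# A positive expression keeps only the named variable.
keep_one <- names(select(df, student))

# A leading negative expression starts from all variables and drops one.
drop_one <- names(select(df, -student))

# Splicing: variable names held in a character vector are unquoted with !!!.
vars    <- c("student", "region")
spliced <- names(select(df, !!!rlang::syms(vars)))
```

If the assumption holds, a call such as compare_target_category(-student) on a "split_df" object would compare every categorical variable except student.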

add_character

logical. Whether to include character variables in the comparison of categorical data. The default is FALSE, which excludes character variables.

margin

logical. Whether to calculate marginal frequency information. The default is FALSE.

Details

Compare the statistics of the categorical variables of the train set and the test set to determine whether the raw data has been split into two data sets with similar level distributions.
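The idea behind the comparison can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the data, the 70/30 random split, and the use of proportions between 0 and 1 are all assumptions, and the package may scale relative frequencies differently.

```r
# Hypothetical toy data; not taken from the alookr package.
set.seed(1)
df <- data.frame(
  student = factor(sample(c("No", "Yes"), 200, replace = TRUE,
                          prob = c(0.7, 0.3)))
)

# Simple random 70/30 split (an assumption; split_by() may split differently).
idx   <- sample(nrow(df), size = 0.7 * nrow(df))
train <- df[idx, , drop = FALSE]
test  <- df[-idx, , drop = FALSE]

# Relative frequency of each level in each set, as proportions.
rel_train <- prop.table(table(train$student))
rel_test  <- prop.table(table(test$student))

# One row per level, with the absolute gap between the two sets.
cmp <- data.frame(
  variable = "student",
  level    = factor(names(rel_train)),
  train    = as.numeric(rel_train),
  test     = as.numeric(rel_test),
  stringsAsFactors = FALSE
)
cmp$abs_diff <- abs(cmp$train - cmp$test)
cmp
```

Small values of abs_diff for every level suggest that the split preserved the distribution of the variable.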

Value

A tbl_df with the following variables for comparison:

  • variable : character. categorical variable name

  • level : factor. level of categorical variables

  • train : numeric. the relative frequency of the level in the train set

  • test : numeric. the relative frequency of the level in the test set

  • abs_diff : numeric. the absolute value of the difference between two relative frequencies
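A table with these columns can be screened for levels whose share differs noticeably between the two sets. The sketch below builds such a table by hand. The numbers are made up for illustration and are not real output, and the tolerance is an assumed threshold on the proportion scale.

```r
# Hypothetical comparison table with the documented columns;
# the values are illustrative, not output of compare_target_category().
cmp <- data.frame(
  variable = c("student", "student"),
  level    = factor(c("No", "Yes")),
  train    = c(0.71, 0.29),
  test     = c(0.68, 0.32),
  stringsAsFactors = FALSE
)
cmp$abs_diff <- abs(cmp$train - cmp$test)

# Flag levels whose share differs by more than a chosen tolerance.
tolerance <- 0.05  # assumed threshold; choose one that suits the data
flagged   <- cmp[cmp$abs_diff > tolerance, ]

# No rows are flagged here, so the split looks balanced for this variable.
nrow(flagged)
```

The same filter applies to the value returned by compare_target_category(), provided the tolerance matches the scale of the train and test columns.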

Examples

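library(alookr)
library(dplyr)

# Credit Card Default Data
head(ISLR::Default)

# Generate data for the example:
# split the data into train and test sets by the target variable
sb <- ISLR::Default %>%
  split_by(default)

# Compare all categorical variables
sb %>%
  compare_target_category()

# Also include character variables
sb %>%
  compare_target_category(add_character = TRUE)

# Add marginal frequency information
sb %>%
  compare_target_category(margin = TRUE)

# Compare only the student variable
sb %>%
  compare_target_category(student)

# Compare only the student variable, with marginal frequencies
sb %>%
  compare_target_category(student, margin = TRUE)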


alookr documentation built on May 29, 2024, 10:38 a.m.