cleanse.split_df: Cleansing the dataset for classification modeling

View source: R/split.R

cleanse.split_df    R Documentation

Cleansing the dataset for classification modeling

Description

Diagnoses the similarity between the train set and the test set contained in an object of class "split_df", and cleanses the "split_df" object based on that diagnosis.

Usage

## S3 method for class 'split_df'
cleanse(.data, add_character = FALSE, uniq_thres = 0.9, missing = FALSE, ...)

Arguments

.data

an object of class "split_df", usually the result of a call to split_by().

add_character

logical. Whether to include character (text) variables in the comparison of categorical data. The default is FALSE, which excludes character variables.

uniq_thres

numeric. Threshold for removing variables. A variable is removed when its ratio of unique values (number of unique values / number of observations) is greater than this value. The default is 0.9.

missing

logical. Whether to remove variables that contain missing values. The default is FALSE.

...

further arguments passed to or from other methods.
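
The uniq_thres and missing rules described above can be illustrated with a small base R sketch. It is not alookr's implementation: flag_columns() and the toy data frame are hypothetical names introduced only for this illustration, and the sketch applies the rule to every column, which may differ from what the package does internally.

```r
# Hypothetical helper, NOT part of alookr: returns the names of the columns
# that the two rules would flag for removal.
flag_columns <- function(df, uniq_thres = 0.9, missing = FALSE) {
  # Ratio of unique values: number of unique values / number of observations
  uniq_ratio <- vapply(df, function(x) length(unique(x)) / length(x), numeric(1))
  # TRUE for columns that contain at least one missing value
  has_na <- vapply(df, anyNA, logical(1))

  drop <- uniq_ratio > uniq_thres
  if (missing) drop <- drop | has_na

  names(df)[drop]
}

# Toy data (assumed for illustration only)
toy <- data.frame(
  id     = sprintf("c%02d", 1:10),              # all values distinct: ratio 1.0
  grade  = rep(c("a", "b"), 5),                 # two distinct values: ratio 0.2
  income = c(NA, 2, 2, 3, 3, 3, 4, 4, 4, 4),    # contains a missing value
  stringsAsFactors = FALSE
)

flag_columns(toy)                  # only the identifier-like column exceeds 0.9
flag_columns(toy, missing = TRUE)  # additionally flags the column with an NA
```

An identifier-like column such as id has a unique-value ratio close to 1, which is the kind of variable a high uniq_thres is meant to catch.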

Details

Removes the variables flagged by the diagnosis performed with the compare_diag() function.

Value

An object of class "split_df".

Examples

library(alookr)
library(dplyr)

# Credit Card Default Data
head(ISLR::Default)

# Generate data for the example
sb <- ISLR::Default %>%
  split_by(default)

# Cleanse the split_df object using the default arguments
sb %>%
  cleanse()


alookr documentation built on June 12, 2022, 5:08 p.m.