cleanse.split_df: Cleansing the dataset for classification modeling
In alookr: Model Classifier for Binary Classification

cleanse.split_df

R Documentation

Cleansing the dataset for classification modeling

Description

Diagnoses the similarity between the train set and the test set contained in a "split_df" object, and cleanses the "split_df" object based on that diagnosis.

Usage

## S3 method for class 'split_df'
cleanse(.data, add_character = FALSE, uniq_thres = 0.9, missing = FALSE, ...)

Arguments

`.data`	an object of class "split_df", usually the result of a call to split_by().
`add_character`	logical. Whether to include character (text) variables in the comparison of categorical data. The default is FALSE, which excludes character variables.
`uniq_thres`	numeric. Threshold for removing variables: a variable is removed when its ratio of unique values (number of unique values / number of observations) is greater than this value. The default is 0.9.
`missing`	logical. Whether to remove variables that contain missing values. The default is FALSE.
`...`	further arguments passed to or from other methods.
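
The quantities behind `uniq_thres` and `missing` can be illustrated with a minimal base R sketch. This is not alookr's internal implementation: the `toy` data frame and the helper variables are hypothetical, and the sketch applies the ratio to every column, which may differ from the variable types cleanse() actually checks.

```r
# Hypothetical toy data for illustration only; not taken from alookr.
toy <- data.frame(
  id    = sprintf("c%02d", 1:10),            # 10 unique values in 10 rows
  grade = rep(c("a", "b"), each = 5),        # 2 unique values in 10 rows
  score = c(1, NA, 1, 2, 2, 3, 3, 4, 4, 5),  # contains one missing value
  stringsAsFactors = FALSE
)

# Ratio of unique values = number of unique values / number of observations
uniq_ratio <- vapply(
  toy,
  function(x) length(unique(x)) / length(x),
  numeric(1)
)

# Variables whose ratio exceeds the threshold (0.9 is the documented default)
uniq_thres <- 0.9
flagged <- names(uniq_ratio)[uniq_ratio > uniq_thres]

# Variables that contain at least one missing value
has_missing <- vapply(toy, anyNA, logical(1))
with_missing <- names(has_missing)[has_missing]

print(uniq_ratio)
print(flagged)
print(with_missing)
```

In this sketch only `id` exceeds the 0.9 threshold, because every one of its values is distinct, and only `score` would be a candidate for removal under `missing = TRUE`.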

Details

Removes the variables flagged by the diagnosis that the compare_diag() function performs on the "split_df" object.

Value

An object of class "split_df".

Examples

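library(alookr)
library(dplyr)

# Credit Card Default Data
head(ISLR::Default)

# Generate data for the example:
# split into train and test sets, with `default` as the target variable
sb <- ISLR::Default %>%
  split_by(default)

# Diagnose the split and remove the flagged variables
sb %>%
  cleanse()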

alookr documentation built on May 29, 2024, 10:38 a.m.