cleanse.data.frame | R Documentation |
cleanse() cleanses a dataset for classification modeling.
## S3 method for class 'data.frame'
cleanse(
  .data,
  uniq = TRUE,
  uniq_thres = 0.1,
  char = TRUE,
  missing = FALSE,
  verbose = TRUE,
  ...
)

cleanse(.data, ...)
.data |
a data.frame or a |
uniq |
logical. Whether to remove variables that have only one unique value. |
uniq_thres |
numeric. Threshold for removing variables: a variable is removed when its ratio of unique values (number of unique values / number of observations) is greater than this value. |
char |
logical. Whether to convert character variables to factors. |
missing |
logical. Whether to remove variables that contain missing values. |
verbose |
logical. Set whether to echo information to the console at runtime. |
... |
further arguments passed to or from other methods. |
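The ratio that uniq_thres is compared against can be inspected by hand before choosing a threshold. The following is a minimal sketch in base R; unique_ratio() and the toy data frame are hypothetical names introduced here for illustration and are not part of the package. It assumes that every distinct value, including NA, counts as one unique value, which may differ from how cleanse() counts them.

```r
# Hypothetical helper (not part of the package):
# ratio of unique values to observations for each column.
unique_ratio <- function(df) {
  vapply(df, function(x) length(unique(x)) / length(x), numeric(1))
}

# Toy data: `id` is unique per row, `flag` has two levels.
toy <- data.frame(
  id   = c("a1", "b2", "c3", "d4"),
  flag = c("Y", "N", "Y", "Y"),
  stringsAsFactors = FALSE
)

# id has 4 unique values in 4 rows; flag has 2 unique values in 4 rows.
unique_ratio(toy)
```

A column whose ratio is close to 1 behaves like an identifier, while a low ratio suggests a genuine categorical variable.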
This function is useful when fitting a classification model. It does the following: it removes variables that have only one unique value; it removes character or categorical variables whose number of unique values is large relative to the number of observations, since such variables typically correspond to identifiers; and it converts character variables to factors.
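The steps described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: cleanse_sketch() is a hypothetical name, it assumes that uniq = FALSE disables both removal steps, and it leaves out the missing and verbose arguments.

```r
# Hypothetical sketch of the documented steps (not the package's code).
cleanse_sketch <- function(df, uniq = TRUE, uniq_thres = 0.1, char = TRUE) {
  n <- nrow(df)

  if (uniq) {
    # Number of unique values in each column.
    n_unique <- vapply(df, function(x) length(unique(x)), integer(1))

    # Only character or factor columns are candidates for the ratio rule.
    is_categorical <- vapply(
      df, function(x) is.character(x) || is.factor(x), logical(1)
    )

    constant <- n_unique == 1L                               # one value only
    id_like  <- is_categorical & (n_unique / n > uniq_thres) # identifier-like

    df <- df[, !(constant | id_like), drop = FALSE]
  }

  if (char) {
    # Convert the remaining character columns to factors.
    is_chr <- vapply(df, is.character, logical(1))
    if (any(is_chr)) {
      df[is_chr] <- lapply(df[is_chr], factor)
    }
  }

  df
}

# Toy data: `id` is identifier-like, `year` is constant.
toy <- data.frame(
  id    = paste0("id", 1:40),
  year  = "2018",
  count = rep(1:5, 8),
  flag  = rep(c("Y", "N"), 20),
  stringsAsFactors = FALSE
)

out <- cleanse_sketch(toy)
str(out)
```

In this toy data, id and year are dropped, count is kept because it is numeric, and flag is kept and converted to a factor.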
An object of class data.frame or train_df. The return value is an object of the same type as the .data argument.
# create sample dataset
set.seed(123L)
id <- sapply(1:1000, function(x)
  paste(c(sample(letters, 5), x), collapse = ""))

year <- "2018"

set.seed(123L)
count <- sample(1:10, size = 1000, replace = TRUE)

set.seed(123L)
alpha <- sample(letters, size = 1000, replace = TRUE)

set.seed(123L)
flag <- sample(c("Y", "N"), size = 1000, prob = c(0.1, 0.9), replace = TRUE)

dat <- data.frame(id, year, count, alpha, flag, stringsAsFactors = FALSE)

# structure of dataset
str(dat)

# cleansing dataset
newDat <- cleanse(dat)

# structure of cleansing dataset
str(newDat)

# cleansing dataset
newDat <- cleanse(dat, uniq = FALSE)

# structure of cleansing dataset
str(newDat)

# cleansing dataset
newDat <- cleanse(dat, uniq_thres = 0.3)

# structure of cleansing dataset
str(newDat)

# cleansing dataset
newDat <- cleanse(dat, char = FALSE)

# structure of cleansing dataset
str(newDat)