var_filter: R Documentation
This function filters variables based on specified conditions, such as missing rate, identical value rate, and information value.
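To make the three conditions concrete, the sketch below computes each metric for a single column in base R. It is a minimal illustration, not scorecard's internal code: the helper names missing_rate, identical_rate and info_value are hypothetical, excluding NAs from the identical rate is an assumption, and the information value treats each distinct value as its own bin with a small smoothing constant (eps) chosen for this sketch so that single-class bins do not produce log(0).

```r
# Hypothetical helpers, not scorecard's internal code: per-variable
# metrics that a filter like var_filter could compare with its limits.

# Share of NA values in a column
missing_rate <- function(v) mean(is.na(v))

# Share of non-NA values taken by the single most frequent value
# (excluding NAs here is an assumption)
identical_rate <- function(v) {
  v <- v[!is.na(v)]
  if (length(v) == 0) return(NA_real_)
  max(table(v)) / length(v)
}

# Information value of a discrete variable against a 0/1 label, with
# each distinct value as its own bin. eps is added to every cell count
# so that bins containing only one class do not produce log(0).
info_value <- function(v, y, eps = 0.5) {
  ok <- !is.na(v)
  tab <- table(v[ok], factor(y[ok], levels = c(0, 1)))
  dist_neg <- (tab[, "0"] + eps) / sum(tab[, "0"] + eps)
  dist_pos <- (tab[, "1"] + eps) / sum(tab[, "1"] + eps)
  sum((dist_pos - dist_neg) * log(dist_pos / dist_neg))
}

# Example: metrics for two candidate predictors (made-up data)
dt <- data.frame(purpose = c("car", "car", "tv", "tv", NA),
                 flag = c("a", "a", "a", "a", "a"),
                 y = c(0, 0, 1, 1, 0))
sapply(dt[c("purpose", "flag")], missing_rate)
sapply(dt[c("purpose", "flag")], identical_rate)
info_value(dt$purpose, dt$y)
```

In this toy data, flag takes a single value, so its identical rate is 1 and it would be a natural candidate for removal, while purpose separates the two classes well and has a large information value.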
var_filter(dt, y, x = NULL,
  lims = list(missing_rate = 0.95, identical_rate = 0.95, info_value = 0.02),
  var_rm = NULL, var_kp = NULL, var_rm_reason = FALSE,
  positive = "bad|1", ...)
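The default lims in the signature can be read as three cut-offs. The sketch below applies them to a table of precomputed metrics. It is hypothetical: apply_lims is not a scorecard function, the metric values are made up, and the exact comparison rule (strictly greater than the missing and identical limits, strictly less than the information value limit means removal) is an assumption.

```r
# Hypothetical sketch of applying the lims thresholds to precomputed
# metrics. Assumed rule: drop a variable when its missing rate or
# identical rate exceeds the limit, or its information value falls
# below the limit.
lims <- list(missing_rate = 0.95, identical_rate = 0.95, info_value = 0.02)

apply_lims <- function(metrics, lims) {
  drop <- metrics$missing_rate > lims$missing_rate |
    metrics$identical_rate > lims$identical_rate |
    metrics$info_value < lims$info_value
  metrics$variable[!drop]
}

# Made-up metric values, for illustration only
metrics <- data.frame(
  variable = c("v1", "v2", "v3", "v4"),
  missing_rate = c(0.10, 0.99, 0.00, 0.20),
  identical_rate = c(0.50, 0.20, 0.97, 0.30),
  info_value = c(0.30, 0.40, 0.25, 0.01)
)
apply_lims(metrics, lims)
```

Here v2 fails the missing-rate limit, v3 the identical-rate limit and v4 the information value limit, so only v1 survives. To tighten or loosen the filter, pass a different list to lims, for example lims = list(missing_rate = 0.9, identical_rate = 0.9, info_value = 0.05).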
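dt: A data frame with both x (predictor/feature) and y (response/label) variables.
y: Name of the y variable.
x: Names of the x variables. Defaults to NULL. If x is NULL, all columns except y are treated as x variables.
lims: A list of thresholds for the variable filters: missing_rate (default 0.95), identical_rate (default 0.95) and info_value (default 0.02). Variables whose missing rate or identical value rate exceeds its threshold, or whose information value falls below its threshold, are removed.
var_rm: Names of variables to remove regardless of the filters. Defaults to NULL.
var_kp: Names of variables to keep regardless of the filters. Defaults to NULL.
var_rm_reason: Logical. Whether to also return the reason each variable was removed. Defaults to FALSE.
positive: Value of the positive class. Defaults to "bad|1".
...: Additional parameters.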
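Two of the defaults above can be illustrated in a few lines of base R: x = NULL selects every column except y, and positive = "bad|1" identifies the positive class. The helpers resolve_x and to_binary are hypothetical, and treating positive as a regular expression matched against the values of y is an assumption about how var_filter interprets it.

```r
# Hypothetical helpers (not scorecard's internals) illustrating two
# documented defaults: x = NULL and positive = "bad|1".

# When x is NULL, every column except y is treated as a predictor
resolve_x <- function(dt, y, x = NULL) {
  if (is.null(x)) setdiff(names(dt), y) else x
}

# Assumption: values of y matching the positive pattern (a regular
# expression) become 1, all other values become 0
to_binary <- function(y, positive = "bad|1") {
  as.integer(grepl(positive, as.character(y)))
}

dt <- data.frame(age = c(30, 45, 28),
                 income = c(1200, 3400, 900),
                 creditability = c("good", "bad", "good"))
resolve_x(dt, y = "creditability")
to_binary(dt$creditability)
to_binary(c(0, 1, 1))
```

Because grepl matches substrings, an unanchored pattern such as "bad|1" would also match values like "10"; an anchored pattern such as "^(bad|1)$" avoids that if labels can contain such values.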
A data frame containing y and the selected x variables. If var_rm_reason is TRUE, a list of two data frames is returned instead: dt, with y and the selected x variables, and rm, with the reason each variable was removed.
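Because the return type depends on var_rm_reason, downstream code may want a single way to get at the selected data. The sketch below uses mock objects in place of real var_filter output; the helper selected_dt and the columns of the mock rm data frame are hypothetical placeholders, while the element names dt and rm follow the examples on this page.

```r
# Hypothetical helper: extract the selected-variable data frame whether
# the result is a plain data frame (var_rm_reason = FALSE) or a list
# with elements dt and rm (var_rm_reason = TRUE).
selected_dt <- function(res) {
  # is.data.frame must be checked first: a data frame is also a list
  if (is.data.frame(res)) res else res$dt
}

# Mock results standing in for var_filter output (placeholder columns)
res_plain <- data.frame(y = c(0, 1), age = c(30, 45))
res_list <- list(dt = res_plain, rm = data.frame(variable = "rowid"))

identical(selected_dt(res_plain), selected_dt(res_list))
```

Checking is.data.frame rather than is.list is the key design choice, since every data frame is also a list and would otherwise be mistaken for the two-element result.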
# Load the scorecard package and the German credit data
library(scorecard)
data(germancredit)
# variable filter
dt_sel = var_filter(germancredit, y = "creditability")
dim(dt_sel)
# return the reason each variable was removed
dt_sel2 = var_filter(germancredit, y = "creditability", var_rm_reason = TRUE)
lapply(dt_sel2, dim)
str(dt_sel2$dt)
str(dt_sel2$rm)
# keep columns manually, such as rowid
germancredit$rowid = row.names(germancredit)
dt_sel3 = var_filter(germancredit, y = "creditability", var_kp = 'rowid')
# remove columns manually
dt_sel4 = var_filter(germancredit, y = "creditability", var_rm = 'rowid')
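The last two examples force a column in or out. One way to picture how var_kp and var_rm combine with the filters' own selection is the set logic below. It is a hypothetical sketch: finalize_vars is not a scorecard function, and the precedence it uses (var_kp wins over both the filters and var_rm) is an assumption, not documented behavior.

```r
# Hypothetical sketch: combine the filters' selection with the
# force-removed (var_rm) and force-kept (var_kp) lists.
# Assumed precedence: var_kp overrides both the filters and var_rm.
finalize_vars <- function(selected, var_rm = NULL, var_kp = NULL) {
  kept <- setdiff(selected, var_rm)  # drop force-removed variables
  union(kept, var_kp)                # add back force-kept variables
}

# rowid survived the filters but is force-removed
finalize_vars(c("age", "income", "rowid"), var_rm = "rowid")
# rowid was dropped by the filters but is force-kept
finalize_vars(c("age", "income"), var_kp = "rowid")
```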