var_filter: Variable Filter

var_filter    R Documentation

Variable Filter

Description

This function filters variables based on specified conditions, such as missing rate, identical value rate, and information value.

Usage

var_filter(dt, y, x = NULL,
  lims = list(missing_rate = 0.95, identical_rate = 0.95, info_value = 0.02),
  var_rm = NULL, var_kp = NULL, var_rm_reason = FALSE, positive = "bad|1", ...)

Arguments

dt

A data frame with both x (predictor/feature) and y (response/label) variables.

y

Name of y variable.

x

Names of x variables. Defaults to NULL. If x is NULL, then all columns except y are treated as x variables.

lims

A list of thresholds for the variable filters.

  • missing_rate A variable is kept if its missing rate is <= this value. Defaults to 0.95.

  • identical_rate A variable is kept if its identical value rate (excluding NAs) is <= this value. Defaults to 0.95.

  • info_value A variable is kept if its information value (IV) is >= this value. Defaults to 0.02.
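The three thresholds above can be illustrated with a minimal base R sketch. The helpers missing_rate(), identical_rate() and info_value() below are hypothetical names written for this illustration; they are not functions exported by the package, and the package's own computation may differ (for example in how continuous variables are binned before the IV is computed). The sketch assumes a categorical predictor and a 0/1 response in which 1 is the positive class.

```r
# Hypothetical helpers for illustration only; not part of the package.

# Share of NA values in x.
missing_rate <- function(x) mean(is.na(x))

# Share of the most frequent value among the non-NA values of x.
identical_rate <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_real_)
  max(table(x)) / length(x)
}

# Information value of a categorical x against a 0/1 response y.
# Rows with NA in x are dropped by table().
info_value <- function(x, y) {
  tab <- table(x, y)
  pos <- tab[, "1"] / sum(tab[, "1"])  # distribution of positives over levels
  neg <- tab[, "0"] / sum(tab[, "0"])  # distribution of negatives over levels
  keep <- pos > 0 & neg > 0            # skip empty cells to avoid log(0)
  sum((pos[keep] - neg[keep]) * log(pos[keep] / neg[keep]))
}

x <- c("a", "a", "a", "a", "b", "b", "b", "b", NA, NA)
y <- c(1, 0, 0, 0, 1, 1, 1, 0, 0, 1)

mr <- missing_rate(x)    # 0.2
ir <- identical_rate(x)  # 0.5
iv <- info_value(x, y)   # log(3), about 1.0986

# With the default thresholds this toy variable would pass all three checks.
mr <= 0.95 && ir <= 0.95 && iv >= 0.02
```

Under these assumptions, a variable is dropped as soon as any one of the three comparisons fails.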

var_rm

Names of variables that are always removed. Defaults to NULL.

var_kp

Names of variables that are always kept. Defaults to NULL.

var_rm_reason

Logical. If TRUE, the reason each variable was removed is returned as well. Defaults to FALSE.

positive

Value of the positive class. Defaults to "bad|1".
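The default "bad|1" reads like a regular expression alternation, so that either "bad" or 1 marks the positive class. This page does not state how the value is matched; the base R sketch below only shows one plausible reading, assuming pattern matching with grepl(). The variable names y_raw and y01 are made up for the illustration.

```r
# Assumption: the positive class is detected by regular expression matching.
# This is a sketch, not the package's documented behavior.
y_raw <- c("bad", "good", "good", "bad")
positive <- "bad|1"

# 1 where the label matches the pattern, 0 otherwise.
y01 <- as.integer(grepl(positive, y_raw))
y01
```

Note that under this reading any label containing "bad" or "1" would match, so unambiguous labels are safer.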

...

Additional parameters.

Value

A data frame containing y and the selected x variables. If var_rm_reason is TRUE, a list of two data frames is returned instead: dt, with y and the selected x variables, and rm, with the reason each variable was removed.

Examples

# Load German credit data
data(germancredit)

# variable filter
dt_sel = var_filter(germancredit, y = "creditability")
dim(dt_sel)

# return the reason each variable was removed
dt_sel2 = var_filter(germancredit, y = "creditability", var_rm_reason = TRUE)
lapply(dt_sel2, dim)

str(dt_sel2$dt)
str(dt_sel2$rm)

# keep columns manually, such as rowid
germancredit$rowid = row.names(germancredit)
dt_sel3 = var_filter(germancredit, y = "creditability", var_kp = 'rowid')

# remove columns manually
dt_sel4 = var_filter(germancredit, y = "creditability", var_rm = 'rowid')


ShichenXie/scorecard documentation built on April 17, 2024, 8:55 p.m.