var_filter: Variable Filter
In ShichenXie/scorecard: Credit Risk Scorecard

Description

This function filters variables based on specified conditions, such as missing rate, identical value rate, and information value.

Usage

var_filter(dt, y, x = NULL,
  lims = list(missing_rate = 0.95, identical_rate = 0.95, info_value = 0.02),
  var_rm = NULL, var_kp = NULL, var_rm_reason = FALSE, positive = "bad|1", ...)

Arguments

`dt`	A data frame with both x (predictor/feature) and y (response/label) variables.
`y`	Name of y variable.
`x`	Name of x variables. Defaults to NULL. If x is NULL, then all columns except y are counted as x variables.
`lims`	A list of thresholds for the variable filters. `missing_rate`: kept variables must have a missing rate <= 0.95 by default. `identical_rate`: kept variables must have an identical value rate (excluding NAs) <= 0.95 by default. `info_value`: kept variables must have an information value (IV) >= 0.02 by default.
`var_rm`	Names of variables to remove regardless of the filters. Defaults to NULL.
`var_kp`	Names of variables to keep regardless of the filters. Defaults to NULL.
`var_rm_reason`	Logical; whether to also return the reason each variable was removed. Defaults to FALSE.
`positive`	Value of the positive class. Defaults to "bad|1".
`...`	Additional parameters.
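
The three quantities thresholded by `lims` can be illustrated in base R. The following is a minimal sketch of the usual definitions, not the package's internal code: the `toy` data frame and its column names are hypothetical, and the information value is computed directly on an already-categorical predictor, whereas the package may bin variables first.

```r
# Hypothetical toy data; names and values are illustrative only
toy = data.frame(
  y    = c(0, 0, 0, 1, 0, 1, 1, 1),
  grp  = c("a", "a", "a", "a", "b", "b", "b", "b"),
  amt  = c(10, NA, 30, NA, 50, 60, 70, 80),
  flag = c("x", "x", "x", "x", "x", "x", "x", "z"),
  stringsAsFactors = FALSE
)
x_cols = c("grp", "amt", "flag")

# Missing rate: share of NA values in each predictor
missing_rate = sapply(toy[x_cols], function(col) mean(is.na(col)))

# Identical value rate: share of the most frequent value;
# table() drops NAs by default, so NAs are excluded
identical_rate = sapply(toy[x_cols], function(col) max(prop.table(table(col))))

# Information value of `grp`: sum over categories of
# (dist_pos - dist_neg) * log(dist_pos / dist_neg)
tab  = table(toy$grp, toy$y)          # rows: categories, columns: y values
dist = prop.table(tab, margin = 2)    # distribution of each class over categories
iv_grp = sum((dist[, "1"] - dist[, "0"]) * log(dist[, "1"] / dist[, "0"]))
```

Under the default thresholds, a variable would be dropped when its missing rate or identical value rate exceeds 0.95, or when its information value falls below 0.02. The formula above is undefined when a category contains no observations of one class.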

Value

A data frame with columns for y and the selected x variables. If var_rm_reason is TRUE, a list of two data frames is returned instead: `dt`, the filtered data, and `rm`, the removal reason for each removed variable.

Examples

# Load German credit data
data(germancredit)

# variable filter
dt_sel = var_filter(germancredit, y = "creditability")
dim(dt_sel)

# also return the reason each variable was removed
dt_sel2 = var_filter(germancredit, y = "creditability", var_rm_reason = TRUE)
lapply(dt_sel2, dim)

str(dt_sel2$dt)
str(dt_sel2$rm)

# keep columns manually, such as rowid
germancredit$rowid = row.names(germancredit)
dt_sel3 = var_filter(germancredit, y = "creditability", var_kp = 'rowid')

# remove columns manually
dt_sel4 = var_filter(germancredit, y = "creditability", var_rm = 'rowid')

ShichenXie/scorecard documentation built on April 17, 2024, 8:55 p.m.