var_filter: Variable Filter


View source: R/var_filter.R

Description

This function filters variables based on specified conditions, such as information value, missing rate, and identical value rate.
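The missing rate and identical value rate named above can be illustrated in base R. This is a minimal sketch under assumptions, not the package's implementation: the helpers `missing_rate` and `identical_rate` and the toy data frame `df` are hypothetical, and the identical value rate is taken here to be the share of the most frequent non-NA value.

```r
# Hypothetical helpers for illustration only; not part of scorecard.

# Share of NA entries in a vector.
missing_rate <- function(x) mean(is.na(x))

# Share of the most frequent value among non-NA entries
# (assumed definition of "identical value rate").
identical_rate <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_real_)
  max(table(x)) / length(x)
}

# Toy data: column a is constant apart from one NA, column b is half missing.
df <- data.frame(a = c(1, 1, 1, NA),
                 b = c("u", "v", NA, NA),
                 stringsAsFactors = FALSE)

rates <- data.frame(variable  = names(df),
                    missing   = sapply(df, missing_rate),
                    identical = sapply(df, identical_rate))

# Flag variables that would exceed a 0.95 threshold on either rate.
rates$exceeds_limit <- rates$missing > 0.95 | rates$identical > 0.95
print(rates)
```

In this toy data, column a exceeds the 0.95 identical-value threshold because all of its non-NA values are equal, while column b exceeds neither threshold.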

Usage

var_filter(dt, y, x = NULL, iv_limit = 0.02, missing_limit = 0.95,
  identical_limit = 0.95, var_rm = NULL, var_kp = NULL,
  return_rm_reason = FALSE, positive = "bad|1")

Arguments

dt

A data frame with both x (predictor/feature) and y (response/label) variables.

y

Name of y variable.

x

Name of x variables. Defaults to NULL. If x is NULL, then all columns except y are counted as x variables.

iv_limit

Variables are kept only if their information value is >= iv_limit. Defaults to 0.02.

missing_limit

Variables are kept only if their missing rate is <= missing_limit. Defaults to 0.95.

identical_limit

Variables are kept only if their identical value rate (excluding NAs) is <= identical_limit. Defaults to 0.95.

var_rm

Names of variables to force-remove. Defaults to NULL.

var_kp

Names of variables to force-keep. Defaults to NULL.

return_rm_reason

Logical. If TRUE, the reason each variable was removed is returned along with the filtered data. Defaults to FALSE.

positive

Value of the positive class. Defaults to "bad|1".
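To make the iv_limit and positive arguments more concrete, here is a minimal base R sketch of a common information value definition for a single categorical predictor. It is an illustration under assumptions rather than the package's code: the helper `iv_sketch` is hypothetical, and treating positive as a pattern matched against y with grepl() is an assumption suggested by the "bad|1" default.

```r
# Hypothetical helper for illustration only; not part of scorecard.
# IV = sum over levels of (dist_pos - dist_neg) * log(dist_pos / dist_neg),
# where dist_pos and dist_neg are each level's share of the positive
# and negative cases.
iv_sketch <- function(x, y, positive = "bad|1") {
  # Assumption: `positive` is matched as a pattern against y.
  is_pos <- grepl(positive, as.character(y))
  n_pos <- tapply(is_pos, x, sum)    # positive count per level of x
  n_neg <- tapply(!is_pos, x, sum)   # negative count per level of x
  dist_pos <- n_pos / sum(n_pos)
  dist_neg <- n_neg / sum(n_neg)
  # Levels with a zero count yield Inf or NaN; real implementations
  # usually adjust for that, which this sketch does not.
  sum((dist_pos - dist_neg) * log(dist_pos / dist_neg))
}

# Toy data: level "a" has 1 bad and 2 good, level "b" has 3 bad and 2 good.
x <- c("a", "a", "a", "b", "b", "b", "b", "b")
y <- c("bad", "good", "good", "bad", "bad", "bad", "good", "good")

iv_value <- iv_sketch(x, y)
print(iv_value)
print(iv_value >= 0.02)  # would pass the default iv_limit
```

Under this definition the statistic is never negative, and a predictor whose levels have identical positive and negative shares scores 0, which is why a small positive threshold such as 0.02 screens out uninformative variables.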

Value

A data frame containing the y column and the selected x variables. If return_rm_reason is TRUE, a list of two data frames is returned instead: dt, the filtered data, and rm, the reason each variable was removed.

Examples

# Load German credit data
data(germancredit)

# variable filter
dt_sel = var_filter(germancredit, y = "creditability")
dim(dt_sel)

# return the reason each variable was removed
dt_sel2 = var_filter(germancredit, y = "creditability", return_rm_reason = TRUE)
lapply(dt_sel2, dim)

str(dt_sel2$dt)
str(dt_sel2$rm)

# keep columns manually, such as rowid
germancredit$rowid = row.names(germancredit)
dt_sel3 = var_filter(germancredit, y = "creditability", var_kp = 'rowid')

# remove columns manually
dt_sel4 = var_filter(germancredit, y = "creditability", var_rm = 'rowid')

scorecard documentation built on Aug. 30, 2020, 5:06 p.m.