simple_bin: Discretize variables in your training and test datasets


Description

Function to apply simple equal-width or equal-height binning to columns of a training dataset, and then optionally bin the same columns of a test set using the cutpoints derived from the training set.

Usage

simple_bin(train, test = NULL, exclude_vars = NULL, include_vars = NULL,
  bins, type = "height", na_include = TRUE)

Arguments

train

training set

test

test set

exclude_vars

variables to exclude (e.g. the target, or the row ID)

include_vars

if you only want certain variables binned, you may specify them directly instead of excluding all other variables

bins

single number specifying the number of bins to create on each variable, or a named list specifying cut-points for each variable

type

if bins is given as a number, this determines whether to create bins with an equal number of observations ("height") or bins of equal width ("width")
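
The difference between the two types can be illustrated with a minimal base R sketch. It does not call simple_bin and is not the package's implementation; the vector x and the use of cut() and quantile() are assumptions chosen only for illustration.

```r
# Illustrative base R only; not the package's implementation.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
n_bins <- 2

# Equal width: split the range of x into intervals of the same length.
width_breaks <- seq(min(x), max(x), length.out = n_bins + 1)
width_bins <- cut(x, breaks = width_breaks, include.lowest = TRUE)

# Equal height: split at quantiles so each bin holds a similar number of rows.
height_breaks <- quantile(x, probs = seq(0, 1, length.out = n_bins + 1))
height_bins <- cut(x, breaks = height_breaks, include.lowest = TRUE)

# Compare how many observations land in each bin under the two schemes.
table(width_bins)
table(height_bins)
```

With a skewed variable like this one, equal-width bins can leave most observations in a single bin, while equal-height bins spread them evenly.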

na_include

logical. Give missing values their own bin?
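
As a rough illustration of what giving missing values their own bin means, the base R sketch below turns NA into an explicit factor level with addNA(). The data and break points are hypothetical, and the labels simple_bin assigns to the missing-value bin may differ.

```r
# Illustrative base R only; simple_bin may label the missing-value bin differently.
x <- c(0.5, NA, 2.5, 4.0, NA)

# Ordinary binning leaves missing inputs as NA in the result.
binned <- cut(x, breaks = c(0, 2, 4), include.lowest = TRUE)

# addNA() makes NA an explicit level, so missing values form their own bin.
binned_na <- addNA(binned)

levels(binned_na)
table(binned_na)
```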

Details

This function was built as a convenience, to automate the process of binning continuous variables into discrete levels, and also to provide a simple, interpretable, unambiguous method of dealing with missing values in data science problems.

Value

If test is not NULL, a list containing two tbl_df objects (the binned training set and the binned test set), with the appropriate columns replaced by their binned values and all other columns unchanged. If test is NULL, only the binned training set is returned.
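
The shape of the result can be sketched in base R as a list of two data frames whose binned column shares one set of cutpoints derived from the training data. This is a sketch under stated assumptions, not the package's code: the data frames, the list element names train and test, and the widening of the outer cutpoints to -Inf and Inf are all hypothetical choices made for the illustration, and plain data frames stand in for tbl_df objects.

```r
# Base R sketch only; simple_bin itself may differ in labels and edge handling.
train <- data.frame(id = 1:6, x = c(1, 2, 3, 4, 5, 6))
test  <- data.frame(id = 7:9, x = c(0, 3.5, 10))

# Cutpoints come from the training column only.
breaks <- quantile(train$x, probs = c(0, 0.5, 1))

# Assumption of this sketch: widen the outer edges so that test values
# outside the training range still fall into a bin.
breaks[1] <- -Inf
breaks[length(breaks)] <- Inf

# Replace x with its binned version; the id column is left unchanged.
binned <- list(
  train = transform(train, x = cut(x, breaks = breaks)),
  test  = transform(test,  x = cut(x, breaks = breaks))
)

binned$train
binned$test
```

Because both data frames are cut with the same breaks, the binned column has identical levels in the training and test sets.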

See Also

vector_bin, get_vector_cutpoints

Other discretization: binned_data_cutpoints, get_vector_cutpoints, vector_bin


awstringer/modellingTools documentation built on May 11, 2019, 4:11 p.m.