library(tree.bins)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)

Overview

The package 'tree.bins' provides users the ability to recategorize categorical variables dependent on a response variable by iteratively creating a decision tree for each of the categorical variables (class factor) and the selected response variable. The decision tree is created from the rpart() function from the 'rpart' package. The rules from the leaves of the decision tree are extracted, and used to recategorize (bin) the appropriate categorical variable (predictor). This step is performed for each of the categorical variables that is passed onto the data component of the function. Only variables containing more than 2 factor levels will be considered in the function. The final output generates a data set containing the recategorized variables and/or a list containing a mapping table for each of the candidate variables. For more details see Dr. Yan-yan Song article (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4466856/) or T. Hastie et al (2009, ISBN: 978-0-387-84857-0). For detailed examples and functionality see vignettes.

Installation

You can install tree.bins with:

#Easiest way to install tree.bins is by:
install.packages("tree.bins")

#Alternatively, the development version from GitHub:
# install.packages("devtools")
devtools::install_github("pikos90/tree.bins")

Usage

Uses tree.bins() to recategorize your data.

## basic example code
sample.df <- AmesImpFctrs[, c("Neighborhood","MS.Zoning", "SalePrice" )]
recategorized.df <- tree.bins(data = sample.df, y = SalePrice)
head(recategorized.df)

Uses tree.bins() to create a list of mapping tables.

## basic example code
sample.df <- AmesImpFctrs[, c("Neighborhood","MS.Zoning", "SalePrice" )]
recategorized.list <- tree.bins(data = sample.df, y = SalePrice, return = "lkup.list")
head(recategorized.list[[1]])

Use that list to recategorize your a different data set with bin.oth().

other.sample.df <- AmesImpFctrs[, c("Neighborhood","MS.Zoning", "Sale.Condition", "SalePrice" )]
other.df <- bin.oth(list = recategorized.list, data = other.sample.df)
head(other.df)


pikos90/tree.bins documentation built on May 7, 2019, 3:16 p.m.