tree.bins: Recategorization of Factor Variables by Decision Tree Leaves


View source: R/tree.bins.R

Description

The function takes a data set that contains one or more categorical variables and a response variable. For each categorical variable (class factor), it fits a decision tree on that variable and the response variable using rpart() from the 'rpart' package. The rules from the leaves of the decision tree are extracted and used to recategorize the corresponding categorical variable (predictor). This step is repeated for every factor variable supplied in the data argument; only factors with more than 2 levels are considered. The output is a data set containing the recategorized variables, a list containing a mapping table for each candidate variable, or both.
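The idea described above can be sketched for a single factor using rpart alone. This is a minimal illustration under stated assumptions, not the package's actual implementation: the toy data, the column names grp and y, and the lkup table are all hypothetical.

```r
library(rpart)

set.seed(1)
# Hypothetical toy data: one factor with 6 levels and a numeric response
lvl.means <- c(a = 10, b = 10, c = 20, d = 20, e = 30, f = 30)
toy <- data.frame(grp = factor(sample(names(lvl.means), 300, replace = TRUE)))
toy$y <- unname(lvl.means[as.character(toy$grp)]) + rnorm(300)

# Fit a tree that uses only this one factor as predictor
fit <- rpart(y ~ grp, data = toy)

# fit$where holds, for each observation, the row of fit$frame of its leaf.
# Because grp is the only predictor, each level lands in exactly one leaf.
lkup <- unique(data.frame(grp = toy$grp, leaf = fit$where))
lkup$new <- paste0("Group.", as.integer(factor(lkup$leaf)))
lkup <- lkup[order(lkup$grp), c("grp", "new")]

# Recategorize the original factor through the lookup table
toy$grp.binned <- factor(lkup$new[match(toy$grp, lkup$grp)])
```

Levels whose responses behave alike end up in the same leaf and therefore share a new label, which is how a factor with many levels collapses into a few bins.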

Usage

tree.bins(data, y, bin.nm = "Group.", method = NULL, control = NULL,
  return = "new.fctrs")

Arguments

data

A data.frame.

y

The response variable to be used in the rpart() function.

bin.nm

The string that will be used as the prefix for the new category names. The default is "Group.". For example, if a variable with 6 levels is recategorized into 3 levels, then setting bin.nm equal to "Group." will name the three new levels "Group.1", "Group.2", and "Group.3".

method

This is the method that will be used in the rpart() function. If NULL, the default method will be used. See rpart() for further detail.

control

This is the control that will be used in the rpart() function. The user has three options. The first is to leave it as NULL, in which case the default control selected by the rpart() function is used. The remaining two options are: 1) Specify a cp value, for example via rpart.control(cp = .01), which will prune each decision tree by the specified value, or 2) Specify a two-column data.frame() that contains the variable name(s), as identified in the data component, in the first column and the respective cp of each variable in the second column. Variable(s) not included in this data.frame() will use the cp generated by the rpart() function. See rpart() and rpart.control() for further detail.

return

This is what the function will return. There are three options: 1) "new.fctrs" - returns a data.frame with the recategorized categorical variables. 2) "lkup.list" - returns a list of lookup tables. Each element contains the original-to-new mapping for one recategorized variable. 3) "both" - returns both the new.fctrs and lkup.list objects.
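A lookup table of the kind described for the return argument can be applied to new observations with match(). The sketch below builds its own small table; the column names original and recoded are hypothetical and may not match the columns that tree.bins() actually produces.

```r
# Hypothetical lookup table: original level -> new bin label
lkup <- data.frame(
  original = c("a", "b", "c", "d"),
  recoded  = c("Group.1", "Group.1", "Group.2", "Group.2"),
  stringsAsFactors = FALSE
)

# New observations of the same categorical variable
new.obs <- c("b", "d", "a")

# Look up each observation's bin; levels missing from the table become NA
new.bins <- lkup$recoded[match(new.obs, lkup$original)]
```

Keeping the mapping separate from the recoded data makes it possible to apply the same binning to a test set that was not used to grow the trees.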

See Also

bin.oth, rpart, rpart.control, rpart.lists

Examples

# Returns a data.frame of recategorized variables
library(rpart)
library(tree.bins)  # provides tree.bins() and the AmesImpFctrs data set
sample.df <- AmesImpFctrs[, c("Neighborhood", "MS.Zoning", "SalePrice")]
tree.bins(data = sample.df, y = SalePrice)

#Returns a list of mapping tables generated from tree.bins()
tree.bins(data = sample.df, y = SalePrice, return = "lkup.list")

#Allows the user to choose the naming convention for the new categories
tree.bins(data = sample.df, y = SalePrice, bin.nm = "bin#")

#Allows user to manually assign a cp to each decision tree evaluated in rpart()
tree.bins(data = sample.df, y = SalePrice, control = rpart.control(cp = .01))

#Allows user to manually assign a cp to specified variables
demo.df <- data.frame(Variables = c("Neighborhood", "MS.Zoning"), CP = c(.001, .2))
tree.bins(data = sample.df, y = SalePrice, control = demo.df)
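
The effect of cp on the number of resulting bins can be seen with rpart alone. The following is a rough sketch on hypothetical toy data (the names toy, grp, y, and count.leaves are illustrative and not part of tree.bins): a larger cp demands a bigger improvement per split, so the tree keeps fewer leaves and the factor collapses into fewer groups.

```r
library(rpart)

set.seed(42)
# Hypothetical toy data: six levels forming three loose clusters of means
lvl.means <- c(a = 10, b = 12, c = 20, d = 22, e = 30, f = 32)
toy <- data.frame(grp = factor(sample(names(lvl.means), 600, replace = TRUE)))
toy$y <- unname(lvl.means[as.character(toy$grp)]) + rnorm(600)

# Helper: count terminal nodes, which rpart marks as "<leaf>" in fit$frame
count.leaves <- function(fit) sum(fit$frame$var == "<leaf>")

# Small cp keeps fine-grained splits; large cp keeps only the strongest ones
fit.loose  <- rpart(y ~ grp, data = toy, control = rpart.control(cp = 0.001))
fit.strict <- rpart(y ~ grp, data = toy, control = rpart.control(cp = 0.2))

count.leaves(fit.loose)
count.leaves(fit.strict)
```

When a per-variable cp data.frame is passed as control, the same trade-off applies to each variable separately.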

tree.bins documentation built on May 2, 2019, 12:20 p.m.