The 'tree.bins' package lets users recategorize (bin) categorical variables based on their relationship with a response variable. For each categorical variable (class factor), it fits a decision tree against the selected response variable using the rpart() function from the 'rpart' package. The rules at the leaves of the decision tree are then extracted and used to recategorize the corresponding categorical variable (predictor). This step is repeated for every categorical variable passed in the data argument of the function; only variables with more than 2 factor levels are considered. The output is a data set containing the recategorized variables and/or a list containing a mapping table for each candidate variable. For more details, see the article by Yan-yan Song (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4466856/) or T. Hastie et al. (2009, ISBN: 978-0-387-84857-0). For detailed examples and functionality, see the vignettes.
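To make the idea concrete, here is a minimal sketch of tree-based binning for a single factor, written directly against 'rpart'. It is an illustration under stated assumptions, not the internal implementation of tree.bins(): the toy data, the column names `area` and `price`, and the way groups are numbered are all hypothetical.

```r
library(rpart)

# Hypothetical toy data: six area codes whose mean price falls into three tiers.
set.seed(1)
area <- factor(rep(c("A", "B", "C", "D", "E", "F"), each = 30))
tier.mean <- c(100, 105, 200, 210, 300, 310)
toy <- data.frame(
  area  = area,
  price = tier.mean[as.integer(area)] + rnorm(length(area), sd = 5)
)

# Fit a regression tree of the response on the single categorical predictor.
fit <- rpart(price ~ area, data = toy, method = "anova")

# fit$where gives, for each observation, the row of fit$frame (the leaf) it falls into.
# With one factor predictor, every level ends up in exactly one leaf.
lookup <- unique(data.frame(area = toy$area, leaf = fit$where))

# Relabel the leaves as Group.1, Group.2, ... to form a mapping table.
# The numbering follows the leaf index and is otherwise arbitrary.
lookup$Categories <- paste0("Group.", as.integer(factor(lookup$leaf)))
lookup <- lookup[order(lookup$area), c("area", "Categories")]
print(lookup)
```

Levels with similar mean response land in the same leaf and therefore share a group label. How finely the levels are merged depends on the tree's control settings (for example the complexity parameter `cp` in rpart.control()).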
You can install tree.bins with:
# The easiest way to install tree.bins is:
install.packages("tree.bins")
# Alternatively, install the development version from GitHub:
# install.packages("devtools")
devtools::install_github("pikos90/tree.bins")
Use tree.bins() to recategorize your data.
## basic example code
library(tree.bins)
sample.df <- AmesImpFctrs[, c("Neighborhood", "MS.Zoning", "SalePrice")]
recategorized.df <- tree.bins(data = sample.df, y = SalePrice)
head(recategorized.df)
#> SalePrice Neighborhood MS.Zoning
#> 1: 105.0 Group.4 Group.1
#> 2: 244.0 Group.4 Group.2
#> 3: 189.9 Group.3 Group.2
#> 4: 195.5 Group.3 Group.2
#> 5: 191.5 Group.5 Group.2
#> 6: 236.5 Group.5 Group.2
Use tree.bins() to create a list of mapping tables.
## return the mapping tables instead of the recategorized data
sample.df <- AmesImpFctrs[, c("Neighborhood", "MS.Zoning", "SalePrice")]
recategorized.list <- tree.bins(data = sample.df, y = SalePrice, return = "lkup.list")
head(recategorized.list[[1]])
#> Neighborhood Categories
#> 1 BrDale Group.1
#> 2 BrkSide Group.1
#> 3 IDOTRR Group.1
#> 4 MeadowV Group.1
#> 5 OldTown Group.1
#> 6 Somerst Group.2
Use that list to recategorize a different data set with bin.oth().
other.sample.df <- AmesImpFctrs[, c("Neighborhood", "MS.Zoning", "Sale.Condition", "SalePrice")]
other.df <- bin.oth(list = recategorized.list, data = other.sample.df)
head(other.df)
#> Sale.Condition SalePrice Neighborhood MS.Zoning
#> 1: Normal 105.0 Group.4 Group.1
#> 2: Normal 244.0 Group.4 Group.2
#> 3: Normal 189.9 Group.3 Group.2
#> 4: Normal 195.5 Group.3 Group.2
#> 5: Normal 191.5 Group.5 Group.2
#> 6: Normal 236.5 Group.5 Group.2
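Conceptually, applying a mapping table to new data is a lookup of each factor level in that table. The sketch below shows the idea in base R with match(). It is only an illustration and not how bin.oth() is implemented: the small tables and the column name `Neighborhood.binned` are hypothetical, and how bin.oth() treats levels missing from the mapping table is not covered here.

```r
# Hypothetical mapping table in the same shape as the lookup tables above.
lookup <- data.frame(
  Neighborhood = c("BrDale", "BrkSide", "Somerst"),
  Categories   = c("Group.1", "Group.1", "Group.2"),
  stringsAsFactors = FALSE
)

# Hypothetical new data, including one level that is absent from the mapping table.
new.df <- data.frame(
  Neighborhood = c("Somerst", "BrDale", "Unknown"),
  SalePrice    = c(236.5, 105.0, 150.0),
  stringsAsFactors = FALSE
)

# match() returns the row of `lookup` for each value, or NA when there is no match.
idx <- match(new.df$Neighborhood, lookup$Neighborhood)
new.df$Neighborhood.binned <- lookup$Categories[idx]
print(new.df)
```

In this sketch, levels that do not appear in the mapping table come back as NA, so it is worth checking for unseen levels before relying on the binned column.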