eRic

An R package containing functions I've written while working on predictive modeling projects. Each function is briefly described below and documented in more detail in its own help (.Rd) file. Once installed using either of the options below, the package can be loaded and used like any other R package.

Installation Options

Install via devtools straight from GitHub:

install.packages("devtools")
library("devtools")
install_github("etlundquist/eRic")

Download the tarball eRic_0.0.0.9000.tar.gz and install from source:

install.packages('/path/to/tarball/eRic_0.0.0.9000.tar.gz', repos = NULL)

Included Functions

  1. bootImp - calculate variable importance/perform feature selection using bootstrap resampling and a combination of different filter/model-based methods. Available methods are:
     - Information Value
     - Chi2
     - Random Forest
     - GBM
     - Bagged Earth/MARS
     - Bagged Lasso

  2. evThreshold - calculate an optimal probability threshold for classification given the costs/benefits for each confusion matrix cell
     - provides an estimate of the optimal probability cutoff with respect to confusion matrix utility
     - provides an estimate of unit expected value given your model and the optimal probability cutoff

  3. varCluster - cluster predictor variables and extract cluster summaries for dimension reduction
     - use agglomerative clustering (hclust) to group highly correlated sets of predictor variables
     - create a single variable to summarize each cluster (highest pairwise, highest x-y, centroid, PC1)
     - produce a correlation plot to visualize correlation structure in predictor matrix

  4. plotYXbin - produces a ggplot object with bin values of Y with respect to bins of X
     - Y metrics include: [proportions, log-odds, WOE]
     - X split methods include: [quantile-based, uniform splits, rpart-based splits]
     - Can specify the desired number of bins and whether a missing value bin should be added
     - Additionally calculates Information Value (IV) and Chi2 Statistic for the XY relationship

  5. prCalibrate - perform Platt Scaling on raw model predicted probabilities to better align with actual class proportions and produce a calibration plot to visualize results
     - scale predicted probabilities with respect to either a calibration or independent validation set
     - produce a calibration plot showing the relationship between actual and predicted class proportions

  6. plotKS - produces a ggplot object with a KS plot for two distributions as well as the KS statistic value

  7. tsSummary - calculate summary statistics (using passed summary functions) with respect to wide-format time series variables

  8. tsTrends - calculate linear trends with respect to wide-format time series variables
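
The idea behind evThreshold can be illustrated with a small base-R sketch. This is not the package's implementation or its interface: the helper name `ev_threshold`, the layout of the `utility` vector, and the cutoff grid are all assumptions made for illustration. It sweeps candidate cutoffs, scores each case by the payoff of its confusion matrix cell, and keeps the cutoff with the highest average payoff per unit.

```r
# Hypothetical helper (not part of eRic): pick the probability cutoff that
# maximizes average utility per case, given payoffs for TP, FP, TN and FN.
ev_threshold <- function(prob, actual, utility,
                         cutoffs = seq(0.01, 0.99, by = 0.01)) {
  ev <- vapply(cutoffs, function(cut) {
    pred <- prob >= cut
    # Assign each case the payoff of the confusion matrix cell it falls in
    payoff <- ifelse(pred & actual == 1, utility[["TP"]],
              ifelse(pred & actual == 0, utility[["FP"]],
              ifelse(!pred & actual == 0, utility[["TN"]],
                     utility[["FN"]])))
    mean(payoff)
  }, numeric(1))
  best <- which.max(ev)
  list(cutoff = cutoffs[best], expected_value = ev[best])
}

# Simulated example data (assumed, for illustration only)
set.seed(42)
actual  <- rbinom(1000, 1, 0.3)
prob    <- plogis(-1 + 2 * actual + rnorm(1000))
utility <- c(TP = 10, FP = -2, TN = 0, FN = -5)

res <- ev_threshold(prob, actual, utility)
res$cutoff          # estimated optimal cutoff on this sample
res$expected_value  # average utility per case at that cutoff
```

Because the cutoff is chosen on the same data used to score it, the reported expected value is optimistic; a held-out sample gives a fairer estimate.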

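
The clustering and PC1 summary steps listed under varCluster can be sketched as follows. This is only an approximation under stated assumptions: the distance `1 - |correlation|`, average linkage, the choice of `k = 2`, and the simulated data are illustrative and may differ from what varCluster actually does.

```r
# Simulated predictors: two groups of highly correlated variables
set.seed(7)
n  <- 200
z1 <- rnorm(n)
z2 <- rnorm(n)
X <- data.frame(
  a1 = z1 + rnorm(n, sd = 0.1),
  a2 = z1 + rnorm(n, sd = 0.1),
  b1 = z2 + rnorm(n, sd = 0.1),
  b2 = z2 + rnorm(n, sd = 0.1)
)

# Treat strongly correlated variables as "close": distance = 1 - |r|
d  <- as.dist(1 - abs(cor(X)))
hc <- hclust(d, method = "average")  # agglomerative clustering
clusters <- cutree(hc, k = 2)        # assumed number of clusters

# Summarize each cluster by its first principal component (PC1)
pc1 <- sapply(split(names(X), clusters), function(vars) {
  prcomp(X[, vars, drop = FALSE], scale. = TRUE)$x[, 1]
})

clusters   # cluster membership for each predictor
head(pc1)  # one summary column per cluster
```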

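
Platt Scaling, as used by prCalibrate, amounts to fitting a logistic regression of the actual class on the raw model output and using the fitted curve as the calibrated probability. The sketch below shows that idea in base R. The helper name `platt_scale` and its arguments are hypothetical, and using the raw probability itself (rather than its log-odds) as the regressor is an assumption here, not something the package documentation confirms.

```r
# Hypothetical helper (not part of eRic): fit the scaling model on a
# calibration set, then apply it to new raw scores.
platt_scale <- function(calib_score, calib_actual, new_score) {
  fit <- glm(y ~ score, family = binomial(),
             data = data.frame(y = calib_actual, score = calib_score))
  as.numeric(predict(fit, newdata = data.frame(score = new_score),
                     type = "response"))
}

# Simulated raw scores (assumed, for illustration only)
set.seed(1)
actual <- rbinom(2000, 1, 0.2)
raw    <- plogis(3 * (actual - 0.5) + rnorm(2000))

idx        <- seq_len(1000)  # first half serves as the calibration set
calibrated <- platt_scale(raw[idx], actual[idx], raw[-idx])

# Compare mean predictions with the actual class proportion
c(raw        = mean(raw[-idx]),
  calibrated = mean(calibrated),
  actual     = mean(actual[-idx]))
```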

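
For tsTrends, "wide-format" means one row per unit and one column per time period. A minimal base-R sketch of a per-row linear trend is shown below. The column names, the equally spaced time index, and the use of `lm` are assumptions for illustration and are not taken from the package.

```r
# Toy wide-format data: one row per unit, one column per period (assumed names)
wide <- data.frame(
  id = 1:3,
  m1 = c(1, 5, 2),
  m2 = c(2, 5, 1),
  m3 = c(3, 5, 3),
  m4 = c(4, 5, 2)
)

ts_cols  <- c("m1", "m2", "m3", "m4")
time_idx <- seq_along(ts_cols)  # assumes equally spaced periods 1, 2, 3, 4

# Fit y ~ time for each row and keep the slope as the trend
wide$slope <- apply(wide[, ts_cols], 1, function(y) {
  coef(lm(y ~ time_idx))[["time_idx"]]
})

wide
```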
etlundquist/eRic documentation built on May 16, 2019, 9:07 a.m.