CVRTSEncoder is a categorical variable encoding for supervised learning.
This package is still in research and development mode. Functionality and interfaces may change.
Re-encode a set of categorical variables jointly as a spectral projection of the trajectory of modeling residuals. This is intended as a succinct numeric linear representation of a set of categorical variables in a manner that is useful for supervised learning.
The concept is a y-aware encoding of the trajectory of non-linear model residuals in terms of the target categorical variables.
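To make the idea concrete, here is a minimal base-R sketch built on assumptions of our own. It is not CVRTSEncoder's algorithm. The "trajectory" is taken to be a sequence of logistic models with polynomial terms of increasing degree in the numeric variables. The "spectral projection" is an SVD of the per-level mean residuals along that sequence. The function name `residual_trajectory_encoding()` and its `degrees` parameter are hypothetical. Unlike a cross-validated encoder, this sketch computes the level means in-sample, so it is optimistic on training data.

```r
# Hypothetical sketch, NOT CVRTSEncoder's algorithm: encode the joint levels of
# categorical variables by an SVD of per-level mean residuals collected along a
# sequence of increasingly flexible logistic models (the assumed "trajectory").
residual_trajectory_encoding <- function(d, avars, evars, y,
                                         n_comp = 2, degrees = 1:4) {
  stopifnot(n_comp <= length(degrees))
  # joint level of all categorical variables, e.g. "3.1"
  level <- interaction(d[, evars, drop = FALSE], drop = TRUE)
  d$y_ <- y
  # one column of response residuals (y - fitted probability) per model
  traj <- sapply(degrees, function(k) {
    rhs <- paste0("poly(", avars, ", ", k, ")", collapse = " + ")
    fit <- suppressWarnings(
      glm(as.formula(paste("y_ ~", rhs)), data = d, family = binomial))
    y - fitted(fit)
  })
  # mean residual per joint level at each step: rows = levels (in level order)
  m <- rowsum(traj, level) / as.vector(table(level))
  # center the columns, then project onto the top n_comp right singular vectors
  m_c <- scale(m, center = TRUE, scale = FALSE)
  codes <- m_c %*% svd(m_c, nu = 0, nv = n_comp)$v
  colnames(codes) <- paste0("rt_", seq_len(n_comp))
  # give every row the code of its joint level
  codes[as.integer(level), , drop = FALSE]
}

# Usage on the same iris setup as the example below
d <- iris
evars <- c("Sepal.Width", "Petal.Width")
for (v in evars) d[[v]] <- as.character(round(d[[v]]))
y <- as.numeric(d$Species == "versicolor")
enc <- residual_trajectory_encoding(d, c("Sepal.Length", "Petal.Length"),
                                    evars, y)
head(enc)
```

Every row that shares a joint level of the categorical variables gets the same short numeric code. This is what makes the result usable as ordinary numeric input to a downstream linear model.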
The idea is an extension of the vtreat coding concepts, the re-encoding concepts of JavaLogistic, and the y-aware scaling concepts of Nina Zumel and John Mount.
The core idea is that other models factor the quantity to be explained into an explained portion and a residual portion (with respect to the given model). Each of these components is potentially useful for modeling.
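The sketch below shows that factoring with an ordinary logistic regression. The setup (iris, target `"versicolor"`, a plain `glm` fit) is an illustrative assumption and not package code. The fitted probabilities are the explained portion, and the response residuals are what remains. The last line shows that the residual can still vary with a categorical variable, which is the kind of signal a residual-based encoding tries to capture.

```r
# Illustrative setup (assumed, not package code): predict Species == "versicolor"
d <- iris
d$y <- as.numeric(d$Species == "versicolor")
fit <- glm(y ~ Sepal.Length + Petal.Length, data = d, family = binomial)

explained <- fitted(fit)                        # estimated P(y = 1) per row
residual  <- residuals(fit, type = "response")  # y - explained

# The two portions add back up to the outcome
stopifnot(isTRUE(all.equal(unname(explained + residual), d$y)))

# Mean residual per rounded Petal.Width level: what the model left unexplained
tapply(residual, round(d$Petal.Width), mean)
```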
library("CVRTSEncoder")
library("wrapr")

# Example data: predict whether Species is "versicolor".
data <- iris
avars <- c("Sepal.Length", "Petal.Length")  # numeric variables used directly
evars <- c("Sepal.Width", "Petal.Width")    # variables to re-encode
dep_var <- "Species"
dep_target <- "versicolor"

# Turn the variables to be encoded into categorical (character) columns.
for(vi in evars) {
  data[[vi]] <- as.character(round(data[[vi]]))
}
str(data)

# Estimate the encoding of evars and apply it to the data.
cross_enc <- estimate_residual_encoding_c(
  data = data,
  avars = avars,
  evars = evars,
  dep_var = dep_var,
  dep_target = dep_target,
  n_comp = 4
)
enc <- prepare(cross_enc$coder, data)
data <- cbind(data, enc)
data %.>% head(.) %.>% knitr::kable(.)

# Baseline: logistic regression on the numeric variables only.
f0 <- wrapr::mk_formula(dep_var, avars, outcome_target = dep_target)
print(f0)
model0 <- glm(f0, data = data, family = binomial)
summary(model0)
data$pred0 <- predict(model0, newdata = data, type = "response")
table(data$Species, data$pred0 > 0.5)

# Regularized logistic regression on the numeric variables plus the encoding.
newvars <- c(avars, colnames(enc))
f <- wrapr::mk_formula(dep_var, newvars, outcome_target = dep_target)
print(f)
model <- glmnet::cv.glmnet(
  as.matrix(data[, newvars, drop = FALSE]),
  as.numeric(data[[dep_var]] == dep_target),
  family = "binomial")
# the penalty is chosen with s = "lambda.min" (not lambda = ...)
coef(model, s = "lambda.min")
# type = "response" returns probabilities, so 0.5 is a valid threshold
data$pred <- as.numeric(predict(
  model,
  newx = as.matrix(data[, newvars, drop = FALSE]),
  s = "lambda.min",
  type = "response"))
table(data$Species, data$pred > 0.5)