estimate_residual_encoding_c: Build a residual classification trajectory.


Description

Build a cross-validated residual trajectory for a classification model, encoding the categorical variables named in evars.

Usage

estimate_residual_encoding_c(data, ...,
  fit_predict_c = xgboost_fit_predict_c,
  fit_predict_r = xgboost_fit_predict_r, evars, avars, dep_var,
  dep_target = TRUE, cross_plan = vtreat::kWayStratifiedY(nrow(data),
  3, data, data[[dep_var]] == dep_target), n_comp = 20, cl = NULL)

Arguments

data

The data.frame of data to fit.

...

not used; forces the remaining arguments to be bound by name.

fit_predict_c

A function with signature fit_predict_c(train_data, vars, dep_var, dep_target, application_data) that returns a matrix with one row of predictions per row of application_data and an ordered set of prediction columns.

fit_predict_r

A function with signature fit_predict_r(train_data, vars, dep_var, application_data) that returns a matrix with one row of predictions per row of application_data and an ordered set of prediction columns.

evars

character vector, categorical explanatory variable names to be encoded.

avars

character vector, additional explanatory variable names.

dep_var

character, the name of the dependent variable.

dep_target

scalar, the value considered to be the target category of dep_var.

cross_plan

a vtreat-style cross-validation plan for the rows of data: a list of train/app index lists, where the train and app rows within each element are disjoint and the app sets together partition the data rows.

n_comp

number of components to generate

cl

optional parallel cluster for processing; if NULL (the default), no cluster is used.
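The fit_predict_c and fit_predict_r arguments only fix a calling contract. As a minimal sketch of functions that satisfy it, the block below uses base R only. The names nested_glm_fit_predict_c and nested_lm_fit_predict_r are hypothetical and not part of CVRTSEncoder. It assumes every column in vars is numeric, and it reads "ordered set of columns" as a sequence of nested models (column j uses the first j variables). That reading is an illustration, not necessarily what the xgboost-based defaults produce.

```r
# Hypothetical helpers, NOT part of CVRTSEncoder: base-R functions that follow
# the documented fit_predict_c / fit_predict_r signatures.
# Assumption: every column named in vars is numeric.
# Assumption: "ordered set of columns" is read as nested models, where
# column j uses only the first j entries of vars.

nested_glm_fit_predict_c <- function(train_data, vars, dep_var, dep_target,
                                     application_data) {
  d <- train_data[, vars, drop = FALSE]
  # 0/1 outcome: 1 where dep_var equals the target category
  d$y_target <- as.numeric(train_data[[dep_var]] == dep_target)
  preds <- vapply(seq_along(vars), function(j) {
    f <- stats::reformulate(vars[seq_len(j)], response = "y_target")
    model <- stats::glm(f, data = d, family = stats::binomial())
    # predicted probability of the target category
    as.numeric(stats::predict(model, newdata = application_data,
                              type = "response"))
  }, numeric(nrow(application_data)))
  # keep a matrix shape even when application_data has a single row
  preds <- matrix(preds, nrow = nrow(application_data))
  colnames(preds) <- paste0("pred_", seq_along(vars))
  preds
}

nested_lm_fit_predict_r <- function(train_data, vars, dep_var,
                                    application_data) {
  preds <- vapply(seq_along(vars), function(j) {
    f <- stats::reformulate(vars[seq_len(j)], response = dep_var)
    model <- stats::lm(f, data = train_data)
    as.numeric(stats::predict(model, newdata = application_data))
  }, numeric(nrow(application_data)))
  # keep a matrix shape even when application_data has a single row
  preds <- matrix(preds, nrow = nrow(application_data))
  colnames(preds) <- paste0("pred_", seq_along(vars))
  preds
}

# Usage on built-in data sets
pc <- nested_glm_fit_predict_c(iris, c("Sepal.Length", "Petal.Length"),
                               "Species", "versicolor", iris)
pr <- nested_lm_fit_predict_r(mtcars, c("wt", "hp"), "mpg", mtcars)
```

Because the sketch assumes numeric inputs, it would need adapting (for example, dummy coding) before being handed categorical evars. Whether estimate_residual_encoding_c accepts this particular column layout has not been checked here.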

Value

a matrix with the same number of rows as data representing the cross-validated modeling residual trajectories.
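To illustrate what "cross-validated" means for these residuals, here is a minimal base-R sketch. It is not the CVRTSEncoder algorithm. It swaps in a single logistic regression (stats::glm) for fit_predict_c, assumes the residual is the 0/1 outcome minus the predicted probability, and produces one residual column rather than a multi-column trajectory. It also builds a plan by hand in the list(train = ..., app = ...) shape described under cross_plan. The documented default uses vtreat::kWayStratifiedY to build a stratified plan instead.

```r
# Illustrative sketch only, NOT the CVRTSEncoder algorithm.
set.seed(2019)
d <- iris
# 0/1 outcome: 1 for the target category
d$y <- as.numeric(d$Species == "versicolor")
n <- nrow(d)
k <- 3

# vtreat-style cross plan: a list of list(train = ..., app = ...) index sets;
# the app sets are disjoint and together cover every row exactly once
fold <- rep_len(seq_len(k), n)[sample.int(n)]
plan <- lapply(seq_len(k), function(i) {
  list(train = which(fold != i), app = which(fold == i))
})

# out-of-fold residuals: each row is scored by a model that never saw it
resid <- matrix(NA_real_, nrow = n, ncol = 1,
                dimnames = list(NULL, "resid_1"))
for (fold_plan in plan) {
  model <- stats::glm(y ~ Sepal.Length + Petal.Length,
                      data = d[fold_plan$train, ], family = stats::binomial())
  p <- stats::predict(model, newdata = d[fold_plan$app, ], type = "response")
  resid[fold_plan$app, 1] <- d$y[fold_plan$app] - p
}
```

A hand-built plan of this shape could, in principle, be supplied through the cross_plan argument. The sketch keeps a fixed seed so that the fold assignment is reproducible.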

Examples

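library("CVRTSEncoder")
library("vtreat")  # provides the prepare() generic

data <- iris
avars <- c("Sepal.Length", "Petal.Length")
evars <- c("Sepal.Width", "Petal.Width")
dep_var <- "Species"
dep_target <- "versicolor"
# turn the variables to be encoded into categorical (character) columns
for(vi in evars) {
  data[[vi]] <- as.character(round(data[[vi]]))
}
# estimate the cross-validated residual encoding
cross_enc <- estimate_residual_encoding_c(
  data = data,
  avars = avars,
  evars = evars,
  dep_var = dep_var,
  dep_target = dep_target,
  n_comp = 4
)
# apply the fitted coder and append the encoded columns
enc <- prepare(cross_enc$coder, data)
data <- cbind(data, enc)
newvars <- c(avars, colnames(enc))
# formula form of the model (not used by the glmnet fit below)
f <- wrapr::mk_formula(dep_var, newvars, outcome_target = dep_target)
# penalized logistic regression on the additional and encoded variables
model <- glmnet::cv.glmnet(as.matrix(data[, newvars, drop = FALSE]),
                           as.numeric(data[[dep_var]] == dep_target),
                           family = "binomial")
# coef() for cv.glmnet selects the penalty through s, not lambda
coef(model, s = "lambda.min")
# type = "response" returns probabilities, so 0.5 is a sensible threshold
data$pred <- as.numeric(predict(model,
                                newx = as.matrix(data[, newvars, drop = FALSE]),
                                s = "lambda.min", type = "response"))
table(data$Species, data$pred > 0.5)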

WinVector/CVRTSEncoder documentation built on June 7, 2019, 9:53 a.m.