mlr_pipeops_vtreat | R Documentation |
Provides an interface to the vtreat package.
PipeOpVtreat naturally works for classification tasks and regression tasks. Internally, PipeOpVtreat follows the fit/prepare interface of vtreat, i.e., first creating a data treatment transform object via vtreat::NumericOutcomeTreatment(), vtreat::BinomialOutcomeTreatment(), or vtreat::MultinomialOutcomeTreatment(), followed by calling vtreat::fit_prepare() on the training data and vtreat::prepare() during prediction.
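The fit/prepare workflow described above can be sketched directly in vtreat for a numeric outcome. This is only an illustration of the three steps, not the internal code of PipeOpVtreat; the toy data and the names d, d_new, design, and fit are made up for this example.

```r
library("vtreat")

set.seed(1)
# hypothetical toy data: one numeric and one categorical variable
d = data.frame(x = rnorm(100), xc = sample(c("a", "b", "c"), 100, replace = TRUE))
d$y = d$x + 0.1 * rnorm(100)

# 1. create the (unfitted) treatment transform for a numeric outcome
design = vtreat::NumericOutcomeTreatment(
  var_list = c("x", "xc"),
  outcome_name = "y"
)

# 2. fit on the training data; the result holds the fitted treatments
#    and a cross-validated version of the prepared training frame
fit = vtreat::fit_prepare(design, d)
train_prepared = fit$cross_frame

# 3. apply the fitted treatments to new data
d_new = data.frame(x = rnorm(10), xc = sample(c("a", "b", "c"), 10, replace = TRUE))
new_prepared = vtreat::prepare(fit$treatments, d_new)
```

For a binary or multiclass outcome, the first step would presumably use vtreat::BinomialOutcomeTreatment() or vtreat::MultinomialOutcomeTreatment() instead.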
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
PipeOpVtreat$new(id = "vtreat", param_vals = list())
id :: character(1)
Identifier of resulting object, default "vtreat".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
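As a brief sketch of construction, the object can be created through the constructor or through the po() shorthand of mlr3pipelines. The choice of recommended = FALSE here is arbitrary and only serves to show how param_vals is passed.

```r
library("mlr3")
library("mlr3pipelines")

# via the constructor, overriding one hyperparameter at construction
pop = PipeOpVtreat$new(id = "vtreat", param_vals = list(recommended = FALSE))

# via the po() shorthand; hyperparameters are passed as named arguments
pop2 = po("vtreat", recommended = FALSE)

pop$id
pop$param_set$values$recommended
```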
Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a TaskSupervised is used as input and output during training and prediction.
The output is the input Task with all affected features "prepared" by vtreat. If vtreat found "no usable vars", the input Task is returned unaltered.
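The following sketch shows the input and output channels in use: a supervised task goes in, and a task with vtreat-prepared features comes out, both in training and in prediction. The toy data and the split into the first 150 and last 50 rows are assumptions made for illustration.

```r
library("mlr3")
library("mlr3pipelines")

set.seed(1)
# hypothetical toy data: one numeric and one categorical feature
d = data.frame(x = rnorm(200), xc = factor(sample(letters[1:4], 200, replace = TRUE)))
d$y = d$x + (d$xc == "a") + 0.1 * rnorm(200)
task = TaskRegr$new("demo", backend = d, target = "y")

task_train = task$clone()$filter(1:150)
task_new = task$clone()$filter(151:200)

pop = po("vtreat")
train_out = pop$train(list(task_train))$output  # Task with prepared features
pred_out = pop$predict(list(task_new))$output   # same treatment applied to unseen rows

train_out$feature_names
```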
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
treatment_plan :: object of class vtreat_pipe_step | NULL
The treatment plan as constructed by vtreat based on the training data, i.e., an object of class treatment_plan. If vtreat found "no usable vars" and designing the treatment would have failed, this is NULL.
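After training, the treatment plan can be read from the state. The sketch below uses made-up toy data and guards against the NULL case described above.

```r
library("mlr3")
library("mlr3pipelines")

set.seed(1)
# hypothetical toy data with a clearly informative feature
d = data.frame(x = rnorm(200), xc = factor(sample(letters[1:4], 200, replace = TRUE)))
d$y = d$x + (d$xc == "a") + 0.1 * rnorm(200)
task = TaskRegr$new("demo", backend = d, target = "y")

pop = po("vtreat")
pop$train(list(task))

plan = pop$state$treatment_plan
if (is.null(plan)) {
  message("vtreat found no usable variables; the task was returned unaltered")
} else {
  class(plan)
}
```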
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
recommended :: logical(1)
Whether only the "recommended" prepared features should be returned, i.e., non-constant variables with a significance value smaller than vtreat's threshold. Initialized to TRUE.
cols_to_copy :: function | Selector
Selector function, takes a Task as argument and returns a character() of features to copy. See Selector for example functions. Initialized to selector_none().
minFraction :: numeric(1)
Minimum frequency a categorical level must have to be converted to an indicator column.
smFactor :: numeric(1)
Smoothing factor for impact coding models.
rareCount :: integer(1)
Allow levels with this count or below to be pooled into a shared rare-level.
rareSig :: numeric(1)
Suppress levels from pooling at this significance value or greater.
collarProb :: numeric(1)
What fraction of the data (pseudo-probability) to collar data at if doCollar = TRUE.
doCollar :: logical(1)
If TRUE, collar numeric variables by cutting off after a tail-probability specified by collarProb during treatment design.
codeRestriction :: character()
What types of variables to produce.
customCoders :: named list
Map from code names to custom categorical variable encoding functions.
splitFunction :: function
Function taking arguments nSplits, nRows, dframe, and y; returning a user desired split.
ncross :: integer(1)
Integer larger than one, number of cross-validation rounds to design.
forceSplit :: logical(1)
If TRUE, force cross-validated significance calculations on all variables.
catScaling :: logical(1)
If TRUE, use stats::glm() linkspace; if FALSE, use stats::lm() for scaling.
verbose :: logical(1)
If TRUE, print progress.
use_parallel :: logical(1)
If TRUE, use parallel methods.
missingness_imputation :: function
Function of signature f(values: numeric, weights: numeric), simple missing value imputer. Typically, an imputation via a PipeOp should be preferred, see PipeOpImpute.
pruneSig :: numeric(1)
Suppress variables with significance above this level. Only affects regression tasks (mlr3::TaskRegr) and binary classification tasks.
scale :: logical(1)
If TRUE, replace numeric variables with single-variable model regressions ("move to outcome-scale"). These have mean zero and (for variables with significance less than 1) slope 1 when regressed (lm for regression problems/glm for classification problems) against the outcome.
varRestriction :: list()
List of treated variable names to restrict to. Only affects regression tasks (mlr3::TaskRegr) and binary classification tasks.
trackedValues :: named list()
Named list mapping variables to known values, allows warnings upon novel level appearances (see vtreat::track_values()). Only affects regression tasks (mlr3::TaskRegr) and binary classification tasks.
y_dependent_treatments :: character()
Character vector specifying which treatment types to build per outcome level. Only affects multiclass classification tasks.
imputation_map :: named list
Map from column names to functions of signature f(values: numeric, weights: numeric), simple missing value imputers. Typically, an imputation via a PipeOp is to be preferred, see PipeOpImpute.
For more information, see vtreat::regression_parameters(), vtreat::classification_parameters(), or vtreat::multinomial_parameters().
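Parameters can also be changed after construction through the parameter set. The following is a minimal sketch on made-up toy data; the particular values (recommended = FALSE, minFraction = 0.05, ncross = 3) are arbitrary and chosen only to show the mechanics.

```r
library("mlr3")
library("mlr3pipelines")

set.seed(1)
# hypothetical toy data: one numeric and one categorical feature
d = data.frame(x = rnorm(200), xc = factor(sample(letters[1:4], 200, replace = TRUE)))
d$y = d$x + (d$xc == "a") + 0.1 * rnorm(200)
task = TaskRegr$new("demo", backend = d, target = "y")

pop = po("vtreat")
pop$param_set$values$recommended = FALSE  # return all prepared features
pop$param_set$values$minFraction = 0.05   # minimum level frequency for indicator columns
pop$param_set$values$ncross = 3L          # number of cross-validation rounds

out = pop$train(list(task))$output
out$feature_names
```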
Follows vtreat's fit/prepare interface. See vtreat::NumericOutcomeTreatment(), vtreat::BinomialOutcomeTreatment(), vtreat::MultinomialOutcomeTreatment(), vtreat::fit_prepare() and vtreat::prepare().
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps: PipeOp, PipeOpEncodePL, PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_pipeops, mlr_pipeops_adas, mlr_pipeops_blsmote, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_decode, mlr_pipeops_encode, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_encodeplquantiles, mlr_pipeops_encodepltree, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_learner_pi_cvplus, mlr_pipeops_learner_quantiles, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nearmiss, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_rowapply, mlr_pipeops_scale, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_smotenc, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tomek, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")  # provides PipeOpVtreat

set.seed(2020)

# simulate a regression data set with missing values and a categorical feature
make_data <- function(nrows) {
  d <- data.frame(x = 5 * rnorm(nrows))
  d["y"] = sin(d[["x"]]) + 0.01 * d[["x"]] + 0.1 * rnorm(nrows)
  d[4:10, "x"] = NA  # introduce NAs
  d["xc"] = paste0("level_", 5 * round(d$y / 5, 1))
  d["x2"] = rnorm(nrows)
  d[d["xc"] == "level_-1", "xc"] = NA  # introduce a NA level
  return(d)
}

task = TaskRegr$new("vtreat_regr", backend = make_data(100), target = "y")

pop = PipeOpVtreat$new()
pop$train(list(task))
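Beyond training the PipeOp on its own, it would typically be placed in front of a learner. The sketch below assumes the regr.rpart learner (which requires the rpart package) and uses made-up toy data with an arbitrary 150/50 row split; it is one possible way to combine the pieces, not a recommended configuration.

```r
library("mlr3")
library("mlr3pipelines")

set.seed(1)
# hypothetical toy data: one numeric and one categorical feature
d = data.frame(x = rnorm(200), xc = factor(sample(letters[1:4], 200, replace = TRUE)))
d$y = d$x + (d$xc == "a") + 0.1 * rnorm(200)
task = TaskRegr$new("demo", backend = d, target = "y")

# vtreat preprocessing followed by a regression tree
graph = po("vtreat") %>>% lrn("regr.rpart")
glrn = as_learner(graph)

glrn$train(task, row_ids = 1:150)            # treatment plan is fitted on these rows only
pred = glrn$predict(task, row_ids = 151:200) # the fitted plan is applied before predicting

pred$score(msr("regr.mse"))
```

Wrapping both steps in one learner means the treatment is fitted only on the training rows, which keeps the held-out rows out of the treatment design.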