mlr_pipeops_vtreat | R Documentation |
Provides an interface to the vtreat package.
PipeOpVtreat works for both classification tasks and regression tasks.
Internally, PipeOpVtreat follows the fit/prepare interface of vtreat, i.e., it first creates a data treatment transform object via vtreat::NumericOutcomeTreatment(), vtreat::BinomialOutcomeTreatment(), or vtreat::MultinomialOutcomeTreatment(), then calls vtreat::fit_prepare() on the training data and vtreat::prepare() during prediction.
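The fit/prepare sequence described above can be sketched directly against vtreat, without the PipeOp wrapper. This is a minimal illustration under stated assumptions, not the PipeOp's actual implementation: the toy data and the names d_train, d_test, and design are made up for the example.

```r
library("vtreat")

set.seed(1)

# toy regression data (hypothetical, for illustration only)
n <- 60
d <- data.frame(
  x = rnorm(n),
  xc = sample(c("a", "b", "c"), n, replace = TRUE),
  stringsAsFactors = FALSE
)
d$y <- 2 * d$x + rnorm(n, sd = 0.1)
d_train <- d[1:40, ]
d_test <- d[41:60, ]

# 1. create the (not yet fitted) treatment transform object
design <- vtreat::NumericOutcomeTreatment(
  var_list = c("x", "xc"),
  outcome_name = "y"
)

# 2. fit on the training data; the result holds the fitted plan
#    and the prepared training frame
fitted <- vtreat::fit_prepare(design, d_train)
treatment_plan <- fitted$treatments
train_prepared <- fitted$cross_frame

# 3. apply the fitted plan to new data
test_prepared <- vtreat::prepare(treatment_plan, d_test)
```

For a binary or multiclass target, the same pattern would start from vtreat::BinomialOutcomeTreatment() or vtreat::MultinomialOutcomeTreatment() instead.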
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
PipeOpVtreat$new(id = "vtreat", param_vals = list())
id :: character(1)
Identifier of resulting object, default "vtreat".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
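As a brief sketch of the constructor arguments, the following builds the PipeOp three ways. The ids "vtreat_all" and "vtreat_all2" are arbitrary names chosen for this example.

```r
library("mlr3pipelines")

# default construction: id "vtreat", default hyperparameters
pop <- PipeOpVtreat$new()

# po() shorthand with a custom id and one hyperparameter overridden
pop_all <- po("vtreat", id = "vtreat_all", recommended = FALSE)

# the same override passed through param_vals
pop_all2 <- PipeOpVtreat$new(
  id = "vtreat_all2",
  param_vals = list(recommended = FALSE)
)
```

A distinct id is mainly useful when more than one vtreat step appears in the same Graph, since PipeOp ids must be unique there.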
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected features "prepared" by vtreat.
If vtreat found "no usable vars", the input Task is returned unaltered.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
treatment_plan :: object of class vtreat_pipe_step | NULL
The treatment plan as constructed by vtreat based on the training data, i.e., an object of class treatment_plan.
If vtreat found "no usable vars" and designing the treatment would have failed, this is NULL.
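To illustrate how the state is populated, the sketch below trains on part of a task and predicts on the rest. The built-in mtcars task and the 24/8 row split are assumptions chosen only to keep the example small.

```r
library("mlr3")
library("mlr3pipelines")

set.seed(1)

# split a built-in regression task into training and prediction rows
task_train <- tsk("mtcars")$filter(1:24)
task_test <- tsk("mtcars")$filter(25:32)

pop <- po("vtreat")

# training designs the treatment and stores it in the state
train_out <- pop$train(list(task_train))[[1]]
plan <- pop$state$treatment_plan
class(plan)

# prediction reuses the stored treatment plan on unseen rows
predict_out <- pop$predict(list(task_test))[[1]]
predict_out$feature_names
```

Because treatment_plan may be NULL when vtreat finds no usable variables, code that inspects it should check for NULL first.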
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
recommended :: logical(1)
Whether only the "recommended" prepared features should be returned, i.e., non-constant variables with a significance value smaller than vtreat's threshold. Initialized to TRUE.
cols_to_copy :: function | Selector
Selector function, takes a Task as argument and returns a character() of features to copy.
See Selector for example functions. Initialized to selector_none().
minFraction :: numeric(1)
Minimum frequency a categorical level must have to be converted to an indicator column.
smFactor :: numeric(1)
Smoothing factor for impact coding models.
rareCount :: integer(1)
Allow levels with this count or below to be pooled into a shared rare-level.
rareSig :: numeric(1)
Suppress levels from pooling at this significance value or greater.
collarProb :: numeric(1)
What fraction of the data (pseudo-probability) to collar data at if doCollar = TRUE.
doCollar :: logical(1)
If TRUE, collar numeric variables by cutting off after a tail-probability specified by collarProb during treatment design.
codeRestriction :: character()
What types of variables to produce.
customCoders :: named list
Map from code names to custom categorical variable encoding functions.
splitFunction :: function
Function taking arguments nSplits, nRows, dframe, and y; returning a user desired split.
ncross :: integer(1)
Integer larger than one, number of cross-validation rounds to design.
forceSplit :: logical(1)
If TRUE, force cross-validated significance calculations on all variables.
catScaling :: logical(1)
If TRUE, use stats::glm() linkspace; if FALSE, use stats::lm() for scaling.
verbose :: logical(1)
If TRUE, print progress.
use_parallel :: logical(1)
If TRUE, use parallel methods.
missingness_imputation :: function
Function of signature f(values: numeric, weights: numeric), simple missing value imputer.
Typically, imputation via a PipeOp should be preferred, see PipeOpImpute.
pruneSig :: numeric(1)
Suppress variables with significance above this level.
Only affects regression tasks (mlr3::TaskRegr) and binary classification tasks.
scale :: logical(1)
If TRUE, replace numeric variables with single variable model regressions ("move to outcome-scale").
These have mean zero and (for variables with significance less than 1) slope 1 when regressed (lm for regression problems/glm for classification problems) against outcome.
varRestriction :: list()
List of treated variable names to restrict to.
Only affects regression tasks (mlr3::TaskRegr) and binary classification tasks.
trackedValues :: named list()
Named list mapping variables to known values, allows warnings upon novel level appearances (see vtreat::track_values()).
Only affects regression tasks (mlr3::TaskRegr) and binary classification tasks.
y_dependent_treatments :: character()
Character vector specifying which treatment types to build per outcome level.
Only affects multiclass classification tasks.
imputation_map :: named list
Map from column names to functions of signature f(values: numeric, weights: numeric), simple missing value imputers.
Typically, imputation via a PipeOp should be preferred, see PipeOpImpute.
For more information, see vtreat::regression_parameters(), vtreat::classification_parameters(), or vtreat::multinomial_parameters().
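As a short sketch of how the parameters listed above might be set after construction, the following assigns a few of them through the parameter set. The chosen values and the feature name "x2" are illustrative assumptions, not recommendations.

```r
library("mlr3pipelines")

pop <- po("vtreat")

# enable collaring first, then set the tail probability it uses
pop$param_set$values$doCollar <- TRUE
pop$param_set$values$collarProb <- 0.05

# require a level to cover at least 10% of rows to get an indicator column
pop$param_set$values$minFraction <- 0.1

# copy a feature through untreated; "x2" is a hypothetical feature name
pop$param_set$values$cols_to_copy <- selector_name("x2")

# inspect the resulting settings
pop$param_set$values
```

Parameters left unset are not passed on, so vtreat falls back to its own defaults for them.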
Follows vtreat's fit/prepare interface. See vtreat::NumericOutcomeTreatment(), vtreat::BinomialOutcomeTreatment(), vtreat::MultinomialOutcomeTreatment(), vtreat::fit_prepare() and vtreat::prepare().
Only methods inherited from PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps: PipeOp, PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_pipeops, mlr_pipeops_adas, mlr_pipeops_blsmote, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_encode, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_rowapply, mlr_pipeops_scale, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_smotenc, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")  # provides PipeOpVtreat

set.seed(2020)

# simulate regression data with a noisy numeric feature, a categorical
# feature derived from the target, and missing values in both
make_data <- function(nrows) {
  d <- data.frame(x = 5 * rnorm(nrows))
  d["y"] = sin(d[["x"]]) + 0.01 * d[["x"]] + 0.1 * rnorm(nrows)
  d[4:10, "x"] = NA  # introduce NAs
  d["xc"] = paste0("level_", 5 * round(d$y / 5, 1))
  d["x2"] = rnorm(nrows)
  d[d["xc"] == "level_-1", "xc"] = NA  # introduce a NA level
  return(d)
}

task = TaskRegr$new("vtreat_regr", backend = make_data(100), target = "y")
pop = PipeOpVtreat$new()
pop$train(list(task))  # returns a list holding the prepared Task
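Beyond calling $train() directly, the PipeOp would typically sit in front of a learner. The sketch below is one possible arrangement, assuming the built-in mtcars task, an rpart regression learner, and an arbitrary 24/8 row split; none of these come from the example above.

```r
library("mlr3")
library("mlr3pipelines")

set.seed(1)

task <- tsk("mtcars")
train_ids <- 1:24
test_ids <- 25:32

# chain the vtreat preprocessing step with a learner
graph <- po("vtreat") %>>% lrn("regr.rpart")
glrn <- GraphLearner$new(graph)

# the treatment is designed on the training rows only
glrn$train(task, row_ids = train_ids)

# at prediction time the stored treatment is applied before the learner
prediction <- glrn$predict(task, row_ids = test_ids)
prediction$score(msr("regr.rmse"))
```

Wrapping the step in a GraphLearner keeps the treatment design inside each training fold when the learner is resampled.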