mlr_pipeops_collapsefactors | R Documentation |
Collapses levels of factor and ordered features: the rarest levels in the training samples are collapsed until target_level_count levels remain. Levels with prevalence strictly above no_collapse_above_prevalence or absolute count strictly above no_collapse_above_absolute are retained, however. For factor variables, rare levels are collapsed into the next larger level; for ordered variables, rare levels are collapsed into whichever neighbouring level has fewer samples.
If both no_collapse_above_prevalence and no_collapse_above_absolute are given, the less strict threshold of the two is used. For example, if no_collapse_above_prevalence is 1 and no_collapse_above_absolute is 10 for a task with 100 samples, levels that are seen more than 10 times will not be collapsed.
Levels not seen during training are not touched during prediction; it is therefore useful to combine this with PipeOpFixFactors.
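The interaction of the two thresholds can be illustrated in base R. This is a minimal sketch, not the PipeOp's implementation: it assumes that "less strict" means the smaller of the two cut-offs once the prevalence has been converted to an absolute count, which agrees with the 100-sample example in the description. The variable names cutoff, protected and candidates are made up for illustration.

```r
# Hypothetical illustration in base R; not the PipeOp's actual code.
fct = factor(rep(LETTERS[1:6], times = c(25, 30, 5, 15, 5, 20)))
counts = table(fct)
n = length(fct)  # 100 samples

no_collapse_above_prevalence = 1
no_collapse_above_absolute = 10

# Assumption: convert the prevalence to a count and take the smaller cut-off.
cutoff = min(no_collapse_above_prevalence * n, no_collapse_above_absolute)

# Levels strictly above the cut-off are protected from collapsing;
# the remaining levels are candidates for collapsing.
protected = names(counts)[as.vector(counts) > cutoff]
candidates = names(counts)[as.vector(counts) <= cutoff]
print(protected)
print(candidates)
```

With these counts, only the levels seen at most 10 times ("C" and "E") remain candidates for collapsing.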
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
PipeOpCollapseFactors$new(id = "collapsefactors", param_vals = list())
id :: character(1)
Identifier of resulting object, default "collapsefactors".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with the rare levels of affected factor and ordered features collapsed.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
collapse_map :: named list of named list of character
List of factor level maps. For each factor, collapse_map contains a named list that indicates which levels of the input task get mapped to which levels of the output task. If collapse_map has an entry feat_1 with an entry a = c("x", "y"), it means that levels "x" and "y" get collapsed to level "a" in feature "feat_1".
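To make the shape of collapse_map concrete, the sketch below builds a small map by hand and flattens it into a lookup from input level to output level. The map contents are hypothetical (in particular the z = "z" entry is made up), and the helper variables m and lookup are not part of the PipeOp; in practice the map would be read from the $state of a trained PipeOp.

```r
# Hypothetical collapse_map with the structure described above.
collapse_map = list(
  feat_1 = list(a = c("x", "y"), z = "z")
)

m = collapse_map$feat_1

# Flatten into a named character vector: names are input levels,
# values are the output levels they are mapped to.
lookup = setNames(rep(names(m), lengths(m)), unlist(m, use.names = FALSE))
print(lookup)
```

Indexing lookup by an input level, for example lookup[["y"]], then yields the level it was collapsed into.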
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
no_collapse_above_prevalence :: numeric(1)
Fraction of samples below which factor levels get collapsed. Default is 1, which causes all levels to be collapsed until target_level_count remain.
no_collapse_above_absolute :: integer(1)
Number of samples below which factor levels get collapsed. Default is Inf, which causes all levels to be collapsed until target_level_count remain.
target_level_count :: integer(1)
Number of levels to retain. Default is 2.
Makes use of the fact that levels(fact_var) = list(target1 = c("source1", "source2"), target2 = "source3") causes renaming of levels "source1" and "source2" both to "target1", and also "source3" to "target2".
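The base R behaviour this relies on can be checked directly. The snippet below is a small standalone illustration with made-up level names; it assigns a named list to levels() so that each list name becomes a new level and the listed source levels are merged into it.

```r
# Factor with three distinct source levels (names are illustrative only).
fact_var = factor(c("source1", "source2", "source3", "source1"))

# Assigning a named list renames and merges levels in a single step.
levels(fact_var) = list(target1 = c("source1", "source2"), target2 = "source3")

print(levels(fact_var))
print(as.character(fact_var))
```

Each source level is listed under one target here, so the mapping is unambiguous.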
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps: PipeOp, PipeOpEncodePL, PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_pipeops, mlr_pipeops_adas, mlr_pipeops_blsmote, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_decode, mlr_pipeops_encode, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_encodeplquantiles, mlr_pipeops_encodepltree, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_learner_pi_cvplus, mlr_pipeops_learner_quantiles, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nearmiss, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_rowapply, mlr_pipeops_scale, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_smotenc, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tomek, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")  # provides PipeOpCollapseFactors
op = PipeOpCollapseFactors$new()
# Create example training task
df = data.frame(
  target = runif(100),
  fct = factor(rep(LETTERS[1:6], times = c(25, 30, 5, 15, 5, 20))),
  ord = factor(rep(1:6, times = c(20, 25, 30, 5, 5, 15)), ordered = TRUE)
)
task = TaskRegr$new(df, target = "target", id = "example_train")
# Training
train_task_collapsed = op$train(list(task))[[1]]
train_task_collapsed$levels(c("fct", "ord"))
# Create example prediction task
df_pred = data.frame(
  target = runif(7),
  fct = factor(LETTERS[1:7]),
  ord = factor(1:7, ordered = TRUE)
)
pred_task = TaskRegr$new(df_pred, target = "target", id = "example_pred")
# Prediction
pred_task_collapsed = op$predict(list(pred_task))[[1]]
pred_task_collapsed$levels(c("fct", "ord"))