mlr_pipeops_tomek: Tomek Down-Sampling

mlr_pipeops_tomekR Documentation

Tomek Down-Sampling

Description

Generates a cleaner data set by removing all majority-minority Tomek links.

The algorithm down-samples the data by removing all pairs of observations that form a Tomek link, i.e. a pair of observations that are nearest neighbors and belong to different classes. For this only numeric and integer features are taken into account. These must have no missing values.

This can only be applied to classification tasks. Multiclass classification is supported.

See themis::tomek for details.

Format

R6Class object inheriting from PipeOpTaskPreproc/PipeOp.

Construction

PipeOpTomek$new(id = "tomek", param_vals = list())
  • id :: character(1)
    Identifier of resulting object, default "tomek".

  • param_vals :: named list
    List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().

Input and Output Channels

Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a TaskClassif is used as input and output during training and prediction.

The output during training is the input Task with removed rows for pairs of observations that form a Tomek link. The output during prediction is the unchanged input.

State

The ⁠$state⁠ is a named list with the ⁠$state⁠ elements inherited from PipeOpTaskPreproc.

Parameters

The parameters are the parameters inherited from PipeOpTaskPreproc.

Fields

Only fields inherited from PipeOp.

Methods

Only methods inherited from PipeOpTaskPreproc/PipeOp.

References

Tomek I (1976). “Two Modifications of CNN.” IEEE Transactions on Systems, Man and Cybernetics, 6(11), 769–772. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1109/TSMC.1976.4309452")}.

See Also

https://mlr-org.com/pipeops.html

Other PipeOps: PipeOp, PipeOpEncodePL, PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_pipeops, mlr_pipeops_adas, mlr_pipeops_blsmote, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_decode, mlr_pipeops_encode, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_encodeplquantiles, mlr_pipeops_encodepltree, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_learner_pi_cvplus, mlr_pipeops_learner_quantiles, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nearmiss, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_rowapply, mlr_pipeops_scale, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_smotenc, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson

Examples


library("mlr3")

# Create example task
task = tsk("iris")
task$head()
table(task$data(cols = "Species"))

# Down-sample data
pop = po("tomek")
tomek_result = pop$train(list(task))[[1]]$data()
nrow(tomek_result)
table(tomek_result$Species)


mlr3pipelines documentation built on June 17, 2025, 9:08 a.m.