| mlr_pipeops_subsample | R Documentation |
Subsamples a Task to use a fraction of the rows.
Sampling happens only during training phase. Subsampling a Task may be
beneficial for training time at possibly (depending on original Task size)
negligible cost of predictive performance.
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
PipeOpSubsample$new(id = "subsample", param_vals = list())
id :: character(1)
Identifier of the resulting object, default "subsample"
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output during training is the input Task with added or removed rows according to the sampling.
The output during prediction is the unchanged input.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
The parameters are the parameters inherited from PipeOpTaskPreproc; however, the affect_columns parameter is not present. Further parameters are:
frac :: numeric(1)
Fraction of rows in the Task to keep. May only be greater than 1 if replace is TRUE. Initialized to (1 - exp(-1)) == 0.6321.
stratify :: logical(1)
Should the subsamples be stratified by target? Initialized to FALSE. May only be TRUE for TaskClassif input and if use_groups = FALSE.
use_groups :: logical(1)
If TRUE and if the Task has a column with role group, grouped observations are kept together during subsampling. In case of sampling with
replace :: logical(1)
Sample with replacement? Initialized to FALSE.
Uses task$filter() to remove rows. If replace is TRUE and identical rows are added, then the task$row_roles$use can not be used
to duplicate rows because of [inaudible]; instead the task$rbind() function is used, and
a new data.table is attached that contains all rows that are being duplicated exactly as many times as they are being added.
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
# Subsample with stratification
pop = po("subsample", frac = 0.7, stratify = TRUE, use_groups = FALSE)
pop$train(list(tsk("iris")))
# Subsample, respecting grouping
df = data.frame(
target = runif(3000),
x1 = runif(3000),
x2 = runif(3000),
grp = sample(paste0("g", 1:100), 3000, replace = TRUE)
)
task = TaskRegr$new(id = "example", backend = df, target = "target")
task$set_col_roles("grp", "group")
pop = po("subsample", frac = 0.7, use_groups = TRUE)
pop$train(list(task))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.