| mlr_pipeops_regravg | R Documentation |
Perform (weighted) prediction averaging from regression Predictions by connecting
PipeOpRegrAvg to multiple PipeOpLearner outputs.
The resulting "response" prediction is a weighted average of the incoming "response" predictions.
Aggregation of "se" predictions is controlled by the se_aggr parameter (see below). When "se" is not requested
or se_aggr = "none", "se" is dropped.
R6Class inheriting from PipeOpEnsemble/PipeOp.
"se" AggregationLet there be K incoming predictions with weights w (sum to 1). For a given row j, denote
per-model means mu_i[j] and, if available, per-model standard errors se_i[j].
Define
mu_bar[j] = sum_i w[i] * mu_i[j] var_between[j] = sum_i w[i] * (mu_i[j] - mu_bar[j])^2 # weighted var of means var_within[j] = sum_i w[i] * se_i[j]^2 # weighted mean of SE^2s
The following aggregation methods are available:
se_aggr = "predictive" – Within + Between (mixture/predictive SD)
se[j] = sqrt(var_within[j] + var_between[j])
Interpretation. Treats each incoming se_i as that model's predictive SD at the given point (or, if the learner
reports the SE of the conditional mean, as many mlr3 regression learners do, as the SE of that mean). The returned se
is the SD of the mixture ensemble under weighted averaging: it increases when base models disagree (epistemic spread)
and when individual models are uncertain (aleatoric spread).
Notes. If se_i represents the SE of the conditional mean (as with predict.lm(se.fit = TRUE)-style learners), the result
aggregates those mean-SEs and still adds model disagreement correctly, but it underestimates the true predictive SD,
which would additionally include irreducible noise. Requires "se" to be present from all inputs.
se_aggr = "mean" – SE of the weighted average of means under equicorrelation
With a correlation parameter se_aggr_rho = rho, assume
Cov(mu_i_hat, mu_j_hat) = rho * se_i * se_j for all i != j. Then
# components:
a[j]        = sum_i w[i]^2 * se_i[j]^2
b[j]        = (sum_i w[i] * se_i[j])^2
var_mean[j] = (1 - rho) * a[j] + rho * b[j]
se[j]       = sqrt(var_mean[j])
Interpretation. Returns the standard error of the averaged estimator sum_i w[i] * mu_i, not a predictive SD.
Use when you specifically care about uncertainty of the averaged mean itself.
Notes. rho is clamped to the PSD range [-1/(K-1), 1] for K > 1. Typical settings:
rho = 0 (assume independence; often optimistic for CV/bagging) and rho = 1 (perfect correlation; conservative and
equal to the weighted arithmetic mean of SEs). Requires "se" from all inputs.
se_aggr = "within" – Within-model component only
se[j] = sqrt(var_within[j])
Interpretation. Aggregates only the average per-model uncertainty and ignores disagreement between models.
Useful as a diagnostic of the aleatoric component; not a full ensemble uncertainty.
Notes. Typically underestimates the uncertainty of the ensemble prediction when models disagree.
Requires "se" from all inputs.
se_aggr = "between" – Between-model component only (works without "se")
se[j] = sqrt(var_between[j])
Interpretation. Captures only the spread of the base means (epistemic/model disagreement).
Notes. This is the only method that does not use incoming "se". It is a lower bound on a full predictive SD,
because it omits within-model noise.
se_aggr = "none" – Do not return "se"
"se" is dropped from the output prediction.
Relationships and edge cases. For any row, se("predictive") >= max(se("within"), se("between")).
With a single input (K = 1), "predictive" and "within" return the input "se", while "between" returns 0.
The methods "predictive", "mean", and "within" require all inputs to provide "se"; otherwise the aggregation throws an error.
Weights can be set as a parameter; if none are provided, equal weights are used for each prediction.
PipeOpRegrAvg$new(innum = 0, collect_multiplicity = FALSE, id = "regravg", param_vals = list())
innum :: numeric(1)
Determines the number of input channels.
If innum is 0 (default), a vararg input channel is created that can take an arbitrary number of inputs.
collect_multiplicity :: logical(1)
If TRUE, the input is a Multiplicity collecting channel. This means that a
Multiplicity input, instead of multiple normal inputs, is accepted and the members are aggregated. This requires innum to be 0.
Default is FALSE.
id :: character(1)
Identifier of the resulting object, default "regravg".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpEnsemble. Instead of a Prediction, a PredictionRegr
is used as input and output during prediction.
The $state is left empty (list()).
The parameters are the parameters inherited from the PipeOpEnsemble, as well as:
se_aggr :: character(1)
Controls how incoming "se" values are aggregated into an ensemble "se". One of
"predictive", "mean", "within", "between", "none". See the description above for definitions and interpretation.
se_aggr_rho :: numeric(1)
Equicorrelation parameter used only for se_aggr = "mean". Interpreted as the common correlation between
per-model mean estimators. Recommended range [0, 1]; values are clamped to [-1/(K-1), 1] for validity.
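As a minimal usage sketch (assuming mlr3pipelines is loaded), these hyperparameters can be set at construction; the parameter names follow the listing above.

library("mlr3pipelines")

# set the "se" aggregation rule and the equicorrelation parameter at construction
poa = po("regravg", se_aggr = "mean", se_aggr_rho = 0.5)

# equivalent explicit construction
poa = PipeOpRegrAvg$new(param_vals = list(se_aggr = "mean", se_aggr_rho = 0.5))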
Inherits from PipeOpEnsemble by implementing the private$weighted_avg_predictions() method.
Only fields inherited from PipeOp.
Only methods inherited from PipeOpEnsemble/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Multiplicity PipeOps:
Multiplicity(),
PipeOpEnsemble,
mlr_pipeops_classifavg,
mlr_pipeops_featureunion,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_replicate
Other Ensembles:
PipeOpEnsemble,
mlr_learners_avg,
mlr_pipeops_classifavg,
mlr_pipeops_ovrunite
library("mlr3")
# Simple Bagging for Regression
gr = ppl("greplicate",
po("subsample") %>>%
po("learner", lrn("regr.rpart")),
n = 5
) %>>%
po("regravg")
resample(tsk("mtcars"), GraphLearner$new(gr), rsmp("holdout"))