library("mlr3")
library("mlr3pipelines")

Creating a PipeOp

Every PipeOp has a class name (the same as class(p)[[1]]) and a name inside the po() / mlr_pipeops dictionary, which is often the class name in lowercase without the PipeOp prefix.

Equivalent:

p = PipeOpPCA$new()
p = mlr_pipeops$get("pca")
p = po("pca")  # preferred
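
The naming convention can be checked directly. The following is a minimal sketch, assuming only that mlr3pipelines is installed; the variables cls and key are introduced here for illustration and are not part of the package.

```r
library("mlr3")
library("mlr3pipelines")

p = po("pca")

# Class name of the constructed object
cls = class(p)[[1]]

# Dictionary key derived by the usual convention:
# drop the "PipeOp" prefix and convert to lowercase
key = tolower(sub("^PipeOp", "", cls))

print(cls)
print(key %in% mlr_pipeops$keys())  # is the derived key a known dictionary entry?
```

Since the convention holds "often" rather than always, a lookup in mlr_pipeops$keys() is the safer way to find the key of a given PipeOp.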

Construction arguments

Equivalent:

p = PipeOpPCA$new(id = "pca2", param_vals = list(center = FALSE, scale. = TRUE))
p = mlr_pipeops$get("pca", id = "pca2", param_vals = list(center = FALSE, scale. = TRUE))
p = po("pca", id = "pca2", param_vals = list(center = FALSE, scale. = TRUE))
p = po("pca", id = "pca2", center = FALSE, scale. = TRUE)  # preferred

Note the last line: po() automatically sets param_vals and other attributes of the constructed PipeOp based on further arguments.
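
To see what po() did with the extra arguments, the constructed object can be inspected. This is a small sketch under the assumption that the id and the hyperparameter values are read back through the $id and $param_set$values fields:

```r
library("mlr3")
library("mlr3pipelines")

p = po("pca", id = "pca2", center = FALSE, scale. = TRUE)

print(p$id)                # id given at construction
print(p$param_set$values)  # named list of the hyperparameter values that were set
```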

PipeOp wrapping other objects

PipeOpLearner and PipeOpLearnerCV wrap mlr3::Learner, PipeOpFilter wraps mlr3filters::Filter.

Wrapping a Learner

The PipeOpLearner has NULL output during train phase and Prediction output during predict phase. Suppose we have a Learner l:

l = lrn("classif.rpart")

Then the following are all equivalent:

p = PipeOpLearner$new(l)
p = mlr_pipeops$get("learner", l)
p = po("learner", l)  # preferred
p = as_pipeop(l)  # preferred

The param_vals construction argument makes it possible to change Learner hyperparameter values. The last two forms are preferred. po("learner", l) can also set predict_type and hyperparameter values directly:

p = po("learner", l, predict_type = "prob", cp = 0.05)
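
The behaviour described in this section can be tried out as follows. This is a sketch, not a reference: it assumes the built-in "iris" task as example data and calls $train() and $predict() on the PipeOp directly, which is mainly useful for inspection. The names task, out_train and out_predict are illustrative.

```r
library("mlr3")
library("mlr3pipelines")

l = lrn("classif.rpart")
p = po("learner", l, predict_type = "prob", cp = 0.05)

# Settings passed through po() are visible on the wrapped Learner
print(p$learner$predict_type)
print(p$param_set$values$cp)

# PipeOps take and return lists with one element per channel
task = tsk("iris")
out_train = p$train(list(task))
out_predict = p$predict(list(task))

print(is.null(out_train[[1]]))  # is the train-phase output NULL?
print(class(out_predict[[1]]))  # class of the predict-phase output
```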

Wrapping a Learner with Train-Time Cross-Validation

For stacking and threshold tuning it is necessary to have estimates of out-of-sample predictions during training. In that case a PipeOpLearnerCV is needed, which has Task output during both train and predict phase.

PipeOpLearnerCV is constructed in the same way as PipeOpLearner, except that there is no as_pipeop() shortcut for it:

p = PipeOpLearnerCV$new(l)
p = mlr_pipeops$get("learner_cv", l)
p = po("learner_cv", l)  # preferred

and

p = po("learner_cv", l, predict_type = "prob", cp = 0.05)
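
The difference to PipeOpLearner shows in the output type. The sketch below assumes the built-in "iris" task and the default resampling settings of PipeOpLearnerCV; task, out_train and out_predict are illustrative names.

```r
library("mlr3")
library("mlr3pipelines")

l = lrn("classif.rpart")
p = po("learner_cv", l)

task = tsk("iris")
set.seed(1)  # the internal cross-validation split is random

out_train = p$train(list(task))[[1]]
out_predict = p$predict(list(task))[[1]]

# Both phases return a Task whose features are the Learner's predictions
print(class(out_train))
print(out_train$feature_names)
print(out_predict$feature_names)
```

Because the output is a Task, a subsequent Learner can be trained on these predictions, which is what stacking relies on.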

Wrapping a Filter

Given a Filter f:

f = mlr3filters::flt("anova")

Equivalent:

p = PipeOpFilter$new(f)
p = mlr_pipeops$get("filter", f)
p = po("filter", f)  # preferred
p = as_pipeop(f)  # preferred
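
A wrapped Filter needs to know how many features to keep before it can be trained. The following sketch assumes that mlr3filters is installed and uses the filter.nfeat setting of PipeOpFilter with the built-in "iris" task; task and out are illustrative names.

```r
library("mlr3")
library("mlr3pipelines")

f = mlr3filters::flt("anova")

# filter.nfeat sets the number of features that are kept
p = po("filter", f, filter.nfeat = 2)

task = tsk("iris")
out = p$train(list(task))[[1]]

print(p$id)               # the PipeOp takes its id from the Filter
print(out$feature_names)  # features that remain after filtering
```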

Automatic Wrapping

Operations that expect a PipeOp or a Graph often automatically convert Learner and Filter objects (using as_pipeop() internally). Examples are:

Equivalent:

gr = po("filter", f) %>>% po("pca") %>>% po("learner", l)
gr = as_pipeop(f) %>>% po("pca") %>>% as_pipeop(l)
gr = f %>>% po("pca") %>>% l  # preferred
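
The result of the automatic conversion can be inspected on the Graph itself. A minimal sketch, assuming mlr3filters is installed:

```r
library("mlr3")
library("mlr3pipelines")

f = mlr3filters::flt("anova")
l = lrn("classif.rpart")

gr = f %>>% po("pca") %>>% l

print(class(gr)[[1]])  # class of the combined object
print(gr$ids())        # ids of the contained PipeOps

# The Filter and the Learner were wrapped into PipeOps on the way
print(class(gr$pipeops$anova)[[1]])
print(class(gr$pipeops$classif.rpart)[[1]])
```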

Creating a Graph

Graphs can be created and modified using several basic operations.

The Graph

      ,--- pca ---.
branch             unbranch -- anova -- classif.rpart
      `--- nop ---'

using

f = mlr3filters::flt("anova")
l = lrn("classif.rpart")

can be created in the following way:

gr = Graph$new()$
  add_pipeop(po("branch", 2))$
  add_pipeop(po("pca"))$
  add_pipeop(po("nop"))$
  add_pipeop(po("unbranch", 2))$
  add_pipeop(po("filter", f))$  # explicit wrap; add_pipeop(f) would auto-convert to PipeOpFilter
  add_pipeop(po("learner", l))$  # explicit wrap; add_pipeop(l) would auto-convert to PipeOpLearner
  add_edge("branch", "pca", src_channel = "output1")$
  add_edge("branch", "nop", src_channel = "output2")$
  add_edge("pca", "unbranch", dst_channel = "input1")$
  add_edge("nop", "unbranch", dst_channel = "input2")$
  add_edge("unbranch", "anova")$
  add_edge("anova", "classif.rpart")

Note that src_channel = "output2" and dst_channel = "input2" are not required: as soon as output1 / input1 are connected, the possible source / destination channel of the second edge is unambiguous.

Equivalent to the above construction are:

gr = po("branch", 2) %>>% gunion(list(po("pca"), po("nop"))) %>>%
  po("unbranch", 2) %>>% f %>>% l
gr = po("branch", 2) %>>% list(po("pca"), po("nop")) %>>%
  po("unbranch", 2) %>>% f %>>% l
gr = ppl("branch", list(po("pca"), po("nop"))) %>>% f %>>% l

The second option uses the automatic conversion of list to gunion() by %>>%. The last option uses the pre-packaged branch-pipeline.
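
One way to convince oneself that these constructions describe the same Graph is to compare their PipeOp ids and edge tables. The sketch below does this for the first and the last variant; edge_strings() is a hypothetical helper written for this example, not a package function, and it assumes that the $edges table has the columns src_id, src_channel, dst_id and dst_channel.

```r
library("mlr3")
library("mlr3pipelines")

f = mlr3filters::flt("anova")
l = lrn("classif.rpart")

gr_union = po("branch", 2) %>>% gunion(list(po("pca"), po("nop"))) %>>%
  po("unbranch", 2) %>>% f %>>% l
gr_ppl = ppl("branch", list(po("pca"), po("nop"))) %>>% f %>>% l

# Hypothetical helper: one string per edge, sorted so that order does not matter
edge_strings = function(graph) {
  e = graph$edges
  sort(paste(e$src_id, e$src_channel, e$dst_id, e$dst_channel, sep = " -> "))
}

print(edge_strings(gr_union))
print(edge_strings(gr_ppl))
print(setequal(gr_union$ids(), gr_ppl$ids()))
```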

There are many other ways of combining these methods. The following is an unconventional but legitimate way to build the same Graph:

gr = gunion(list(
    ppl("branch", list(po("pca"), po("nop"))),
    f %>>% l
  ))$add_edge("unbranch", "anova")

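gunion() is needed here to combine the two sub-graphs into a single Graph without connecting them; the $add_edge method then adds the missing edge between "unbranch" and "anova".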
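
The two steps of this construction can be separated to make the effect of $add_edge visible. This is a sketch assuming mlr3filters is installed; parts, n_before and n_after are illustrative names.

```r
library("mlr3")
library("mlr3pipelines")

f = mlr3filters::flt("anova")
l = lrn("classif.rpart")

# Step 1: put both sub-graphs side by side without connecting them
parts = gunion(list(
  ppl("branch", list(po("pca"), po("nop"))),
  f %>>% l
))
n_before = nrow(parts$edges)  # only the edges inside each sub-graph

# Step 2: connect the two sub-graphs; add_edge() modifies the Graph in place
parts$add_edge("unbranch", "anova")
n_after = nrow(parts$edges)

print(c(before = n_before, after = n_after))
```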

Vararg Channels

add_edge Automatic Channel Selection

%>>% Automatic Channel Selection

Common Graph Creation Pattern



mlr-org/mlr3pipelines documentation built on April 30, 2024, 6:21 p.m.