This document contains a list of use-cases we want to cover.
Shortnames: [G] : Graph
| PipeOp | Type |
|---|---|
| PipeOpBranch | broadcast |
| PipeOpChunk | broadcast |
| PipeOpUnbranch | aggregate |
| PipeOpFeatureUnion | aggregate |
| PipeOpNOP | linear |
| PipeOpCopy | broadcast |

| PipeOp | Type | train: input --store-params--> output | predict: input --use-params--> output |
|---|---|---|---|
| PipeOpLearner | linear | task --model--> NULL | task --model--> prediction |
| PipeOpLearnerCV | linear | task --model--> cvtask | task --model--> prediction |
| PipeOpModelAverage | aggregate | task --NULL--> NULL | list-of-prediction --NULL--> prediction |

Preprocessing:

| PipeOp | Type | train | predict |
|---|---|---|---|
| PipeOpPCA | linear | task --params--> task | task --params--> task |
| PipeOpScale | linear | task --params--> task | task --params--> task |
| PipeOpDownsample | linear | task --NULL--> task | task --NULL--> task |

Target Operators:

| PipeOp | Type | train | predict |
|---|---|---|---|
| PipeOpThreshold | linear | cvtask --threshold--> NULL | prediction --threshold--> prediction |
| PipeOpTrafoY | linear | task --NULL--> task | prediction --NULL--> prediction |
| PipeOpMultiClass2Binary | broadcast | task --NULL--> list-of-tasks | task --NULL--> list-of-tasks |
| PipeOpSetTarget | linear | task --NULL--> task | task --NULL--> task |

PipeOpLearner:
- train: Calls the learner's train() method on it. Stores the model in state.
- predict: Calls the predict() method of the stored model.

GraphLearner: Wraps a Graph and allows it to be used like a learner.
- train:
  - input: [[Task]]
  - does: Calls the graph's train() method on it.
  - returns: NULL
- predict:
  - input: [[Task]]
  - does: Calls the predict() method on the Graph.
  - returns: Prediction
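The train/predict contract described above (train stores parameters in a state, predict reuses them) can be illustrated without any pipeline package. The following is a minimal base R sketch, not the package's implementation; `make_pca_op` and its `train`/`predict` closures are hypothetical names introduced only for this illustration.

```r
# Hypothetical helper (an assumption, not a package API): a PCA "operator"
# whose train step stores its parameters and whose predict step reuses them.
make_pca_op = function() {
  state = NULL  # filled during training
  list(
    train = function(x) {
      fit = prcomp(x, center = TRUE, scale. = FALSE)
      state <<- fit   # store the rotation and centering for later use
      fit$x           # training output: the rotated training data
    },
    predict = function(x) {
      stopifnot(!is.null(state))   # predict is only valid after train
      predict(state, newdata = x)  # apply the stored rotation to new data
    }
  )
}

x = as.matrix(iris[, 1:4])
op = make_pca_op()
train_out = op$train(x[1:100, ])
pred_out = op$predict(x[101:150, ])
```

Applying `op$predict()` to the training rows reproduces `train_out`, which is the property the "--params-->" arrows in the table express.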
Scenario: We obtain a task (iris) and a learner (rpart) from mlr3. Before we train the learner, we want to transform the data using PCA.
We concatenate a PipeOpPCA and a PipeOpLearner using the %>>% (then) operator. The example additionally scales the data first with a PipeOpScale.
This internally does the following:
- Wraps the [PO's] into GraphNodes.
- Chains them together and returns a Graph.

    task = mlr_tasks$get("iris")
    lrn_rp = mlr_learners$get("classif.rpart")
    g = PipeOpScale() %>>% PipeOpPCA() %>>% PipeOpLearner(lrn_rp)
    g$train(task)
    g$predict(task)
Scenario: We want to resample the Graph on different folds of the data.
We create a GraphLearner from the Graph and use mlr3's resampling.

    lrn_g = GraphLearner$new(graph = g)
    lrn_g$parvals = list(pca.center = TRUE, rpart.cp = 0.1)
    resampling = mlr_resamplings$get("holdout")
    rr = resample(task, lrn_g, resampling)

Access results:

    rr[1, "models"]$learner.model[["rpart"]]$learner.model
    rr[1, "models"]$learner.model[["pca"]]$params
Scenario: We want to tune the Graph.
We use the GraphLearner together with mlr3's tuning.

    measures = mlr_measures$mget("mmce")
    param_set = paradox::ParamSet$new(
      params = list(
        ParamLgl$new("pca.center"),
        ParamDbl$new("rpart.cp", lower = 0.001, upper = 0.1)
      )
    )
    ff = FitnessFunction$new(task, lrn_g, resampling, measures, param_set)
    terminator = TerminatorEvaluations$new(10)
    rs = TunerRandomSearch$new(ff, terminator)
    tr = rs$tune()$tune_result()
    op1 = PipeOpScale$new()
    op2a = PipeOpPCA$new()
    op2b = PipeOpNOP$new()
    op3 = PipeOpFeatureUnion$new()
    op4 = PipeOpLearner$new(learner = "classif.rpart")
    op1 %>>% gunion(op2a, op2b) %>>% op3 %>>% op4
Scenario: We want to do bagging (train several models on subsamples of the data and average their predictions).
We use the PipeOpDownSample operator in conjunction with a PipeOpLearner to train a model. pipeline_greplicate() lets us do the same operation multiple times.
Afterwards we average all predictions using PipeOpModelAverage.

    op1 = PipeOpDownSample$new(rate = 0.6)
    op2 = PipeOpLearner$new("classif.rpart")
    op3 = PipeOpModelAverage$new()
    pipeline_greplicate(op1 %>>% op2, 30) %>>% op3
Info:
- If our predictions are numeric, we simply average.
- If our predictions are binary, we majority vote (?)
- If our predictions are probabilities, we average (?)
- If our predictions are multiclass, we (?).
- Are there any other situations?
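One possible reading of the aggregation rules above, sketched in base R. This is only an illustration under stated assumptions, not how PipeOpModelAverage is implemented: `average_numeric` and `majority_vote` are hypothetical helper names, and ties in the vote are broken by alphabetical order of the labels, which is an arbitrary choice.

```r
# Hypothetical helpers (assumptions for illustration only).

# Element-wise mean of a list of numeric vectors or probability matrices.
average_numeric = function(preds) {
  Reduce(`+`, preds) / length(preds)
}

# Per-observation majority vote over a list of label vectors.
majority_vote = function(preds) {
  m = do.call(cbind, lapply(preds, as.character))  # one column per model
  apply(m, 1, function(row) names(which.max(table(row))))
}

num_avg = average_numeric(list(c(1, 2), c(3, 4)))
votes = majority_vote(list(
  c("a", "b", "a"),
  c("a", "b", "b"),
  c("b", "b", "a")
))
prob_avg = average_numeric(list(
  matrix(c(0.8, 0.2), nrow = 1),
  matrix(c(0.4, 0.6), nrow = 1)
))
```

Averaging probability matrices keeps each row summing to one, so the same helper covers the numeric and the probability case; the multiclass label case is handled here by the vote.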
Scenario: We want to do stacking (train several models on the data and combine their predictions).
We use several PipeOpLearners to train models. gunion() lets us put the learners parallel to each other.
Afterwards we average all predictions using PipeOpModelAverage.

    op1 = PipeOpLearner$new("regr.rpart")
    op2 = PipeOpLearner$new("regr.svm")
    gunion(op1, op2) %>>% PipeOpModelAverage$new()
Instead of using PipeOpModelAverage, we can feed the combined predictions into a PipeOpLearner.
For the base learners we then use a PipeOpLearnerCV instead of a PipeOpLearner, in order to avoid overfitting.

    # Superlearner: We instead use PipeOpLearnerCV
    op1 = PipeOpLearnerCV("regr.rpart")
    op2 = PipeOpLearnerCV("regr.svm")
    gunion(op1, op2) %>>% PipeOpFeatureUnion() %>>% PipeOpLearner("regr.lm")

By adding a PipeOpNull, we add the original features to the SuperLearner.

    gunion(op1, op2, PipeOpNull()) %>>% PipeOpFeatureUnion() %>>% PipeOpLearner("regr.lm")

We can do the same on multiple levels by just adding the same PipeOpLearnerCV() again after the feature union.

    g = gunion(op1, op2, PipeOpNull()) %>>%
      PipeOpFeatureUnion() %>>%
      gunion(op1, op2) %>>%
      PipeOpFeatureUnion() %>>%
      PipeOpLearner("regr.lm")
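Why cross-validated predictions help here can be seen in a small base R sketch of the stacking idea. It is an illustration only and assumes linear models as base learners; `cv_predict` is a hypothetical helper, not a function of any package mentioned above.

```r
# Hypothetical helper (assumption): out-of-fold predictions of a linear model.
# Every row is predicted by a model that never saw that row during fitting.
cv_predict = function(formula, data, k = 5) {
  fold = rep(seq_len(k), length.out = nrow(data))
  out = numeric(nrow(data))
  for (i in seq_len(k)) {
    test = fold == i
    fit = lm(formula, data = data[!test, , drop = FALSE])
    out[test] = predict(fit, newdata = data[test, , drop = FALSE])
  }
  out
}

# Level 0: two base learners produce out-of-fold predictions.
meta = data.frame(
  mpg = mtcars$mpg,
  p1 = cv_predict(mpg ~ wt, mtcars),
  p2 = cv_predict(mpg ~ hp, mtcars)
)

# Level 1: the super learner is trained on those predictions.
super = lm(mpg ~ p1 + p2, data = meta)
```

Because the level-1 model only sees predictions made on held-out rows, it cannot simply reward a base learner for memorising the training data.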
Scenario: We have a multiclass target and want to predict each class in a binarized manner. This occurs, for example, if our model can only do binary classification.
We use PipeOpMultiClass2Binary in order to split our [[Task]] up into multiple binary [[Task]]s.
Afterwards, we replicate our learner $k$ times (where $k$ = number of classes - 1).
In order to aggregate the predictions for the different classes, we use the PipeOpModelAverage.

    op1 = PipeOpMultiClass2Binary(codebook)
    op2 = PipeOpLearner("classif.svm")
    op1 %>>% pipeline_greplicate(op2, k) %>>% PipeOpModelAverage$new()
    # or:
    op1 %>=>% pipeline_greplicate(op2, k) %>>% PipeOpModelAverage$new()
Scenario: We want our pipeline to branch out, either in one direction or the other. This is useful, for example, when tuning over multiple learners.
We use the PipeOpBranch in order to have our data flow to only one of the following operators.
Afterwards we collect the two streams using PipeOpUnbranch.
We can now treat the pipeline like a linear pipeline.

    op1 = PipeOpLearner$new("regr.rpart")
    op2 = PipeOpLearner$new("regr.svm")
    g = PipeOpBranch$new(selected = 1) %>>% gunion(op1, op2) %>>% PipeOpUnbranch(aggrFun = NULL)

    op1 = PipeOpPCA$new()
    op2 = PipeOpNOP$new()
    op3 = PipeOpLearner$new("classif.rpart")
    g = PipeOpBranch$new(selected = 1) %>>% gunion(op1, op2) %>>% PipeOpUnbranch(aggrFun = NULL) %>>% op3
FIXME: Does every PipeOp have a default method when NULL is passed?
Scenario: We want to split the data into $k$ chunks, train a learner on each chunk, and average the predictions.
We use the PipeOpChunk operator to partition the [[Task]] into $k$ smaller [[Task]]s.
Afterwards we train one learner on each of the $k$ sub-[[Task]]s.
Finally, the predictions are averaged in order to get a single prediction.

    PipeOpChunk(k) %>>% pipeline_greplicate(PipeOpLearner("classif.rpart"), k) %>>% PipeOpModelAverage()
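The chunk, train, and average steps can be sketched in base R as follows. This is a simplified illustration that assumes a regression problem with linear models and a round-robin assignment of rows to chunks; it does not show how PipeOpChunk itself partitions a task.

```r
k = 3

# Assign rows to k chunks in round-robin order (an assumption made for
# determinism; a random partition would work the same way).
chunk_id = rep(seq_len(k), length.out = nrow(mtcars))
chunks = split(mtcars, chunk_id)

# Train one model per chunk.
models = lapply(chunks, function(d) lm(mpg ~ wt, data = d))

# Predict with every model and average the k predictions per observation.
preds = sapply(models, predict, newdata = mtcars)  # one column per model
avg_pred = rowMeans(preds)
```

Each model sees only its own chunk during training, while at prediction time every model scores every observation before the results are averaged.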
Scenario: We want to obtain an optimal threshold in order to decide whether something is of class x or y.
We use PipeOpLearnerCV to obtain cross-validated predictions. Afterwards we use PipeOpThreshold() to compute an optimal threshold.

    op1 = PipeOpLearnerCV$new("classif.rpart")
    op2 = PipeOpThreshold$new(method, measure, ...)
    g = op1 %>>% op2
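What "computing an optimal threshold" could mean is sketched below in base R. The sketch assumes a grid search that minimises the misclassification rate on (cross-validated) probabilities; `best_threshold`, the grid, and the toy data are assumptions for illustration, and the actual method and measure of PipeOpThreshold are left open in the text above.

```r
# Hypothetical helper (assumption): pick the cutoff with the lowest
# misclassification rate; the first minimum on the grid wins.
best_threshold = function(prob, truth, grid = seq(0.05, 0.95, by = 0.05)) {
  err = sapply(grid, function(t) mean((prob >= t) != truth))
  grid[which.min(err)]
}

# Toy cross-validated probabilities for the positive class and true labels.
prob = c(0.12, 0.22, 0.33, 0.47, 0.58, 0.71, 0.83, 0.91)
truth = c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE)

th = best_threshold(prob, truth)
pred_class = prob >= th
```

At prediction time only the stored threshold is needed, which matches the "prediction --threshold--> prediction" entry in the operator table.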
Scenario: We want to transform our target variable, for example using a log-transform.
We use the PipeOpTrafoY in order to log-transform the target.
We use another PipeOpTrafoY after the learner in order to re-transform the predictions onto the original scale.

    top = PipeOpTrafoY(train = log, predict = identity)
    retop = PipeOpTrafoY(train = identity, predict = exp)
    g = top %>>% PipeOpLearner("regr.svm") %>>% retop

What happens:
- g$train([[Task]]): trafoY(log) >> train("regr.svm", [[Task]]) >> identity
- g$predict([[Task]]): identity >> predict(model, [[Task]]) >> trafoPreds(exp)

FIXME:
- Using the same operator twice would violate the acyclic property.
- We cannot tune over par.vals of TrafoY, as we have a hard time storing them.
- The user needs to ensure that the trafos are correct.
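The train and predict flows listed above can be mimicked in base R. This is a minimal sketch that assumes a linear model in place of the SVM; `train_graph` and `predict_graph` are hypothetical names used only to mirror the two flows.

```r
# Hypothetical helpers (assumptions): mirror g$train() and g$predict().
train_graph = function(data) {
  data$y = log(data$y)       # trafoY(log) on the target
  lm(y ~ x, data = data)     # the learner is trained on the log scale
}

predict_graph = function(model, newdata) {
  pred_log = predict(model, newdata = newdata)  # prediction on the log scale
  exp(pred_log)                                 # trafoPreds(exp): back-transform
}

# Toy data with an exactly exponential relationship between x and y.
train_df = data.frame(x = 1:20)
train_df$y = exp(0.1 * train_df$x)

model = train_graph(train_df)
pred = predict_graph(model, data.frame(x = c(21, 22)))
```

The user is responsible for pairing `log` with `exp`: a mismatched inverse would silently return predictions on the wrong scale, which is the concern raised in the last FIXME item.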
Scenario: We have three possible output variables we want to predict in parallel.
We set different targets before training each learner using PipeOpSetTarget.
Afterwards the different predictions are collected with PipeOpModelAverage.

    g = gunion(
      PipeOpSetTarget("out1") %>>% PipeOpLearner("rpart", id = "r1"),
      PipeOpSetTarget("out2") %>>% PipeOpLearner("rpart", id = "r2"),
      PipeOpSetTarget("final_out") %>>% PipeOpLearner("rpart", id = "r3")
    ) %>>% PipeOpModelAverage()
Scenario: We have three possible output variables available during training, but they will not be available at test time. We want to leverage info from out1 and out2 to improve the prediction of final_out.
We obtain cross-validated predictions using PipeOpLearnerCV sequentially for each target and use them to train the subsequent models.

    pnop = PipeOpNull()
    g = PipeOpSetTarget("out1") %>>%
      gunion(PipeOpLearnerCV("rpart", id = "r1"), pnop) %>>%
      PipeOpFeatureUnion() %>>%
      PipeOpSetTarget("out2") %>>%
      gunion(PipeOpLearnerCV("rpart", id = "r2"), pnop) %>>%
      PipeOpFeatureUnion() %>>%
      PipeOpSetTarget("final_out") %>>%
      PipeOpLearner("rpart", id = "r3")
Scenario: We have a zero-inflated numeric target variable (e.g. the amount of unpaid bills). We want to leverage the info that most values are $0$ in our model.
We obtain cross-validated predictions for whether the target variable is 0. We then use the prediction for this intermediate target for the final prediction.

    pnop = PipeOpNull()
    # data is our dt with a numeric "target"
    data$target_is_null = data$target == 0
    g = PipeOpSetTarget("target_is_null") %>>%
      gunion(PipeOpLearnerCV("rpart", id = "r1"), pnop) %>>%
      PipeOpFeatureUnion() %>>%
      PipeOpSetTarget("target") %>>%
      PipeOpLearner("rpart", id = "r3")