
Dataflow Programming for Machine Learning in R.

`mlr3pipelines`

Watch our “WhyR 2020” webinar presentation on YouTube for an introduction! Find the slides here.

**mlr3pipelines** is a dataflow
programming toolkit
for machine learning in R utilising the *mlr3* package.

In principle, *mlr3pipelines* is about defining individual data and model
manipulation steps as “PipeOps”:

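```
library(mlr3)
library(mlr3pipelines)
pca = po("pca")  # principal component analysis
filter = po("filter", filter = mlr3filters::flt("variance"), filter.frac = 0.5)  # keep the 50% highest-variance features
learner_po = po("learner", learner = lrn("classif.rpart"))  # decision tree learner
```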
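A single PipeOp can also be trained on its own, outside of any pipeline. The following is a minimal sketch, assuming *mlr3* and *mlr3pipelines* are installed; the name `pca_step` is chosen here only for illustration. `$train()` takes a list of inputs and returns a list of outputs, which for the PCA step is one transformed `Task`.

```r
library(mlr3)
library(mlr3pipelines)

pca_step = po("pca")
# PipeOps consume and produce lists; the PCA step has one input and one output
result = pca_step$train(list(tsk("iris")))
task_pca = result[[1]]
# The four numeric iris features are replaced by principal components PC1..PC4
task_pca$feature_names
```

After training, the PipeOp keeps its fitted state, so the same transformation can later be applied to new data with `$predict()`.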

These PipeOps can then be combined to define machine learning
pipelines. These pipelines can be wrapped in a `GraphLearner`
that behaves like any other `Learner` in `mlr3`.

```
graph = pca %>>% filter %>>% learner_po
glrn = GraphLearner$new(graph)
```

This learner can be used for resampling, benchmarking, and even tuning.

```
resample(tsk("iris"), glrn, rsmp("cv"))
#> <ResampleResult> with 10 resampling iterations
#> task_id                 learner_id resampling_id iteration warnings errors
#>    iris pca.variance.classif.rpart            cv         1        0      0
#>    iris pca.variance.classif.rpart            cv         2        0      0
#>    iris pca.variance.classif.rpart            cv         3        0      0
#>    iris pca.variance.classif.rpart            cv         4        0      0
#>    iris pca.variance.classif.rpart            cv         5        0      0
#>    iris pca.variance.classif.rpart            cv         6        0      0
#>    iris pca.variance.classif.rpart            cv         7        0      0
#>    iris pca.variance.classif.rpart            cv         8        0      0
#>    iris pca.variance.classif.rpart            cv         9        0      0
#>    iris pca.variance.classif.rpart            cv        10        0      0
```
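As a minimal sketch of the benchmarking use case, the pipeline can be compared with a plain `classif.rpart` decision tree on the same cross-validation splits. This assumes *mlr3*, *mlr3pipelines* and *mlr3filters* are installed; the pipeline is rebuilt so the block runs on its own, and the choice of 3 folds and the `classif.ce` (classification error) measure is illustrative only.

```r
library(mlr3)
library(mlr3pipelines)

# Rebuild the pipeline: PCA -> variance filter -> decision tree
graph = po("pca") %>>%
  po("filter", filter = mlr3filters::flt("variance"), filter.frac = 0.5) %>>%
  po("learner", learner = lrn("classif.rpart"))
glrn = GraphLearner$new(graph)

# Both learners are evaluated on the same instantiated 3-fold CV splits
design = benchmark_grid(
  tasks = tsk("iris"),
  learners = list(glrn, lrn("classif.rpart")),
  resamplings = rsmp("cv", folds = 3)
)
bmr = benchmark(design)

# Mean classification error per learner
res = bmr$aggregate(msr("classif.ce"))
res
```

Because `benchmark_grid()` instantiates the resampling once per task, any difference in error comes from the learners rather than from different data splits.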

Single computational steps can be represented as so-called **PipeOps**,
which can then be connected with directed edges in a **Graph**. The
scope of *mlr3pipelines* is still growing; currently supported features
are:

- Simple data manipulation and preprocessing operations, e.g. PCA, feature filtering
- Task subsampling for speed and outcome class imbalance handling
- *mlr3* `Learner` operations for prediction and stacking
- Simultaneous path branching (data going both ways)
- Alternative path branching (data going one specific way, controlled by hyperparameters)
- Ensemble methods and aggregation of predictions
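The two branching styles listed above can be sketched with the `branch`/`unbranch` and `featureunion` PipeOps. This is an illustrative example, assuming *mlr3* and *mlr3pipelines* are installed; using PCA versus a no-op (`nop`) as the two paths is an arbitrary choice, and the object names are hypothetical.

```r
library(mlr3)
library(mlr3pipelines)

# Alternative path branching: data flows through exactly one path,
# chosen by the hyperparameter `branch.selection`
alt_graph = po("branch", c("pca", "nop")) %>>%
  gunion(list(po("pca"), po("nop"))) %>>%
  po("unbranch", c("pca", "nop")) %>>%
  po("learner", learner = lrn("classif.rpart"))
alt_learner = GraphLearner$new(alt_graph)
alt_learner$param_set$values$branch.selection = "nop"  # skip the PCA path
alt_learner$train(tsk("iris"))
pred = alt_learner$predict(tsk("iris"))

# Simultaneous path branching: the task is sent through both paths and
# the resulting features are joined again by "featureunion"
sim_graph = gunion(list(po("pca"), po("nop"))) %>>% po("featureunion")
sim_out = sim_graph$train(tsk("iris"))
# 8 features: PC1..PC4 from the PCA path plus the 4 original features
sim_out[[1]]$feature_names
```

Since `branch.selection` is an ordinary hyperparameter of the `GraphLearner`, the choice between paths can itself be tuned like any other hyperparameter.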

A good way to get into `mlr3pipelines` is through the following two vignettes:

*mlr3pipelines* is a free and open source software project that
encourages participation and feedback. If you have any issues,
questions, suggestions or feedback, please do not hesitate to open an
“issue” about it on the GitHub page!

In case of problems / bugs, it is often helpful if you provide a “minimum working example” that showcases the behaviour (but don’t worry about this if the bug is obvious).

Please understand that the resources of the project are limited: response may sometimes be delayed by a few days, and some feature suggestions may be rejected if they are deemed too tangential to the vision behind the project.

If you use mlr3pipelines, please cite our JMLR article:

```
@Article{mlr3pipelines,
title = {{mlr3pipelines} - Flexible Machine Learning Pipelines in R},
author = {Martin Binder and Florian Pfisterer and Michel Lang and Lennart Schneider and Lars Kotthoff and Bernd Bischl},
journal = {Journal of Machine Learning Research},
year = {2021},
volume = {22},
number = {184},
pages = {1-7},
url = {https://jmlr.org/papers/v22/21-0281.html},
}
```

A predecessor to this package is the
*mlrCPO* package, which works with
*mlr* 2.x. Other packages that provide, to varying degrees, some
preprocessing functionality or a domain-specific language for machine
learning are the *caret* package and the related *recipes* project,
as well as the *dplyr* package.
