Dataflow Programming for Machine Learning in R.
Watch our “WhyR 2020” webinar presentation on YouTube for an introduction! Find the slides here.
mlr3pipelines is a dataflow programming toolkit
for machine learning in R utilising the
mlr3 package. Machine learning
workflows can be written as directed “Graphs” that represent data flows
between preprocessing, model fitting, and ensemble learning units in an
expressive and intuitive language. Using methods from the
mlr3tuning package, it is
even possible to simultaneously optimize parameters of multiple processing units.
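As a rough illustration of why joint optimization is possible, the sketch below shows that a pipeline exposes the hyperparameters of all its units in one parameter set, with each name prefixed by the ID of the unit it belongs to. The particular pipeline (a variance filter followed by a decision tree) and the values set here are arbitrary choices for illustration; the sketch assumes that mlr3, mlr3pipelines, mlr3filters and rpart are installed. A tuner from mlr3tuning could then search over these prefixed parameters together, which is not shown here.

```r
library("mlr3")
library("mlr3pipelines")

# A small illustrative pipeline: feature filter, then a decision tree.
graph = po("filter", filter = mlr3filters::flt("variance")) %>>%
  po("learner", learner = lrn("classif.rpart"))
glrn = GraphLearner$new(graph)

# All hyperparameters live in one joint parameter set. The prefix is the
# ID of the PipeOp the parameter belongs to.
ids = glrn$param_set$ids()
grep("filter.frac|rpart.cp", ids, value = TRUE)

# Parameters of different units are set through the same interface.
glrn$param_set$values$variance.filter.frac = 0.5
glrn$param_set$values$classif.rpart.cp = 0.05
```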
In principle, mlr3pipelines is about defining single data and model manipulation steps as “PipeOps”:
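    library("mlr3")
    library("mlr3pipelines")

    # one PipeOp per step: PCA, variance-based feature filter, learner
    pca = po("pca")
    filter = po("filter", filter = mlr3filters::flt("variance"), filter.frac = 0.5)
    learner_po = po("learner", learner = lrn("classif.rpart"))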
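To see what a single PipeOp does in isolation, the following minimal sketch trains the PCA step on the iris task and then applies it again in the prediction phase. The choice of task and the variable names are illustrative assumptions. PipeOps take and return lists, with one element per input or output channel.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
pca = po("pca")

# Training fits the rotation and returns the transformed task in a list.
trained = pca$train(list(task))
out_task = trained[[1]]
out_task$feature_names

# The fitted rotation is kept in the PipeOp's state ...
pca$is_trained

# ... and is reused when the PipeOp is applied during prediction.
predicted = pca$predict(list(task))[[1]]
```

The original features are replaced by principal components, while the target column is left untouched.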
These PipeOps can then be combined to define machine learning
pipelines. These can be wrapped in a
GraphLearner that behaves like any other Learner in mlr3.
    graph = pca %>>% filter %>>% learner_po
    glrn = GraphLearner$new(graph)
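Because a GraphLearner follows the usual Learner interface, it can be trained and used for prediction directly. The sketch below rebuilds the pipeline so that it is self-contained; the 120/30 split, the seed and the accuracy measure are assumptions made for illustration only.

```r
library("mlr3")
library("mlr3pipelines")

graph = po("pca") %>>%
  po("filter", filter = mlr3filters::flt("variance"), filter.frac = 0.5) %>>%
  po("learner", learner = lrn("classif.rpart"))
glrn = GraphLearner$new(graph)

task = tsk("iris")

# Illustrative hold-out split: 120 rows for training, the rest for testing.
set.seed(1)
train_ids = sample(task$row_ids, 120)
test_ids = setdiff(task$row_ids, train_ids)

# All preprocessing steps are fitted on the training rows only and are
# then re-applied to the test rows before the tree predicts.
glrn$train(task, row_ids = train_ids)
pred = glrn$predict(task, row_ids = test_ids)
pred$score(msr("classif.acc"))
```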
This learner can be used for resampling, benchmarking, and even tuning.
    resample(tsk("iris"), glrn, rsmp("cv"))
    #> <ResampleResult> of 10 iterations
    #> * Task: iris
    #> * Learner: pca.variance.classif.rpart
    #> * Warnings: 0 in 0 iterations
    #> * Errors: 0 in 0 iterations
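Benchmarking works the same way. As a hedged sketch, the code below compares a pipeline against the plain decision tree it wraps, using 3-fold cross-validation. The reduced pipeline (PCA followed by the tree), the number of folds and the accuracy measure are illustrative assumptions, not recommendations.

```r
library("mlr3")
library("mlr3pipelines")

set.seed(1)
glrn = GraphLearner$new(
  po("pca") %>>% po("learner", learner = lrn("classif.rpart"))
)

# Compare the pipeline with the plain learner on identical CV folds.
design = benchmark_grid(
  tasks = tsk("iris"),
  learners = list(glrn, lrn("classif.rpart")),
  resamplings = rsmp("cv", folds = 3)
)
bmr = benchmark(design)

# One row per learner, with the measure averaged over folds.
agg = bmr$aggregate(msr("classif.acc"))
agg[, c("learner_id", "classif.acc")]
```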
Single computational steps can be represented as so-called PipeOps, which can then be connected with directed edges in a Graph. The scope of mlr3pipelines is still growing.
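The `%>>%` operator is a shorthand for this; the same structure can also be built by adding PipeOps and edges one at a time. The following is a minimal sketch of the explicit form, with an arbitrarily chosen scale, PCA and decision tree sequence. Edges refer to PipeOps by their IDs, which default to the key passed to `po()` or, for learners, to the learner ID.

```r
library("mlr3")
library("mlr3pipelines")

graph = Graph$new()

# Nodes: each PipeOp is registered under its ID.
graph$add_pipeop(po("scale"))
graph$add_pipeop(po("pca"))
graph$add_pipeop(po("learner", learner = lrn("classif.rpart")))

# Directed edges: data flows from the first ID to the second.
graph$add_edge("scale", "pca")
graph$add_edge("pca", "classif.rpart")

# IDs in topological order, and the edge table of the Graph.
graph$ids(sorted = TRUE)
graph$edges
```

Explicit edges become more useful once a pipeline is no longer a simple chain, for example when several branches feed into one unit.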
The easiest way to get started is to read some of the vignettes that are shipped with the package, which can also be viewed online.
mlr3pipelines is a free and open source software project that encourages participation and feedback. If you have any issues, questions, suggestions or feedback, please do not hesitate to open an “issue” about it on the GitHub page!
In case of problems / bugs, it is often helpful if you provide a “minimum working example” that showcases the behaviour (but don’t worry about this if the bug is obvious).
Please understand that the resources of the project are limited: response may sometimes be delayed by a few days, and some feature suggestions may be rejected if they are deemed too tangential to the vision behind the project.
A predecessor to this package is the mlrCPO package, which works with mlr 2.x. Other packages that provide, to varying degrees, some preprocessing functionality or a domain-specific language for machine learning are the caret package, the related recipes project, and the dplyr package.