```r
library(prepr)
knitr::opts_chunk$set(echo = TRUE)
```
This package is used to temporarily relieve the swelling, burning, pain, and itching caused by data preparation. It is heavily influenced by the sklearn preprocessing module: it aims to implement the same Transformer API and to allow pipelines that can be saved and applied to new datasets.
Yes, something like this already exists, and it's pretty comprehensive: check out the recipes package here: https://tidymodels.github.io/recipes/. So why reinvent the wheel? Well, I am not a huge fan of the tidyverse. Don't get me wrong: I like that it turns new users on to R, and the folks at RStudio have done so much for the R community. But the tidyverse is very opinionated and still evolving. I prefer to stick to base R when I can, and I especially like understanding how things work under the hood. Hence this package.
Processing pipelines are nothing new, so it's no surprise that this package follows a similar approach. You can create a pipeline explicitly using the `pipeline()` function, or in a magrittr style by using the pipeline operator, `%|>%`, to pipe multiple prep functions into each other.
```r
data(iris)

# Explicit construction: list the steps in order
p1 <- pipeline(
  prep_minmax(~.-Species),    # min-max scale every column except Species
  prep_onehot(~sel_factor()), # one-hot encode the factor columns
  sink_matrix()               # emit the result as a matrix
)

# Equivalent construction using the pipeline operator
p2 <- prep_minmax(~.-Species) %|>%
  prep_onehot(~sel_factor()) %|>%
  sink_matrix()

all.equal(p1, p2)

## print out p1
p1
```
The purpose of creating these pipelines is to fit them to data, save them, and apply them to different datasets. The fit method is used to fit a pipeline. It works by fitting each transform in sequence and passing the transformed data down the pipe. Once the pipeline has been trained, its `isfit` member will be set to `TRUE`:
```r
p1$fit(iris)  # fit each transform in sequence on iris
p1            # isfit is now TRUE
```
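Because the fitted settings live inside the pipeline object, a trained pipeline can be persisted for a later session. Here is a minimal sketch, assuming the pipeline object serializes cleanly with base R's `saveRDS()`/`readRDS()` (the file name is hypothetical):

```r
# Persist the fitted pipeline; assumes base R serialization works on it
saveRDS(p1, "iris_pipeline.rds")

# In a later session, restore it fully trained
p1_restored <- readRDS("iris_pipeline.rds")
p1_restored  # isfit should still be TRUE
```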
Once a pipeline has been fit, the transform method can be called and passed a new dataset. The settings saved during the training process will be applied to the new dataset, ensuring a reproducible workflow with little micromanagement.
```r
z <- p1$transform(iris)  # apply the fitted settings to a dataset
knitr::kable(head(z), digits = 2)
```
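To see this in action on genuinely new data, you can fit on one subset and transform another. The sketch below uses a random split of iris and only the functions shown above; the split itself is illustrative and not part of prepr:

```r
# Fit on a training subset, then apply the learned settings to held-out rows
set.seed(42)
idx   <- sample(nrow(iris), 100)
train <- iris[idx, ]
test  <- iris[-idx, ]

p <- prep_minmax(~.-Species) %|>%
  prep_onehot(~sel_factor()) %|>%
  sink_matrix()

p$fit(train)                 # min/max ranges and factor levels come from train only
z_test <- p$transform(test)  # the same saved settings are reused on unseen rows
head(z_test)
```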