This package is used to temporarily relieve swelling, burning, pain, and itching caused by data preparation. It is heavily influenced by sklearn's preprocessing module. As such, it aims to implement the Transformer API and allow for pipelines that can be saved and applied to new datasets.
Is there already a package for this? Yes, and it’s pretty comprehensive. Check out the recipes package here:
https://tidymodels.github.io/recipes/. So why reinvent the wheel? Well, I am not a huge fan of the tidyverse.
I like that it turns new users on to R and the folks at RStudio have
done so much for the R community. The tidyverse is very opinionated and
still evolving. I prefer to stick to base R when I can and I especially
like understanding how things work under the hood. Hence this package.
Processing pipelines are nothing new, so it’s no surprise that this
package follows a similar approach. You can create a pipeline explicitly
using the pipeline function or, in a magrittr style, by using the
pipeline operator, %|>%, to pipe multiple prep functions into each
other.
data(iris)
p1 <- pipeline(
  prep_minmax(~.-Species),
  prep_onehot(~sel_factor()),
  sink_matrix()
)
p2 <-
  prep_minmax(~.-Species) %|>%
  prep_onehot(~sel_factor()) %|>%
  sink_matrix()
all.equal(p1, p2)
## [1] TRUE
## print out
p1
## [ Pipeline ] [isfit: no ]
## |--[ MinMaxScaler ] [isfit: no ]
## |--[ OnehotEncoder ] [isfit: no ]
## |--[ Sink ] [isfit: no ]
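Why the two construction styles compare equal can be illustrated with a toy version in base R. This is only a sketch under assumptions: `toy_step`, `toy_pipeline`, and `%then%` are hypothetical names invented for illustration, not part of this package, and the package's own pipeline function and %|>% operator may be implemented differently.

```r
# Hypothetical toy objects, not the package's code.
# A toy step is a one-element pipeline that only carries a name.
toy_step <- function(name) {
  structure(list(steps = list(name)), class = "toy_pipeline")
}

# Building a pipeline flattens the steps of everything passed in.
toy_pipeline <- function(...) {
  parts <- list(...)
  steps <- unlist(lapply(parts, `[[`, "steps"), recursive = FALSE)
  structure(list(steps = steps), class = "toy_pipeline")
}

# A custom infix operator just calls the constructor on its two sides.
# User-defined %op% operators in R are left-associative.
`%then%` <- function(lhs, rhs) toy_pipeline(lhs, rhs)

a <- toy_pipeline(toy_step("minmax"), toy_step("onehot"), toy_step("sink"))
b <- toy_step("minmax") %then% toy_step("onehot") %then% toy_step("sink")

identical(a, b)
```

Because the operator delegates to the same constructor, both styles produce the same flat list of steps, which is the property the all.equal check above relies on.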
The purpose of creating these pipelines is to fit them to data and save
them to apply on different datasets. The fit method is used to fit a
pipeline. It works by fitting each transform in sequence and passing the
transformed data down the pipe. Once it has been trained, the isfit
member will be set to TRUE.
p1$fit(iris)
p1
## [ Pipeline ] [isfit: yes ]
## |--[ MinMaxScaler ] [isfit: yes ]
## |--[ OnehotEncoder ] [isfit: yes ]
## |--[ Sink ] [isfit: yes ]
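The sequential fitting described above can be sketched in base R with closures. Everything here is an assumption made for illustration: `make_center`, `make_scale`, and `fit_pipeline` are hypothetical helpers, not functions from this package, and the package's real transforms may store their state differently.

```r
# Hypothetical toy steps: fit() learns and stores state,
# transform() applies the stored state, isfit() reports whether fit() ran.
make_center <- function() {
  mu <- NULL
  list(
    fit = function(x) mu <<- mean(x),
    transform = function(x) x - mu,
    isfit = function() !is.null(mu)
  )
}

make_scale <- function() {
  s <- NULL
  list(
    fit = function(x) s <<- sd(x),
    transform = function(x) x / s,
    isfit = function() !is.null(s)
  )
}

# Fit each step in order, handing the transformed data to the next step.
fit_pipeline <- function(steps, x) {
  for (step in steps) {
    step$fit(x)             # learn from the data as it looks at this stage
    x <- step$transform(x)  # pass the transformed data down the pipe
  }
  invisible(steps)
}

steps <- list(make_center(), make_scale())
fit_pipeline(steps, iris$Sepal.Length)

# Every step now reports that it has been fit.
all(vapply(steps, function(step) step$isfit(), logical(1)))
```

Note that the second step is fit on the output of the first one, not on the raw data, which is what "passing the transformed data down the pipe" means in practice.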
Once a pipeline has been fit, the transform method can be called and passed a new dataset. The settings saved during the training process will be applied to the new dataset, ensuring a reproducible workflow with little micromanagement.
z <- p1$transform(iris)
knitr::kable(head(z), digits = 2)
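| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species=setosa | Species=versicolor | Species=virginica |
|-------------:|------------:|-------------:|------------:|---------------:|-------------------:|------------------:|
|        -0.56 |        0.25 |        -0.86 |       -0.92 |              1 |                  0 |                 0 |
|        -0.67 |       -0.17 |        -0.86 |       -0.92 |              1 |                  0 |                 0 |
|        -0.78 |        0.00 |        -0.90 |       -0.92 |              1 |                  0 |                 0 |
|        -0.83 |       -0.08 |        -0.83 |       -0.92 |              1 |                  0 |                 0 |
|        -0.61 |        0.33 |        -0.86 |       -0.92 |              1 |                  0 |                 0 |
|        -0.39 |        0.58 |        -0.76 |       -0.75 |              1 |                  0 |                 0 |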
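The numeric columns in this output are consistent with min-max scaling to the range [-1, 1], although that target range is inferred from the values rather than stated by the package. The base R sketch below reproduces the first row under that assumption and shows the idea of applying stored settings to other data. The names `col_min`, `col_max`, and `rescale` are hypothetical and are not part of this package.

```r
data(iris)
num <- iris[, 1:4]

# "Fit": remember each numeric column's minimum and maximum.
col_min <- sapply(num, min)
col_max <- sapply(num, max)

# "Transform": rescale a data frame to [-1, 1] using the stored ranges.
# The formula 2 * (x - min) / (max - min) - 1 is an assumption that
# happens to match the values shown in the table.
rescale <- function(df) {
  as.data.frame(Map(function(x, a, b) 2 * (x - a) / (b - a) - 1,
                    df, col_min, col_max))
}

# First row: -0.56, 0.25, -0.86, -0.92, matching the table.
first_row <- round(head(rescale(num), 1), 2)
first_row

# The same stored ranges can be reused on other rows or datasets
# that have the same columns.
later_rows <- rescale(iris[101:150, 1:4])
```

The important design point is that `rescale` never recomputes the ranges from the data it receives, so new data is always scaled relative to the training data.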