```r
library(prepr)
knitr::opts_chunk$set(echo = TRUE)
```
This package is used to temporarily relieve the swelling, burning, pain, and itching caused by data preparation. It is heavily influenced by the sklearn preprocessing module: it aims to implement the same Transformer API and to allow pipelines that can be saved and applied to new datasets.
Yes, something like this already exists, and it's pretty comprehensive: check out the recipes package here: https://tidymodels.github.io/recipes/. So why reinvent the wheel? Well, I am not a huge fan of the tidyverse. Don't get me wrong: I like that it turns new users on to R, and the folks at RStudio have done so much for the R community. But the tidyverse is very opinionated and still evolving. I prefer to stick to base R when I can, and I especially like understanding how things work under the hood. Hence this package.
Processing pipelines are nothing new, so it's no surprise that this package follows a similar approach. You can create a pipeline explicitly using the `pipeline()` function, or in a magrittr style by using the pipeline operator, `%|>%`, to pipe multiple prep functions into each other.
```r
data(iris)

# Explicit construction: list the steps in order
p1 <- pipeline(
  prep_minmax(~.-Species),    # min-max scale every column except Species
  prep_onehot(~sel_factor()), # one-hot encode the factor columns
  sink_matrix()               # emit the result as a matrix
)

# Equivalent construction using the pipeline operator
p2 <- prep_minmax(~.-Species) %|>%
  prep_onehot(~sel_factor()) %|>%
  sink_matrix()

all.equal(p1, p2)

## print out p1
p1
```
The purpose of creating these pipelines is to fit them to data, save them, and apply them to different datasets. The fit method is used to fit a pipeline. It works by fitting each transform in sequence and passing the transformed data down the pipe. Once the pipeline has been trained, its `isfit` member will be set to `TRUE`:
```r
p1$fit(iris)  # fit each transform in sequence on iris
p1            # isfit is now TRUE
```
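Because the fitted settings live inside the pipeline object, a trained pipeline can be persisted for a later session. Here is a minimal sketch, assuming the pipeline object serializes cleanly with base R's `saveRDS()`/`readRDS()` (the file name is hypothetical):

```r
# Persist the fitted pipeline; assumes base R serialization works on it
saveRDS(p1, "iris_pipeline.rds")

# In a later session, restore it fully trained
p1_restored <- readRDS("iris_pipeline.rds")
p1_restored  # isfit should still be TRUE
```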
Once a pipeline has been fit, the transform method can be called and passed a new dataset. The settings saved during the training process will be applied to the new dataset, ensuring a reproducible workflow with little micromanagement.
```r
z <- p1$transform(iris)  # apply the fitted settings to a dataset
knitr::kable(head(z), digits = 2)
```
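To see this in action on genuinely new data, you can fit on one subset and transform another. The sketch below uses a random split of iris and only the functions shown above; the split itself is illustrative and not part of prepr:

```r
# Fit on a training subset, then apply the learned settings to held-out rows
set.seed(42)
idx   <- sample(nrow(iris), 100)
train <- iris[idx, ]
test  <- iris[-idx, ]

p <- prep_minmax(~.-Species) %|>%
  prep_onehot(~sel_factor()) %|>%
  sink_matrix()

p$fit(train)                 # min/max ranges and factor levels come from train only
z_test <- p$transform(test)  # the same saved settings are reused on unseen rows
head(z_test)
```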