require(pipeflow)
knitr::opts_chunk$set(
    comment = "#",
    prompt = FALSE,
    tidy = FALSE,
    cache = FALSE,
    collapse = TRUE
)
old <- options(width = 100L)
library(ggplot2)
pip <- Pipeline$new("my-pipeline", data = airquality)

pip$add(
    "data_prep",
    function(data = ~data) {
        replace(data, "Temp.Celsius", (data[, "Temp"] - 32) * 5/9)
    }
)

pip$add(
    "model_fit",
    function(
        data = ~`data_prep`,
        xVar = "Temp.Celsius"
    ) {
        lm(paste("Ozone ~", xVar), data = data)
    }
)

pip$add(
    "model_plot",
    function(
        model = ~`model_fit`,
        data = ~`data_prep`,
        xVar = "Temp.Celsius",
        title = "Linear model fit"
    ) {
        coeffs <- coefficients(model)
        ggplot(data) +
            geom_point(aes(.data[[xVar]], .data[["Ozone"]])) +
            geom_abline(intercept = coeffs[1], slope = coeffs[2]) +
            labs(title = title)
    }
)

pip$set_params(list(xVar = "Solar.R"))
pip$set_params(list(title = "Some new title"))
pip$set_data(airquality[1:10, ])
pip$run()
Let's start where we left off in the "Get started with pipeflow" vignette, that is, with the following pipeline:
pip
and the following data set:
pip$get_data() |> head(3)
Let's say we want to insert a new step after the data_prep step that standardizes the y-variable.
pip$insert_after(
    afterStep = "data_prep",
    step = "standardize",
    function(
        data = ~`data_prep`,
        yVar = "Ozone"
    ) {
        data[, yVar] <- scale(data[, yVar])
        data
    }
)
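To make concrete what the standardize step does to the y-variable, here is a minimal base R sketch, independent of pipeflow. It assumes the same first ten rows of airquality that were set as pipeline data; the variable names y, z and z_manual are purely illustrative.

```r
# Base R only; uses the first ten rows of airquality, as set in the pipeline.
y <- airquality[1:10, "Ozone"]

# scale() returns a one-column matrix that carries the centre and scale
# used for the transformation as attributes.
z <- scale(y)

# Manual z-score; missing values are skipped when computing mean and sd.
z_manual <- (y - mean(y, na.rm = TRUE)) / sd(y, na.rm = TRUE)

# Compare both results (TRUE if they agree up to numerical tolerance).
all.equal(as.numeric(z), z_manual)
```

Because scale() returns a matrix, the standardized column in the step above is stored as a one-column matrix inside the data frame, which lm and ggplot2 handle like an ordinary numeric column in this example.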
pip
library(visNetwork)
do.call(visNetwork, args = pip$get_graph()) |>
    visHierarchicalLayout(direction = "LR", sortMethod = "directed")
As we can see, the standardize step is now part of the pipeline, but so far it is not used by any other step.
Let's revisit the function definition of the model_fit step:
pip$get_step("model_fit")[["fun"]]
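The data argument in the function shown above defaults to a one-sided R formula, which is how a step refers to the output of another step. The following base R sketch shows how such formula defaults can be read from a function's signature. It is only an illustration of the notation, not pipeflow's actual implementation, and the names step_fun, is_dep and dep_names are hypothetical.

```r
# A function with the same signature style as the model_fit step.
step_fun <- function(data = ~data_prep, xVar = "Temp.Celsius") NULL

# formals() returns the unevaluated default arguments.
args <- formals(step_fun)

# A default such as ~data_prep is a call to `~`; plain values are not calls.
is_dep <- vapply(
    args,
    function(a) is.call(a) && identical(a[[1]], as.name("~")),
    logical(1)
)

# Extract the referenced step name from each formula default.
dep_names <- vapply(args[is_dep], function(a) as.character(a[[2]]), character(1))
dep_names
```

In this sketch only data is recognized as a dependency, while xVar remains an ordinary parameter.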
To use the standardized data, we need to change the data dependency so that it refers to the standardize step. Also, instead of using a fixed y-variable in the model, we want to pass it as a parameter.
pip$replace_step(
    "model_fit",
    function(
        data = ~standardize,      # <- changed data reference
        xVar = "Temp.Celsius",
        yVar = "Ozone"            # <- new y-variable
    ) {
        lm(paste(yVar, "~", xVar), data = data)
    }
)
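The replaced step builds the model formula from two strings. As a small base R sketch of that idea, the following compares the string-based call with stats::reformulate(), an alternative way to build the same formula from variable names. It uses the full airquality data and the variable pair that is set later in this vignette; the object names fit_string and fit_formula are illustrative.

```r
xVar <- "Solar.R"
yVar <- "Wind"

# lm() coerces the character string "Wind ~ Solar.R" to a formula.
fit_string <- lm(paste(yVar, "~", xVar), data = airquality)

# reformulate() builds the formula object directly from the variable names.
fit_formula <- lm(reformulate(xVar, response = yVar), data = airquality)

# Both fits should yield the same coefficients.
all.equal(coefficients(fit_string), coefficients(fit_formula))
```

Either approach works inside a step function; the string version keeps the step body closest to the original vignette code.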
The model_plot step needs to be updated in a similar way.
pip$replace_step(
    "model_plot",
    function(
        model = ~model_fit,
        data = ~standardize,      # <- changed data reference
        xVar = "Temp.Celsius",
        yVar = "Ozone",           # <- new y-variable
        title = "Linear model fit"
    ) {
        coeffs <- coefficients(model)
        ggplot(data) +
            geom_point(aes(.data[[xVar]], .data[[yVar]])) +
            geom_abline(intercept = coeffs[1], slope = coeffs[2]) +
            labs(title = title)
    }
)
The updated pipeline now looks as follows.
pip
library(visNetwork)
do.call(visNetwork, args = c(pip$get_graph(), list(height = 100))) |>
    visHierarchicalLayout(direction = "LR")
We see that the model_fit and model_plot steps now use the standardized data.
Let's re-run the pipeline and inspect the output.
pip$set_params(list(xVar = "Solar.R", yVar = "Wind"))
pip$run()
pip$get_out("model_fit") |> coefficients()
pip$get_out("model_plot")
Let's see the pipeline again.
pip
When you try to remove a step, pipeflow by default checks whether the step is used by any other step and raises an error if removing it would violate the integrity of the pipeline.
try(pip$remove_step("standardize"))
To enforce removing a step together with all its downstream dependencies, you can use the recursive argument.
pip$remove_step("standardize", recursive = TRUE)
pip
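The dependency check and the recursive removal described above can be pictured with a small base R sketch. It is a conceptual illustration only and does not reproduce pipeflow's internals: the deps list is a hand-written copy of the dependencies of the pipeline before the removal, and downstream_of is a hypothetical helper.

```r
# Hand-written dependency map: each step lists the steps it reads from.
deps <- list(
    data        = character(0),
    data_prep   = "data",
    standardize = "data_prep",
    model_fit   = "standardize",
    model_plot  = c("model_fit", "standardize")
)

# Collect all steps that depend, directly or indirectly, on the given step.
downstream_of <- function(step, deps) {
    found <- character(0)
    frontier <- step
    while (length(frontier) > 0) {
        uses_frontier <- vapply(deps, function(d) any(d %in% frontier), logical(1))
        direct <- names(deps)[uses_frontier]
        frontier <- setdiff(direct, found)
        found <- union(found, frontier)
    }
    found
}

# Steps that would have to go along with "standardize".
downstream_of("standardize", deps)

# The last step has no downstream steps, so removing it is always safe.
downstream_of("model_plot", deps)
```

A non-empty result corresponds to the situation in which removing the step alone fails, and the listed steps are the ones removed along with it when recursive = TRUE is used.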
Naturally, the last step never has any downstream dependencies, so it can always be removed without violating the pipeline's integrity. To remove just the last step, you can also use pop_step, which does not require the step name.
pip$pop_step()
pip
Replacing steps in a pipeline as shown in this vignette allows you to reuse existing pipelines and adapt them programmatically to new requirements. Another way of reusing pipelines is to combine them, which is shown in the "Combining pipelines" vignette.
options(old)