Modifying existing pipelines

require(pipeflow)

knitr::opts_chunk$set(
  comment = "#",
  prompt = FALSE,
  tidy = FALSE,
  cache = FALSE,
  collapse = TRUE
)

old <- options(width = 100L)
library(ggplot2)

Existing pipeline

pip <- Pipeline$new("my-pipeline", data = airquality)
pip$add(
    "data_prep",
    function(data = ~data) {
        replace(data, "Temp.Celsius", (data[, "Temp"] - 32) * 5/9)
    }
)
pip$add(
    "model_fit",
    function(
        data = ~`data_prep`,
        xVar = "Temp.Celsius"
    ) {
        lm(paste("Ozone ~", xVar), data = data)
    }
)
pip$add(
    "model_plot",
    function(
        model = ~`model_fit`,
        data = ~`data_prep`,
        xVar = "Temp.Celsius",
        title = "Linear model fit"
    ) {
        coeffs <- coefficients(model)
        ggplot(data) +
            geom_point(aes(.data[[xVar]], .data[["Ozone"]])) +
            geom_abline(intercept = coeffs[1], slope = coeffs[2]) +
            labs(title = title)
    }
)
pip$set_params(list(xVar = "Solar.R"))
pip$set_params(list(title = "Some new title"))
pip$set_data(airquality[1:10, ])
pip$run()

Let's start where we left off in the "Get started with pipeflow" vignette. That is, we have the following pipeline

pip

with the following data set as its input

pip$get_data() |> head(3)

Insert new step

Let's say we want to insert a new step after the data_prep step that standardizes the y-variable.

pip$insert_after(
    afterStep = "data_prep",
    step = "standardize",
    function(
        data = ~`data_prep`,
        yVar = "Ozone"
    ) {
        data[, yVar] <- scale(data[, yVar])
        data
    }
)
pip
library(visNetwork)
do.call(visNetwork, args = pip$get_graph()) |>
    visHierarchicalLayout(direction = "LR", sortMethod = "directed")

As we can see, the standardize step is now part of the pipeline, but so far it is not used by any other step.
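
To make concrete what the new step does to the y-variable, here is a minimal base-R sketch that applies the same `scale()` call outside of any pipeline. It assumes the built-in `airquality` data; the variable names `y` and `z` are only for illustration and are not part of the pipeline.

```r
# Standardize the Ozone column the way the standardize step does,
# but outside of a pipeline so the effect is easy to inspect.
y <- airquality[["Ozone"]]

# scale() centers on the mean and divides by the standard deviation,
# ignoring missing values; as.numeric() drops the matrix attributes.
z <- as.numeric(scale(y))

mean(z, na.rm = TRUE)             # approximately 0
sd(z, na.rm = TRUE)               # approximately 1
identical(is.na(z), is.na(y))     # missing values stay missing
```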

Replace existing steps

Let's revisit the function definition of the model_fit step

pip$get_step("model_fit")[["fun"]]

To use the standardized data, we need to change the data dependency so that it refers to the standardize step. Also, instead of hard-coding the y-variable in the model, we want to pass it as a parameter.

pip$replace_step(
    "model_fit",
    function(
        data = ~standardize,        # <- changed data reference
        xVar = "Temp.Celsius",
        yVar = "Ozone"              # <- new y-variable
    ) {
        lm(paste(yVar, "~", xVar), data = data)
    }
)
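
The model formula in the replaced step is assembled from strings. As a sketch of how that behaves outside the pipeline, the following base-R snippet fits the same kind of model on the built-in `airquality` data, once from a pasted string and once via `reformulate()`. The latter is an alternative that this vignette does not use, and the object names are illustrative.

```r
xVar <- "Solar.R"
yVar <- "Wind"

# Variant used in the step: lm() coerces the character string to a formula.
fit_string <- lm(paste(yVar, "~", xVar), data = airquality)

# Alternative: build a formula object explicitly from the variable names.
fit_formula <- lm(reformulate(xVar, response = yVar), data = airquality)

# Both variants describe the same model, so the coefficients agree.
all.equal(coefficients(fit_string), coefficients(fit_formula))
```

Passing variable names as strings is what makes it possible to change `xVar` and `yVar` later through the pipeline's parameters without touching the function body.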

The model_plot step needs to be updated in a similar way.

pip$replace_step(
    "model_plot",
    function(
        model = ~model_fit,
        data = ~standardize,         # <- changed data reference
        xVar = "Temp.Celsius",
        yVar = "Ozone",              # <- new y-variable
        title = "Linear model fit"
    ) {
        coeffs <- coefficients(model)
        ggplot(data) +
            geom_point(aes(.data[[xVar]], .data[[yVar]])) +
            geom_abline(intercept = coeffs[1], slope = coeffs[2]) +
            labs(title = title)
    }
)

The updated pipeline now looks as follows.

pip
library(visNetwork)
do.call(visNetwork, args = c(pip$get_graph(), list(height = 100))) |>
    visHierarchicalLayout(direction = "LR")

We see that the model_fit and model_plot steps now use the standardized data. Let's re-run the pipeline and inspect the output.

pip$set_params(list(xVar = "Solar.R", yVar = "Wind"))
pip$run()
pip$get_out("model_fit") |> coefficients()
pip$get_out("model_plot")

Removing steps

Let's look at the pipeline again.

pip

When you try to remove a step, pipeflow by default checks whether any other step depends on it and raises an error if the removal would violate the integrity of the pipeline.

try(pip$remove_step("standardize"))

To force the removal of a step together with all steps that depend on it (its downstream dependencies), you can use the recursive argument.

pip$remove_step("standardize", recursive = TRUE)
pip
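
The notion of downstream dependencies can be sketched in a few lines of base R. The snippet below is not pipeflow's implementation. It uses a hypothetical `deps` list that mirrors the dependencies of the pipeline before the removal, and a hypothetical helper `downstream()` that collects every step depending directly or indirectly on a given step.

```r
# Hypothetical dependency list: each step maps to the steps it reads from.
deps <- list(
    data        = character(0),
    data_prep   = "data",
    standardize = "data_prep",
    model_fit   = "standardize",
    model_plot  = c("model_fit", "standardize")
)

# Collect all steps that depend, directly or transitively, on `step`.
downstream <- function(step, deps) {
    found <- character(0)
    queue <- step
    while (length(queue) > 0) {
        current <- queue[1]
        queue <- queue[-1]
        # Steps that list `current` among their inputs
        uses_current <- vapply(deps, function(d) current %in% d, logical(1))
        fresh <- setdiff(names(deps)[uses_current], found)
        found <- c(found, fresh)
        queue <- c(queue, fresh)
    }
    found
}

downstream("standardize", deps)   # the steps a recursive removal would also drop
downstream("model_plot", deps)    # empty: nothing depends on the last step
```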

Naturally, the last step never has any downstream dependencies, so it can always be removed safely. For this case there is a dedicated method that removes just the last step.

pip$pop_step()
pip

Inserting, replacing, and removing steps as shown in this vignette allows you to reuse existing pipelines and adapt them programmatically to new requirements. Another way of reusing pipelines is to combine them, which is shown in the "Combining pipelines" vignette.

options(old)



pipeflow documentation built on April 3, 2025, 10:50 p.m.