pipe_add: Add pipeline step
In pipeflow: Lightweight, General-Purpose Data Analysis Pipelines

Description

A pipeline consists of a series of steps, which are usually added one by one. Each step wraps a function that is evaluated when the pipeline is run. This function can be an existing R function (e.g. mean()) or an anonymous/lambda function defined specifically for the pipeline. Function parameters can refer to the results of earlier pipeline steps using the syntax x = ~earlier_step_name; see the Examples for details.

Usage

pipe_add(
  pip,
  step,
  fun,
  params = list(),
  description = "",
  group = step,
  keepOut = FALSE
)

Arguments

`pip`	`Pipeline` object
`step`	`string` the name of the step. Each step name must be unique.
`fun`	`function` or name of the function to be applied at the step. Both existing and anonymous/lambda functions can be used. All function parameters must have default values. If a parameter has no default value in the function signature, its value can be set via the `params` argument instead (see the `mean()` example in the Examples section).
`params`	`list` of parameter values used to set or overwrite parameters of the passed function.
`description`	`string` optional description of the step
`group`	`string` output collected after pipeline execution (see `pipe_collect_out()`) is grouped by the defined group names. By default, this is the name of the step, which comes in handy when the pipeline is copy-appended multiple times, as it keeps the results of the same function/step grouped in one place.
`keepOut`	`logical` if `FALSE` (default), the output of the step is not collected when calling `pipe_collect_out()` after the pipeline run. Use this option to keep only the results that matter and skip intermediate results that are not needed. See also function `pipe_collect_out()` for more details.
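
The effect of `keepOut` can be illustrated with a minimal sketch. The step names `double` and `total` are made up for this illustration, and the sketch assumes that `pipe_collect_out()` called without `all = TRUE` returns only the steps flagged with `keepOut = TRUE`, as the argument description suggests.

```r
library(pipeflow)

p <- pipe_new("myPipe", data = 1:10)

# Intermediate step: keepOut is left at its default (FALSE),
# so its output should not show up in the collected results.
pipe_add(p, "double", \(x = ~data) 2 * x)

# Final step: flagged with keepOut = TRUE so that it is collected.
pipe_add(p, "total", \(x = ~double) sum(x), keepOut = TRUE)

# Run the pipeline and collect only the flagged output.
out <- p |> pipe_run() |> pipe_collect_out()
names(out)
```

Passing `all = TRUE` to `pipe_collect_out()`, as in the group example of the Examples section, collects every step regardless of `keepOut`.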

Value

returns the Pipeline object invisibly
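
Because the pipeline object is returned, calls to `pipe_add()` can also be chained with the native pipe operator. The following is a small sketch of that style; the step names `s1` and `s2` are chosen only for illustration.

```r
library(pipeflow)

# Each pipe_add() call hands the pipeline on to the next call.
p <- pipe_new("myPipe", data = 1:3) |>
  pipe_add("s1", \(x = ~data) x + 1) |>
  pipe_add("s2", \(x = ~s1) x * 2)

# Run the pipeline and fetch the output of the last step.
res <- p |> pipe_run() |> pipe_get_out("s2")
res
```

Since the return value is invisible, nothing is printed when `pipe_add()` is called on its own line; print `p` explicitly to inspect the pipeline.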

Examples

# Add steps with lambda functions
p <- pipe_new("myPipe", data = 1)
pipe_add(p, "s1", \(x = ~data) 2*x)  # use input data
pipe_add(p, "s2", \(x = ~data, y = ~s1) x * y)
try(pipe_add(p, "s2", \(z = 3) 3)) # error: step 's2' exists already
try(pipe_add(p, "s3", \(z = ~foo) 3)) # dependency 'foo' not found
p

# Add step with existing function
p <- pipe_new("myPipe", data = c(1, 2, NA, 3, 4))
try(pipe_add(p, "calc_mean", mean))  # default value for x is missing
pipe_add(p, "calc_mean", mean, params = list(x = ~data, na.rm = TRUE))
p |> pipe_run() |> pipe_get_out("calc_mean")

# Step description
p <- pipe_new("myPipe", data = 1:10)
pipe_add(p, "s1", \(x = ~data) 2*x, description = "multiply by 2")
print(p, verbose = TRUE) # print all columns including description


# Group output
p <- pipe_new("myPipe", data = data.frame(x = 1:2, y = 3:4))
pipe_add(p, "prep_x", \(data = ~data) data$x, group = "prep")
pipe_add(p, "prep_y", \(data = ~data) (data$y)^2, group = "prep")
pipe_add(p, "sum", \(x = ~prep_x, y = ~prep_y) x + y)
p |> pipe_run() |> pipe_collect_out(all = TRUE)

pipeflow documentation built on April 3, 2025, 10:50 p.m.