This vignette introduces rxp_pipeline(), a function for organising large
projects into logical sub-pipelines. This feature is particularly useful when
working on complex projects with multiple phases (e.g., ETL, Modelling, Reporting)
or when collaborating in teams where different members work on different parts
of the pipeline.
As pipelines grow, a single gen-pipeline.R file can become difficult to
manage. Consider a data science project with:
- Data extraction and cleaning (ETL)
- Feature engineering
- Model training
- Model evaluation
- Report generation
Putting all derivations in one file makes it hard to see which derivations belong to which phase and to divide the work among team members.
To solve this issue, you can define your project using sub-pipelines and join
them into a master pipeline using rxp_pipeline().
This allows you to keep each phase in its own file and to give each sub-pipeline its own name and colour.
A project with sub-pipelines would look something like this:
my-project/
├── default.nix              # Nix environment (generated by rix)
├── gen-env.R                # Script to generate default.nix
├── gen-pipeline.R           # MASTER SCRIPT: combines all sub-pipelines
└── pipelines/
    ├── 01_data_prep.R       # Data preparation sub-pipeline
    ├── 02_analysis.R        # Analysis sub-pipeline
    └── 03_reporting.R       # Reporting sub-pipeline
Each sub-pipeline file returns a list of derivations:
# Data Preparation Sub-Pipeline
# pipelines/01_data_prep.R
library(rixpress)

list(
  rxp_r(name = raw_mtcars, expr = mtcars),
  rxp_r(name = clean_mtcars, expr = dplyr::filter(raw_mtcars, am == 1)),
  rxp_r(name = selected_mtcars, expr = dplyr::select(clean_mtcars, mpg, cyl, hp, wt))
)
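To make "returns a list" concrete, here is a minimal base-R sketch of how a script whose last expression is a list can be read back by a caller. Plain lists stand in for rxp_r() derivations so that the snippet runs without rixpress, and the temporary file is purely illustrative. This is an assumption about one way such a file can be consumed, not a description of what rxp_pipeline() does internally.

```r
# Sketch only: plain lists stand in for rxp_r() derivations, and the file
# below is a throwaway stand-in for something like pipelines/01_data_prep.R.

# Write a small "sub-pipeline" script whose last expression is a list.
sub_pipeline_file <- tempfile(fileext = ".R")
writeLines(
  c(
    "list(",
    "  list(name = 'raw_mtcars'),",
    "  list(name = 'clean_mtcars')",
    ")"
  ),
  sub_pipeline_file
)

# source() returns the value of the last evaluated expression in `$value`.
derivations <- source(sub_pipeline_file, local = TRUE)$value

# Collect the names of the stand-in derivations.
derivation_names <- vapply(derivations, function(d) d$name, character(1))
print(derivation_names)
```

Because the file's value is simply its last expression, a sub-pipeline script needs no assignment or explicit return: the trailing list(...) call is enough.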
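The rxp_pipeline() function takes a pipeline name (name), the path to the file that defines its derivations (path), and a colour used in visualisations (color).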
The second sub-pipeline:
# Analysis Sub-Pipeline
# pipelines/02_analysis.R
library(rixpress)

list(
  rxp_r(name = summary_stats, expr = summary(selected_mtcars)),
  rxp_r(name = mpg_model, expr = lm(mpg ~ hp + wt, data = selected_mtcars)),
  rxp_r(name = model_coefs, expr = coef(mpg_model))
)
The master script becomes very clean, as rxp_pipeline handles sourcing the files:
# gen-pipeline.R
library(rixpress)

# Create named pipelines with colours by pointing to the files
pipe_data_prep <- rxp_pipeline(
  name = "Data Preparation",
  path = "pipelines/01_data_prep.R",
  color = "#E69F00"
)

pipe_analysis <- rxp_pipeline(
  name = "Statistical Analysis",
  path = "pipelines/02_analysis.R",
  color = "#56B4E9"
)

# Build combined pipeline
rxp_populate(
  list(pipe_data_prep, pipe_analysis),
  project_path = ".",
  build = TRUE
)
When sub-pipelines are defined, visualisation tools use pipeline colours:
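- The interactive network (rxp_visnetwork()) and the static DAG (rxp_ggdag()) both use a dual-encoding approach: node fill shows the derivation type and node border shows the pipeline colour.
- rxp_trace() output in the console is coloured by pipeline (using the cli package).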
# Dual encoding: fill = type, border = pipeline (default when pipelines are defined)
rxp_ggdag(color_by = "pipeline")

# Colour entirely by derivation type (rxp_r, rxp_py, etc.) - original behaviour
rxp_ggdag(color_by = "type")
When you call rxp_populate() with rxp_pipeline objects:
- Each derivation is tagged with pipeline_group and pipeline_color
- dag.json includes pipeline metadata
- rxp_visnetwork() and rxp_ggdag() read this metadata
rxp_pipeline() provides a simple yet powerful way to organise complex
pipelines. By grouping derivations into logical units, you can keep each phase of a project in its own file and tell pipelines apart by colour in the DAG and in trace output.
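The pipeline_group and pipeline_color metadata mentioned above can be pictured with a small base-R sketch. Plain lists stand in for rixpress derivation objects, and tag_pipeline() is a hypothetical helper written for this illustration; it is not a rixpress function and does not claim to mirror the package's internals.

```r
# Sketch only: plain lists stand in for derivations, and tag_pipeline() is a
# hypothetical helper, not part of rixpress.
tag_pipeline <- function(derivations, name, color) {
  lapply(derivations, function(d) {
    d$pipeline_group <- name   # which sub-pipeline the derivation belongs to
    d$pipeline_color <- color  # colour a plotting function could read later
    d
  })
}

data_prep <- tag_pipeline(
  list(list(name = "raw_mtcars"), list(name = "clean_mtcars")),
  name = "Data Preparation",
  color = "#E69F00"
)

analysis <- tag_pipeline(
  list(list(name = "mpg_model")),
  name = "Statistical Analysis",
  color = "#56B4E9"
)

# Flatten the sub-pipelines into a single list of tagged derivations.
all_derivations <- c(data_prep, analysis)

# One row per derivation, carrying its pipeline metadata.
metadata <- do.call(rbind, lapply(all_derivations, function(d) {
  data.frame(
    name = d$name,
    pipeline_group = d$pipeline_group,
    pipeline_color = d$pipeline_color,
    stringsAsFactors = FALSE
  )
}))
print(metadata)
```

Keeping the group and colour on each derivation, rather than in a separate lookup, means any tool that reads the flattened list has what it needs to colour a node without knowing how the project was split into files.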
For a working example, see the subpipelines demo in the
rixpress_demos repository.