In autocodebook: Automatic Codebook and Tracking for 'Spark' and 'dplyr' Pipelines

track_split

R Documentation

Split the cohort into branches by a column (CONSORT flowchart)

Description

Adds one branching level to the flow tree. The cohort is divided by the distinct values of `by`. Chain multiple `track_split()` calls to create nested branches (e.g. exposure, then mediator within each exposure branch). The data are returned unchanged, so the function fits in a `%>%` pipeline.
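To make the nesting concrete, the following base R sketch counts the branches that two chained splits would produce on a small simulated cohort. It is an illustration only, not autocodebook code. It reuses the column names from the example further down and assumes the "(NA)" grouping described under `by`.

```r
# Illustration only: counts the branches two chained splits would produce,
# using base R rather than autocodebook itself.
set.seed(42)
cohort <- data.frame(
  exposto_seca = sample(c(0L, 1L, NA), 100, replace = TRUE),
  migrou       = sample(c(0L, 1L), 100, replace = TRUE)
)

# Level 1: split by exposure; missing values form their own "(NA)" branch
level1 <- ifelse(is.na(cohort$exposto_seca), "(NA)",
                 as.character(cohort$exposto_seca))
table(level1)

# Level 2: nested split by mediator within each exposure branch
level2 <- ifelse(is.na(cohort$migrou), "(NA)", as.character(cohort$migrou))
nested <- table(exposure = level1, mediator = level2)
nested
```

`table()` drops `NA` by default, which is why missing values are recoded to "(NA)" before counting. Each row of `nested` sums to the size of the corresponding first-level branch.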

Usage

track_split(sdf, by, label = NULL, value_labels = NULL, max_levels = 3L)

Arguments

`sdf`	A Spark DataFrame or local data frame.
`by`	Character. Name of the column to split by. Its distinct values become the branches; `NA` values are grouped into a single "(NA)" branch.
`label`	Optional character. A human-readable name for this split level (e.g. "Exposure: drought"). Defaults to `by`.
`value_labels`	Optional named character vector mapping raw values to readable labels, e.g. `c("0" = "No drought", "1" = "Drought")`. If not given, the function tries the column's factor levels or `labelled` value-label attributes; failing that, it uses the raw values.
`max_levels`	Integer. Safety cap on nesting depth, i.e. on the number of chained `track_split()` levels. Default `3L`.
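The label lookup order described for `value_labels` can be sketched as follows. `resolve_branch_labels()` is a hypothetical helper written for this illustration, not an autocodebook function. It assumes the fallbacks are applied value by value, and that value-label attributes follow the haven convention of a `labels` attribute stored as `c(label = value)`.

```r
# Hypothetical helper, not part of autocodebook: mimics the documented
# lookup order for branch names (value_labels, then factor levels or a
# "labels" attribute, then the raw value), with NA shown as "(NA)".
resolve_branch_labels <- function(x, value_labels = NULL) {
  # Distinct values, sorted, with NA last
  vals <- sort(unique(x), na.last = TRUE)
  # For factors, as.character() already yields the level labels
  keys <- ifelse(is.na(vals), "(NA)", as.character(vals))
  labels <- keys

  # Fallback: a haven-style "labels" attribute, stored as c(label = value)
  attr_labels <- attr(x, "labels", exact = TRUE)
  if (!is.null(attr_labels)) {
    lookup <- setNames(names(attr_labels), as.character(attr_labels))
    hit <- keys %in% names(lookup)
    labels[hit] <- lookup[keys[hit]]
  }

  # Highest priority: user-supplied value_labels, keyed by raw value
  if (!is.null(value_labels)) {
    hit <- keys %in% names(value_labels)
    labels[hit] <- value_labels[keys[hit]]
  }
  setNames(labels, keys)
}

resolve_branch_labels(c(0L, 1L, 1L, NA),
                      value_labels = c("0" = "No drought", "1" = "Drought"))
```

The result is named by raw value, so each branch keeps a stable key even when its display label changes.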

Value

`sdf`, unchanged, so the call can be used inside a pipeline.

Examples

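# Identify individuals by the id_indiv column
cb_init(id_col = "id_indiv")
# Simulated cohort: 100 individuals with binary exposure, mediator and outcome
set.seed(42)
df <- data.frame(
  id_indiv     = sprintf("ID%03d", 1:100),
  exposto_seca = sample(c(0L, 1L), 100, replace = TRUE),
  migrou       = sample(c(0L, 1L), 100, replace = TRUE),
  obito_dcv    = sample(c(0L, 1L), 100, replace = TRUE)
)
# First split level: exposure
df <- track_split(df, by = "exposto_seca", label = "Exposure: drought",
                  value_labels = c("0" = "No drought", "1" = "Drought"))
# Second, nested split level: mediator within each exposure branch
df <- track_split(df, by = "migrou", label = "Mediator: migration",
                  value_labels = c("0" = "Did not migrate", "1" = "Migrated"))
# Outcome to report, with a readable label
track_outcomes(df, vars = "obito_dcv", labels = list(obito_dcv = "CVD death"))
# CONSORT-style flow table
flow_table()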

autocodebook documentation built on June 9, 2026, 1:09 a.m.