
View source: R/10_flow.R

track_split    R Documentation

Split the cohort into branches by a column (CONSORT flowchart)

Description

Adds one branching level to the flow tree. The cohort is divided into branches by the distinct values of the column named in by. Chain multiple track_split() calls to create nested branches (e.g. exposure, then mediator). The data are passed through unchanged, so the function fits in a %>% pipeline.
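Each chained split multiplies the leaves of the tree: after splitting by exposure and then by mediator, each leaf is one combination of the two columns' values. The following is a rough base-R sketch of how leaf counts for such a chain could be computed. It is not autocodebook's implementation; the helper branch_counts() is hypothetical and simply follows the documented rule that NA values form their own "(NA)" branch.

```r
# Hypothetical helper (not part of autocodebook): count cohort members in
# each leaf of the branch tree formed by splitting on `cols`, in order.
branch_counts <- function(df, cols) {
  # Convert each split column to character and give NA its own "(NA)" branch
  keys <- lapply(df[cols], function(x) {
    x <- as.character(x)
    x[is.na(x)] <- "(NA)"
    x
  })
  # Cross-tabulate the split columns: one row per combination of values
  counts <- as.data.frame(table(keys), stringsAsFactors = FALSE)
  # Drop combinations that never occur in the data
  counts[counts$Freq > 0, , drop = FALSE]
}

df <- data.frame(
  exposto_seca = c(0L, 0L, 1L, NA),
  migrou       = c(1L, 0L, 1L, 1L)
)
branch_counts(df, c("exposto_seca", "migrou"))
```

Here the four individuals fall into four distinct leaves, one of which is the "(NA)" exposure branch.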

Usage

track_split(sdf, by, label = NULL, value_labels = NULL, max_levels = 3L)

Arguments

sdf

A Spark DataFrame or local data frame.

by

Character. Name of the column to split by. Its distinct values become the branches. NA values are grouped into a single branch labelled "(NA)".

label

Optional character. A human-readable name for this split level (e.g. "Exposure: drought"). Defaults to the column name given in by.

value_labels

Optional named character vector mapping raw values to readable labels, e.g. c("0" = "No drought", "1" = "Drought"). If not given, the function tries the column's factor levels or value-label attributes (as on labelled vectors); failing that, it uses the raw values.

max_levels

Integer. Safety cap on the nesting depth, i.e. the maximum number of chained split levels. Default 3.
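The label fallback described for value_labels can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: resolve_branch_labels() is a hypothetical helper, and it assumes that "labelled attributes" means a "labels" attribute of the form c(label = value), as used by labelled vectors such as those from haven. Factor columns need no special case, because as.character() already returns their level labels.

```r
# Hypothetical helper mirroring the documented fallback order for
# value_labels; autocodebook's actual internals may differ.
resolve_branch_labels <- function(x, value_labels = NULL) {
  raw <- as.character(x)          # factors yield their level labels here
  raw[is.na(raw)] <- "(NA)"       # NA values share one "(NA)" branch
  out <- raw
  if (!is.null(value_labels)) {
    # 1. An explicit mapping wins; unmapped values keep their raw text
    hit <- raw %in% names(value_labels)
    out[hit] <- value_labels[raw[hit]]
  } else if (!is.null(attr(x, "labels"))) {
    # 2. ASSUMPTION: labelled vectors store c(label = value) in "labels"
    labs <- attr(x, "labels")
    hit <- raw %in% as.character(labs)
    out[hit] <- names(labs)[match(raw[hit], as.character(labs))]
  }
  # 3. Otherwise the raw (or factor-level) values are used as-is
  unname(out)
}

resolve_branch_labels(c(0L, 1L, NA), c("0" = "No drought", "1" = "Drought"))
# returns c("No drought", "Drought", "(NA)")
```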

Value

sdf, unchanged, so the call can be used inside a pipeline.
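Returning the input unchanged while recording state as a side effect is a common pattern for pipe-friendly steps. The sketch below shows that pattern in base R with the native |> pipe (R 4.1 or later). Everything in it is hypothetical: flow_registry and toy_split() are illustrative names, not autocodebook internals, and enforcing max_levels with stop() is an assumption, since this page does not say what track_split() does when the cap is exceeded.

```r
# Hypothetical sketch of a pipe-friendly side-effect step; the registry
# and function names are illustrative, not autocodebook's internals.
flow_registry <- new.env()
flow_registry$splits <- list()

toy_split <- function(sdf, by, label = NULL, max_levels = 3L) {
  depth <- length(flow_registry$splits)
  # ASSUMPTION: this sketch refuses to nest deeper than max_levels
  if (depth >= max_levels) {
    stop("nesting depth would exceed max_levels = ", max_levels)
  }
  # Record the split level as a side effect (label defaults to `by`)
  flow_registry$splits[[depth + 1L]] <- list(
    by    = by,
    label = if (is.null(label)) by else label
  )
  # Hand the data back untouched so the next pipeline step can use it
  sdf
}

df <- data.frame(a = c(0L, 1L), b = c(1L, 1L))
out <- df |> toy_split("a") |> toy_split("b", label = "B")
```

After the pipeline, out is identical to df and the registry holds two split levels.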

Examples

library(autocodebook)
set.seed(42)  # make the simulated cohort reproducible
cb_init(id_col = "id_indiv")
df <- data.frame(
  id_indiv     = sprintf("ID%03d", 1:100),
  exposto_seca = sample(c(0L, 1L), 100, replace = TRUE),
  migrou       = sample(c(0L, 1L), 100, replace = TRUE),
  obito_dcv    = sample(c(0L, 1L), 100, replace = TRUE)
)
df <- track_split(df, by = "exposto_seca", label = "Exposure: drought",
                  value_labels = c("0" = "No drought", "1" = "Drought"))
df <- track_split(df, by = "migrou", label = "Mediator: migration",
                  value_labels = c("0" = "Did not migrate", "1" = "Migrated"))
track_outcomes(df, vars = "obito_dcv", labels = list(obito_dcv = "CVD death"))
flow_table()

autocodebook documentation built on June 9, 2026, 1:09 a.m.