civic_split: Reproducible train/test split

View source: R/civic_data_utils.R

civic_split    R Documentation

Reproducible train/test split

Description

Splits a data frame into training and test sets. The seed is stored in the returned object, so the split can be reproduced exactly. Optional stratification keeps the distribution of a chosen column similar in both sets.

Usage

civic_split(data, prop = 0.75, seed = 2025L, stratify = NULL)

Arguments

data

A 'data.frame' or 'tibble'.

prop

Proportion of rows assigned to the training set (default '0.75').

seed

Integer random seed (default '2025L').

stratify

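Optional name of the column to stratify on, given as a character string (default 'NULL', meaning no stratification). Stratifying preserves the column's distribution in both the training and test sets. Factor (classification) targets are stratified by class; numeric targets are stratified by quartile.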
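The quartile-based stratification described for numeric targets can be sketched roughly as follows. This is a minimal illustration in base R, not the package's actual implementation: the helpers 'quartile_strata' and 'stratified_train_idx' are hypothetical names introduced here, and the rounding rule for the per-stratum training size is an assumption.

```r
# Hypothetical helper (not part of the package): bin a numeric vector by quartile.
quartile_strata <- function(x) {
  # unique() guards against duplicated break points when many values tie
  breaks <- unique(quantile(x, probs = seq(0, 1, 0.25), na.rm = TRUE))
  cut(x, breaks = breaks, include.lowest = TRUE)
}

# Hypothetical helper: draw about `prop` of the row indices from each stratum.
stratified_train_idx <- function(strata, prop, seed) {
  set.seed(seed)  # note: this resets the global RNG state
  groups <- split(seq_along(strata), strata)
  picked <- lapply(groups, function(idx) {
    n_train <- round(length(idx) * prop)  # assumed rounding rule
    idx[sample.int(length(idx), n_train)]
  })
  sort(unlist(picked, use.names = FALSE))
}

strata    <- quartile_strata(mtcars$mpg)
train_idx <- stratified_train_idx(strata, prop = 0.75, seed = 2025L)

train <- mtcars[train_idx, ]
test  <- mtcars[setdiff(seq_len(nrow(mtcars)), train_idx), ]

# Each quartile bin should appear in both sets in similar proportions
table(strata[train_idx])
table(strata[-train_idx])
```

Sampling within each bin, rather than across the whole data frame, is what keeps low and high values of the target represented in both sets. A factor target would skip the binning step and use the column's levels as the strata directly.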

Value

A named list with elements 'train' and 'test' (the two subsets of 'data'), and 'seed' and 'prop' (the values used to create the split).
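Because the seed and proportion travel with the result, a split can be regenerated from the returned object alone. The sketch below illustrates the idea with a hypothetical stand-in, 'sketch_split', that only mirrors the documented return shape; it is unstratified and is not the package's code.

```r
# Hypothetical stand-in mirroring the documented return shape; not the package's code.
sketch_split <- function(data, prop = 0.75, seed = 2025L) {
  set.seed(seed)  # note: this resets the global RNG state
  n <- nrow(data)
  train_idx <- sort(sample.int(n, size = round(n * prop)))
  list(
    train = data[train_idx, , drop = FALSE],
    test  = data[setdiff(seq_len(n), train_idx), , drop = FALSE],
    seed  = seed,
    prop  = prop
  )
}

first <- sketch_split(iris, prop = 0.8)

# Rebuild the same split later using only what the result carries
again <- sketch_split(iris, prop = first$prop, seed = first$seed)

identical(first$train, again$train)
identical(first$test, again$test)
```

Storing 'seed' and 'prop' next to the data means a saved result documents how it was produced, so no separate record of the settings is needed.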

Examples

# Any data frame works
splits <- civic_split(iris, prop = 0.8, stratify = "Species")
nrow(splits$train)  # ~120
nrow(splits$test)   # ~30

# Numeric stratification (by quartile)
splits2 <- civic_split(mtcars, prop = 0.75, stratify = "mpg")

civic.icarm documentation built on June 18, 2026, 1:06 a.m.