auto_filter: Filter with automatic tracking

View source: R/02_verbs.R

auto_filter R Documentation

Description

Works exactly like dplyr::filter(), but also logs a tracking step recording how many unique IDs remain after the filter.

Usage

auto_filter(
  .data,
  step = "",
  description = "",
  ...,
  cache = NULL,
  assume_unique = FALSE
)

Arguments

.data

A Spark DataFrame or local data frame.

step

Character label for this filtering step.

description

Character description of the filter.

...

Filter conditions, same syntax as dplyr::filter().

cache

Logical or NULL (named-only). If TRUE, materializes the result with cb_checkpoint() after filtering, which is useful in long Spark pipelines. If NULL, falls back to the session default (set via cb_init() or cb_set_default_cache()). Default: NULL.

assume_unique

Logical (named-only). Passed to track_step(). Set TRUE only when you are certain the ID column has no duplicates at this stage. Default: FALSE.

Details

The signature mirrors v0.1.0 for full backward compatibility: step and description come first (so existing positional calls keep working), then ... for the filter conditions, and finally the new big-data options (cache, assume_unique) which must be passed by name.
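
The named-only rule follows from standard R argument matching: formals declared after ... are never matched by position. The sketch below illustrates this with demo_filter(), a hypothetical stand-in that copies only the signature shown under Usage. It does no filtering or tracking and is not the package's implementation.

```r
# Hypothetical stand-in with the same formals as auto_filter(). It only
# reports how R matched the arguments; it does not filter or track anything.
demo_filter <- function(.data, step = "", description = "", ...,
                        cache = NULL, assume_unique = FALSE) {
  list(
    step          = step,
    description   = description,
    n_conditions  = ...length(),  # number of arguments captured by ...
    cache         = cache,
    assume_unique = assume_unique
  )
}

df <- data.frame(id = 1:3, age = c(15, 30, 45))

# step and description are matched by position; the fourth value lands in ...
positional <- demo_filter(df, "adults", "Keep age >= 18", TRUE)

# cache is set only when it is passed by name
named <- demo_filter(df, "adults", "Keep age >= 18", TRUE, cache = TRUE)
```

In the first call cache stays NULL and the extra value is captured by ...; in the second call cache is TRUE. Under the same matching rule, an unnamed value placed directly after .data would be bound to step rather than to ..., so when step and description are omitted it is safer to reach ... only after naming or supplying both.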

Value

The filtered data frame.
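
Examples

A minimal sketch of the quantity the tracking step records, written with dplyr only on a local data frame. The data and the column name patient_id are assumptions for illustration, and the package's own logging through track_step() is not reproduced here.

```r
library(dplyr)

# Illustrative data: patient_id repeats, so rows and unique IDs differ
visits <- tibble(
  patient_id = c(1, 1, 2, 3, 3, 4),
  age        = c(34, 34, 17, 52, 52, 16)
)

# Same filter condition syntax that auto_filter() accepts in ...
adults <- visits %>% filter(age >= 18)

# Unique IDs before and after the filter
n_ids_before <- n_distinct(visits$patient_id)
n_ids_after  <- n_distinct(adults$patient_id)
```

Because patient_id repeats in this data, the number of remaining rows differs from the number of remaining unique IDs. This is the situation in which assume_unique should be left at its default of FALSE.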


autocodebook documentation built on June 9, 2026, 1:09 a.m.