auto_filter: Filter with automatic tracking
In autocodebook: Automatic Codebook and Tracking for 'Spark' and 'dplyr' Pipelines

Description

Filters rows using the same condition syntax as dplyr::filter(), and also logs a tracking step that records how many unique IDs remain after the filter.
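To make "unique IDs remaining" concrete, the sketch below uses plain dplyr on a small made-up data frame. It is not the package's implementation: the `visits` data and the ID column name `id` are assumptions chosen for illustration. It only shows why the number of rows and the number of unique IDs can differ after a filter.

```r
library(dplyr)

# Hypothetical visit-level data: IDs 1 and 3 each appear twice.
visits <- data.frame(
  id  = c(1, 1, 2, 3, 3, 4),
  age = c(34, 34, 17, 52, 52, 16)
)

# Same condition syntax that would be passed through `...`.
adults <- filter(visits, age >= 18)

n_rows <- nrow(adults)            # rows kept by the filter: 4
n_ids  <- n_distinct(adults$id)   # unique IDs kept: 2
```

When every ID occurs only once, the two counts coincide, which is the situation the `assume_unique` argument refers to.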

Usage

auto_filter(
  .data,
  step = "",
  description = "",
  ...,
  cache = NULL,
  assume_unique = FALSE
)

Arguments

`.data`	A Spark DataFrame or local data frame.
`step`	Character label for this filtering step.
`description`	Character description of the filter.
`...`	Filter conditions, same syntax as `dplyr::filter()`.
`cache`	Logical or NULL (named-only). If TRUE, materializes the result with `cb_checkpoint()` after filtering, which is useful in long Spark pipelines. If NULL, falls back to the session default (set via `cb_init()` or `cb_set_default_cache()`). Default: NULL.
`assume_unique`	Logical (named-only). Passed to `track_step()`. Set TRUE only when you are certain the ID column has no duplicates at this stage. Default: FALSE.

Details

The signature mirrors v0.1.0 for full backward compatibility. `step` and `description` come first so that existing positional calls keep working, followed by `...` for the filter conditions. The new big-data options (`cache`, `assume_unique`) come last and must be passed by name, because R matches arguments placed after `...` only by their full name.
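The argument-matching behaviour described above can be checked in base R without the package. The function `show_matching()` below is a hypothetical stand-in that copies only the argument order from the Usage section; it performs no filtering or tracking and simply reports where R placed each argument.

```r
# Hypothetical stand-in with the same argument order as auto_filter().
# It does no filtering; it only reports how R matched each argument.
show_matching <- function(.data, step = "", description = "", ...,
                          cache = NULL, assume_unique = FALSE) {
  # Capture the unevaluated expressions passed through `...` as text.
  dots <- as.list(substitute(list(...)))[-1]
  conditions <- unname(vapply(dots, deparse1, character(1)))
  list(
    step = step,
    description = description,
    conditions = conditions,
    cache = cache,
    assume_unique = assume_unique
  )
}

df <- data.frame(id = 1:3, age = c(15, 30, 42))

# Positional call: the 2nd and 3rd arguments fill step and description,
# and the remaining unnamed argument is captured by `...`.
res_positional <- show_matching(df, "adults", "Keep age >= 18", age >= 18)

# cache sits after `...`, so it is only matched when passed by name.
res_named <- show_matching(df, "adults", "Keep age >= 18", age >= 18,
                           cache = TRUE)

# An unnamed TRUE never reaches cache; it lands in `...` as a second condition.
res_unnamed <- show_matching(df, "adults", "Keep age >= 18", age >= 18, TRUE)
```

In this sketch `res_named$cache` is TRUE, while `res_unnamed$cache` stays NULL and `res_unnamed$conditions` contains both "age >= 18" and "TRUE". This is why the options after `...` are documented as named-only.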