cb_checkpoint: Checkpoint a Spark DataFrame

View source: R/04_tracking.R

cb_checkpoint    R Documentation

Checkpoint a Spark DataFrame

Description

Forces materialization of a lazy Spark query plan (in "memory" and "disk" modes). This is useful in long pipelines, where query plans can grow deep enough that Spark re-computes upstream steps each time the result is used. For local data frames, this is a no-op: sdf is returned unchanged.
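In a long pipeline, a common pattern is to checkpoint every few steps so the plan never grows more than a bounded number of steps deep. The sketch below assumes that pattern. run_with_checkpoints() and counting_checkpoint() are hypothetical helpers, not part of autocodebook. The local run uses a plain data frame and a checkpoint hook that only counts how often it fires. With Spark input you would pass checkpoint = cb_checkpoint instead.

```r
# Hypothetical helper (not part of autocodebook): apply a list of
# transformation steps, calling `checkpoint` after every `every` steps
# so the lazy plan is cut at regular intervals.
run_with_checkpoints <- function(df, steps, every = 10L, checkpoint = identity) {
  stopifnot(is.list(steps), every >= 1L)
  for (i in seq_along(steps)) {
    df <- steps[[i]](df)
    # With a tbl_spark, pass checkpoint = cb_checkpoint here.
    if (i %% every == 0L) {
      df <- checkpoint(df)
    }
  }
  df
}

# Local demonstration: 25 identical steps, each adds 1 to column x.
steps <- rep(list(function(d) { d$x <- d$x + 1; d }), 25)

# Hypothetical checkpoint hook that just counts its calls.
n_checkpoints <- 0L
counting_checkpoint <- function(d) {
  n_checkpoints <<- n_checkpoints + 1L
  d
}

out <- run_with_checkpoints(data.frame(x = 0), steps, every = 10L,
                            checkpoint = counting_checkpoint)
out$x          # 25
n_checkpoints  # 2 (after steps 10 and 20)
```

Checkpointing every step would defeat the purpose by forcing computation constantly. The interval trades plan depth against the cost of each materialization.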

Usage

cb_checkpoint(sdf, name = NULL, mode = c("memory", "disk", "register"))

Arguments

sdf

A Spark DataFrame (tbl_spark) or a local data frame.

name

Optional. Name to register the checkpoint under (Spark only). If NULL, a temporary name is generated.

mode

Character. One of "memory" (cache the result in memory; fastest), "disk" (checkpoint to disk via sdf_checkpoint; slower but more durable), or "register" (only register as a temporary table, without caching or computing anything). Default: "memory".
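A minimal sketch of how the three modes could map onto sparklyr calls. It assumes sparklyr and dplyr are installed for Spark input. checkpoint_sketch() is hypothetical and is not autocodebook's implementation (see R/04_tracking.R for that). The sparklyr functions used (sdf_register, tbl_cache, sdf_checkpoint, spark_connection) are real. The temporary-name scheme is an assumption.

```r
# Hypothetical sketch, NOT autocodebook's source: one plausible mapping
# of the `mode` argument onto sparklyr calls.
checkpoint_sketch <- function(sdf, name = NULL,
                              mode = c("memory", "disk", "register")) {
  mode <- match.arg(mode)  # first choice ("memory") is the default
  # Local data frames have no lazy plan: return them unchanged.
  if (!inherits(sdf, "tbl_spark")) {
    return(sdf)
  }
  if (is.null(name)) {
    # Assumed naming scheme for the temporary table.
    name <- paste0("cb_ckpt_",
                   paste(sample(letters, 12, replace = TRUE), collapse = ""))
  }
  sc <- sparklyr::spark_connection(sdf)
  switch(mode,
    memory = {
      # Register, then cache; force = TRUE runs a count to load the data.
      sparklyr::sdf_register(sdf, name)
      sparklyr::tbl_cache(sc, name, force = TRUE)
      dplyr::tbl(sc, name)
    },
    disk = {
      # Eager checkpoint writes to Spark's checkpoint directory and
      # truncates the lineage; Spark requires spark_set_checkpoint_dir() first.
      sparklyr::sdf_register(sparklyr::sdf_checkpoint(sdf, eager = TRUE), name)
    },
    register = {
      # Only names the plan as a temp view; nothing is computed or cached.
      sparklyr::sdf_register(sdf, name)
    }
  )
}

df <- data.frame(a = 1:3)
identical(checkpoint_sketch(df), df)  # TRUE: local input passes through
```

Cached data in "memory" mode can be evicted or lost with executors, whereas a disk checkpoint persists and lets later steps start from a short plan. That trade-off matches the "fastest" vs "more durable" labels above.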

Value

For Spark input, the data frame after caching, checkpointing, or registration, according to mode. For a local data frame, sdf unchanged.


autocodebook documentation built on June 9, 2026, 1:09 a.m.