cb_checkpoint: Checkpoint a Spark DataFrame
In autocodebook: Automatic Codebook and Tracking for 'Spark' and 'dplyr' Pipelines

cb_checkpoint

R Documentation


Description

Forces materialization of a lazy Spark query plan. This is useful in long pipelines, where the plan can grow so deep that query planning slows down and upstream steps are recomputed each time a downstream result is used. For local data frames, this is a no-op.

Usage

cb_checkpoint(sdf, name = NULL, mode = c("memory", "disk", "register"))

Arguments

`sdf`	A Spark DataFrame (`tbl_spark`) or a local data frame.
`name`	Optional name under which to register the checkpoint (Spark only). If `NULL`, a temporary name is generated.
`mode`	Character. One of `"memory"` (cache in memory; fastest), `"disk"` (checkpoint to disk via `sdf_checkpoint()`; slower but more durable), or `"register"` (only register as a temporary table, without caching, so the plan stays lazy). Default: `"memory"`.
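The three modes correspond closely to existing sparklyr functions. The following is a minimal sketch, assuming sparklyr, of how such a dispatch could be written; `checkpoint_sketch()` and the `cb_tmp_` name prefix are hypothetical, and this is not the autocodebook implementation. Only the local-data-frame branch runs without a Spark connection.

```r
# Hypothetical sketch: how the three documented modes *could* map onto
# sparklyr calls. This is NOT autocodebook's actual implementation.
checkpoint_sketch <- function(sdf, name = NULL,
                              mode = c("memory", "disk", "register")) {
  # Validate `mode`; the first choice ("memory") is the default.
  mode <- match.arg(mode)

  # Local data frames have no lazy plan to materialize: return unchanged.
  if (!inherits(sdf, "tbl_spark")) {
    return(sdf)
  }

  # Generate a temporary table name when none is supplied
  # (the "cb_tmp_" prefix is an illustrative assumption).
  if (is.null(name)) {
    name <- paste0("cb_tmp_",
                   paste(sample(letters, 12, replace = TRUE), collapse = ""))
  }

  if (mode == "disk") {
    # Write the data to the checkpoint directory and truncate the lineage.
    # Assumes spark_set_checkpoint_dir() was already called on the connection.
    sdf <- sparklyr::sdf_checkpoint(sdf, eager = TRUE)
  }

  # Register the result as a temporary table under `name` (lazy by itself).
  out <- sparklyr::sdf_register(sdf, name)

  if (mode == "memory") {
    # Cache the table; force = TRUE triggers a count so it is loaded now.
    sparklyr::tbl_cache(sparklyr::spark_connection(out), name, force = TRUE)
  }

  out
}
```

Registering under a name in every Spark mode keeps the result addressable from SQL and lets `tbl_cache()` refer to it by name. With a live connection, the `"disk"` branch would additionally need a checkpoint directory set with `spark_set_checkpoint_dir()`.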