group_map: Apply a function to each shard of a partition
In vectra: Columnar Query Engine for Larger-than-RAM Data

Description

Run a function once per shard of a partition created with offload(x, by = ...) and gather the results. Each shard is read into memory as a data.frame and passed to .f together with its key, so a model that couples rows only within a group can be fitted as a set of independent per-shard fits. This is the per-group counterpart to collect_chunked(), which instead merges every shard into a single accumulator.

Usage

group_map(.data, .f, ...)

## S3 method for class 'vectra_partition'
group_map(.data, .f, ...)

group_modify(.data, .f, ...)

## S3 method for class 'vectra_partition'
group_modify(.data, .f, ...)

Arguments

`.data`	A `vectra_partition` from `offload()` with a `by` key.
`.f`	A function applied to each shard. It receives the shard as a data.frame and the shard key (a string) as its first two arguments; any further arguments in `...` follow. A purrr-style formula such as `~ lm(y ~ x, .x)` also works, with `.x` the shard data and `.y` the key. For `group_modify()`, `.f` must return a data.frame.
`...`	Additional arguments passed on to `.f`.

Details

group_map() returns a named list with one element per shard, named by the shard key, and places no constraint on what .f returns. Use it for per-group results that do not rebind into a table, such as fitted models.
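
The calling convention and return shape described above can be imitated in base R. The following is a minimal sketch, not vectra's implementation: split() stands in for the partition, and map_shards() is a hypothetical helper introduced only for illustration.

```r
# Base R stand-in for a partition keyed on cyl: one data.frame per key.
shards <- split(mtcars, mtcars$cyl)

# Hypothetical helper, not part of vectra: call f(shard, key, ...) once per
# shard and return the results as a list named by the shard key.
map_shards <- function(shards, f, ...) {
  keys <- names(shards)
  out <- lapply(keys, function(k) f(shards[[k]], k, ...))
  stats::setNames(out, keys)
}

# The key arrives as a string in the second argument; f may return anything.
fits <- map_shards(shards, function(d, cyl) coef(lm(mpg ~ wt, data = d)))
names(fits)
```

Because the results are never bound together, each element can be an arbitrary object, here a named coefficient vector per shard.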

group_modify() expects .f to return a data.frame for each shard and binds those frames into one. When a shard's result does not already carry the partition key column, the key is added as a leading column (named after the partition's by), so every row records the shard it came from. Use it for per-group summaries that recombine into a single table.
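
The bind-and-restore-key behaviour can likewise be sketched in base R. This is an illustration under stated assumptions, not the package's code: modify_shards() is a hypothetical helper, split() stands in for the partition, and the restored key is kept as a string, which may differ from how vectra types the column.

```r
# Base R stand-in for a partition keyed on cyl.
shards <- split(mtcars, mtcars$cyl)

# Hypothetical helper, not part of vectra: apply f per shard, require a
# data.frame back, add the key as a leading column when f dropped it, and
# row-bind the pieces into one table.
modify_shards <- function(shards, f, by, ...) {
  pieces <- lapply(names(shards), function(k) {
    res <- f(shards[[k]], k, ...)
    stopifnot(is.data.frame(res))
    if (!by %in% names(res)) {
      key <- data.frame(rep(k, nrow(res)), stringsAsFactors = FALSE)
      names(key) <- by
      res <- cbind(key, res)
    }
    res
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

summ <- modify_shards(shards, function(d, cyl)
  data.frame(n = nrow(d), mean_mpg = mean(d$mpg)), by = "cyl")
summ
```

The summary function returns only n and mean_mpg, so the sketch prepends cyl and every row still records the shard it came from.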

Each shard is materialized in full before .f sees it, so partition the query on a key whose groups fit in memory. For a reduction that stays bounded without ever holding a whole group, fold the partition with collect_chunked() instead.
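
To make the contrast concrete, here is a base R sketch of a bounded fold. It does not use collect_chunked(), whose interface is not shown on this page; the row blocks, step(), and the accumulator layout are all assumptions made for illustration. The point is only that a per-group mean needs a running sum and count, never a whole group in memory.

```r
# Stand-in for a stream of chunks: blocks of at most 10 rows each.
chunk_id <- ceiling(seq_len(nrow(mtcars)) / 10)
chunks <- split(mtcars, chunk_id)

# Fold step: the accumulator holds one (sum, n) pair per cyl value, so its
# size depends on the number of groups, not on the number of rows.
step <- function(acc, chunk) {
  parts <- split(chunk$mpg, chunk$cyl)
  for (k in names(parts)) {
    prev <- if (is.null(acc[[k]])) c(sum = 0, n = 0) else acc[[k]]
    acc[[k]] <- prev + c(sum = sum(parts[[k]]), n = length(parts[[k]]))
  }
  acc
}

acc <- Reduce(step, chunks, list())
means <- vapply(acc, function(a) a[["sum"]] / a[["n"]], numeric(1))
means
```

A model fit such as lm() generally cannot be folded this way, which is when materializing each shard with group_map() or group_modify() is the appropriate choice.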

Value

group_map() returns a named list with one element per shard. group_modify() returns a single data.frame: the per-shard results row-bound, with the shard key restored as a column when .f dropped it.

Examples

f <- tempfile(fileext = ".vtr")
write_vtr(mtcars, f)
p <- offload(tbl(f), by = "cyl")

# One fit per shard, returned as a named list keyed by cyl.
fits <- group_map(p, function(d, cyl) coef(lm(mpg ~ wt, data = d)))
fits

# Per-shard summaries recombined into one table, key restored as a column.
group_modify(p, function(d, cyl)
  data.frame(n = nrow(d), mean_mpg = mean(d$mpg)))

# Remove the temporary file.
unlink(f)

vectra documentation built on July 10, 2026, 5:08 p.m.