Deterministic, zero-copy parallel execution for R.
shard is a parallel runtime for workloads that look like:
- "run the same numeric kernel over many slices of big data"
- "thousands of independent tasks over a shared dataset"
- "parallel GLM / simulation / bootstrap / feature screening"
It focuses on three things that are often painful in R parallelism:

1. Shared immutable inputs: avoid duplicating large objects across workers.
2. Explicit output buffers: avoid huge result-gather lists.
3. Deterministic cleanup: supervise workers and recycle them on memory drift.
From CRAN (once released):
install.packages("shard")
Development version:
# install.packages("pak")
pak::pak("bbuchsbaum/shard")
X <- shard::share(X) # matrix/array/vector
Y <- shard::share(Y)
Shared objects are designed for zero-copy parallel reads (where the OS allows) and are treated as immutable by default inside parallel tasks.
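As a small illustration of the read-only contract (a sketch: it assumes `shard::share()` as shown above and that the returned object subsets like an ordinary matrix, which the package's docs should confirm):

```r
# Sketch: a shared matrix reads like an ordinary R matrix.
# Assumes shard::share() from the snippet above; the exact return
# class is an implementation detail of the package.
X  <- matrix(rnorm(1e4), nrow = 100)
Xs <- shard::share(X)

# Reads behave normally and are designed not to copy the payload:
colSums(Xs[, 1:10])

# Writes from inside parallel tasks are governed by the cow policy
# described below (default: error on mutation).
```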
Instead of returning giant objects from each worker, write to a preallocated buffer:
out <- shard::buffer("double", dim = c(1e6)) # example: 1M outputs
blocks <- shard::shards(1e6, block_size = "auto")
run <- shard::shard_map(
  blocks,
  borrow = list(X = X, Y = Y),
  out = list(out = out),
  workers = 8,
  fun = function(block, X, Y, out) {
    # block carries the index range for this shard
    idx <- block$idx
    out[idx] <- colMeans(Y[, idx, drop = FALSE])
  }
)
shard::report(run)
By default, trying to mutate borrowed/shared inputs is treated as a bug:
- cow = "deny" (default): mutation triggers an error
- cow = "audit": detect and flag (best-effort; platform dependent)
- cow = "allow": allow copy-on-write, track it, and enforce budgets
Why the default is deny:

- Prevents silent memory blowups from accidental wide writes.
- Prevents subtle correctness bugs (a worker's changes are private to that worker).
- Keeps behavior predictable across platforms.
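A hedged sketch of selecting a policy per run. Note the assumption here: the modes are listed above, but whether `cow` is passed as a `shard_map()` argument (as opposed to, say, a global option) is not shown in this README and should be checked against the package docs.

```r
# Assumption: `cow` is accepted as an argument to shard_map().
# "audit" detects and flags hidden copies instead of erroring.
run <- shard::shard_map(
  blocks,
  borrow = list(X = X),
  out = list(out = out),
  workers = 8,
  cow = "audit",
  fun = function(block, X, out) {
    idx <- block$idx
    out[idx] <- X[1, idx]   # read-only use of the borrowed input
  }
)

shard::copy_report(run)     # inspect any materialized (copied) bytes
```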
R's GC and allocator behavior can lead to memory drift in long-running workers.
shard monitors per-worker memory usage and can recycle workers when drift
exceeds thresholds, keeping end-of-run memory close to baseline.
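Conceptually, recycling means: measure each worker's RSS against its post-warmup baseline and restart the worker when drift crosses a budget. A sketch of what configuring that might look like; the `recycle` argument and the `rss_drift` field are assumptions for illustration, not documented API:

```r
# Hypothetical knobs: `recycle` and `rss_drift` are illustrative names
# only. The idea: restart any worker whose resident memory drifts more
# than ~200 MB above its baseline, keeping end-of-run memory flat.
run <- shard::shard_map(
  blocks,
  borrow = list(X = X),
  out = list(out = out),
  workers = 8,
  recycle = list(rss_drift = "200MB"),
  fun = function(block, X, out) {
    out[block$idx] <- X[1, block$idx]
  }
)
```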
After a run, shard can report:
- total and per-worker peak RSS
- end RSS vs baseline
- materialized bytes (hidden copies)
- recycling events, retries, timing
rep <- shard::report(run)
print(rep)
shard::mem_report(run)
shard::copy_report(run)
If your workload is “apply a function over columns” or “lapply over a list”,
shard provides convenience wrappers that handle sharing and buffering
automatically while still running through the supervised runtime.
X <- matrix(rnorm(1e6), nrow = 1000)
scores <- shard::shard_apply_matrix(
  X,
  MARGIN = 2,
  FUN = function(v, y) cor(v, y),
  VARS = list(y = rnorm(nrow(X))),
  workers = 8
)
xs <- lapply(1:1000, function(i) rnorm(100))
out <- shard::shard_lapply_shared(
  xs,
  FUN = function(el) mean(el),
  workers = 8
)
For large outputs (big vectors/data.frames per element), prefer buffer(), table_sink(),
or shard_reduce() instead of gathering everything to the master.
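As a sketch of the reduce-style alternative: only the name `shard_reduce()` appears above, so the signature here (`fun` producing a per-block partial, `combine` merging partials) is an assumption modeled on the `shard_map()` example earlier in this README.

```r
# Hypothetical shard_reduce() usage: compute per-block partial sums on
# the workers and combine them, instead of shipping per-block vectors
# back to the master.
total <- shard::shard_reduce(
  shard::shards(1e6, block_size = "auto"),
  borrow = list(Y = Y),
  workers = 8,
  fun = function(block, Y) sum(Y[, block$idx]),  # per-block partial
  combine = `+`                                  # merge of partials
)
```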
License: MIT