Deterministic, zero-copy parallel execution for R.
shard is a parallel runtime for workloads that look like:
- "run the same numeric kernel over many slices of big data"
- "thousands of independent tasks over a shared dataset"
- "parallel GLM / simulation / bootstrap / feature screening"
It focuses on three things that are often painful in R parallelism:

1. Shared immutable inputs: avoid duplicating large objects across workers.
2. Explicit output buffers: avoid huge result-gather lists.
3. Deterministic cleanup: supervise workers and recycle them on memory drift.
From CRAN (once released):
install.packages("shard")
Development version:
# install.packages("pak")
pak::pak("bbuchsbaum/shard")
X <- shard::share(X) # matrix/array/vector
Y <- shard::share(Y)
Shared objects are designed for zero-copy parallel reads (where the OS allows) and are treated as immutable by default inside parallel tasks.
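As a small illustration of the read-only contract (a sketch: it assumes `shard::share()` as shown above and that the returned object subsets like an ordinary matrix, which the package's docs should confirm):

```r
# Sketch: a shared matrix reads like an ordinary R matrix.
# Assumes shard::share() from the snippet above; the exact return
# class is an implementation detail of the package.
X  <- matrix(rnorm(1e4), nrow = 100)
Xs <- shard::share(X)

# Reads behave normally and are designed not to copy the payload:
colSums(Xs[, 1:10])

# Writes from inside parallel tasks are governed by the cow policy
# described below (default: error on mutation).
```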
Instead of returning giant objects from each worker, write to a preallocated buffer:
out <- shard::buffer("double", dim = c(1e6)) # example: 1M outputs
blocks <- shard::shards(1e6, block_size = "auto")
run <- shard::shard_map(
  blocks,
  borrow = list(X = X, Y = Y),
  out = list(out = out),
  workers = 8,
  fun = function(block, X, Y, out) {
    # block carries the index range for this shard
    idx <- block$idx
    out[idx] <- colMeans(Y[, idx, drop = FALSE])
  }
)
shard::report(run)
By default, trying to mutate borrowed/shared inputs is treated as a bug:
- cow = "deny" (default): mutation triggers an error
- cow = "audit": detect and flag (best-effort; platform dependent)
- cow = "allow": allow copy-on-write, track it, and enforce budgets
Why the default is deny:

- Prevents silent memory blowups from accidental wide writes.
- Prevents subtle correctness bugs (a worker's changes are private to that worker).
- Keeps behavior predictable across platforms.
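A hedged sketch of selecting a policy per run. Note the assumption here: the modes are listed above, but whether `cow` is passed as a `shard_map()` argument (as opposed to, say, a global option) is not shown in this README and should be checked against the package docs.

```r
# Assumption: `cow` is accepted as an argument to shard_map().
# "audit" detects and flags hidden copies instead of erroring.
run <- shard::shard_map(
  blocks,
  borrow = list(X = X),
  out = list(out = out),
  workers = 8,
  cow = "audit",
  fun = function(block, X, out) {
    idx <- block$idx
    out[idx] <- X[1, idx]   # read-only use of the borrowed input
  }
)

shard::copy_report(run)     # inspect any materialized (copied) bytes
```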
R's GC and allocator behavior can lead to memory drift in long-running workers.
shard monitors per-worker memory usage and can recycle workers when drift
exceeds thresholds, keeping end-of-run memory close to baseline.
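Conceptually, recycling means: measure each worker's RSS against its post-warmup baseline and restart the worker when drift crosses a budget. A sketch of what configuring that might look like; the `recycle` argument and the `rss_drift` field are assumptions for illustration, not documented API:

```r
# Hypothetical knobs: `recycle` and `rss_drift` are illustrative names
# only. The idea: restart any worker whose resident memory drifts more
# than ~200 MB above its baseline, keeping end-of-run memory flat.
run <- shard::shard_map(
  blocks,
  borrow = list(X = X),
  out = list(out = out),
  workers = 8,
  recycle = list(rss_drift = "200MB"),
  fun = function(block, X, out) {
    out[block$idx] <- X[1, block$idx]
  }
)
```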
After a run, shard can report:
- total and per-worker peak RSS
- end RSS vs baseline
- materialized bytes (hidden copies)
- recycling events, retries, timing
rep <- shard::report(run)
print(rep)
shard::mem_report(run)
shard::copy_report(run)
If your workload is “apply a function over columns” or “lapply over a list”,
shard provides convenience wrappers that handle sharing and buffering
automatically while still running through the supervised runtime.
X <- matrix(rnorm(1e6), nrow = 1000)
scores <- shard::shard_apply_matrix(
  X,
  MARGIN = 2,
  FUN = function(v, y) cor(v, y),
  VARS = list(y = rnorm(nrow(X))),
  workers = 8
)
xs <- lapply(1:1000, function(i) rnorm(100))
out <- shard::shard_lapply_shared(
  xs,
  FUN = function(el) mean(el),
  workers = 8
)
For large outputs (big vectors/data.frames per element), prefer buffer(), table_sink(),
or shard_reduce() instead of gathering everything to the master.
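As a sketch of the reduce-style alternative: only the name `shard_reduce()` appears above, so the signature here (`fun` producing a per-block partial, `combine` merging partials) is an assumption modeled on the `shard_map()` example earlier in this README.

```r
# Hypothetical shard_reduce() usage: compute per-block partial sums on
# the workers and combine them, instead of shipping per-block vectors
# back to the master.
total <- shard::shard_reduce(
  shard::shards(1e6, block_size = "auto"),
  borrow = list(Y = Y),
  workers = 8,
  fun = function(block, Y) sum(Y[, block$idx]),  # per-block partial
  combine = `+`                                  # merge of partials
)
```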
License: MIT