shard_crossprod: Parallel crossprod() using shard views + output buffers
In shard: Deterministic, Zero-Copy Parallel Execution for R

shard_crossprod

R Documentation

Description

Computes crossprod(X, Y) (i.e. t(X) %*% Y) using:

shared/mmap-backed inputs (one copy),
block views (no slice materialization),
BLAS-3 dgemm in each tile,
an explicit shared output buffer (no gather/bind spikes).
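
The tiling idea above can be sketched serially in base R. This is only an illustration of the arithmetic, not shard's implementation: `tiled_crossprod` is a hypothetical helper, it runs in a single process, and base R subsetting copies each slice, whereas the package describes its block views as avoiding that copy.

```r
# Serial base-R sketch of a tiled crossprod (hypothetical helper, not part of shard).
tiled_crossprod <- function(X, Y, block_x, block_y) {
  stopifnot(is.matrix(X), is.matrix(Y), nrow(X) == nrow(Y),
            ncol(X) >= 1, ncol(Y) >= 1)
  p <- ncol(X)
  v <- ncol(Y)
  # Preallocate the full p x v output once; each tile writes into its own region,
  # so no per-tile results need to be gathered and bound together afterwards.
  out <- matrix(0, nrow = p, ncol = v)
  for (i in seq(1L, p, by = block_x)) {
    xi <- i:min(i + block_x - 1L, p)      # column block of X
    for (j in seq(1L, v, by = block_y)) {
      yj <- j:min(j + block_y - 1L, v)    # column block of Y
      # One tile: t(X[, xi]) %*% Y[, yj]
      out[xi, yj] <- crossprod(X[, xi, drop = FALSE], Y[, yj, drop = FALSE])
    }
  }
  out
}

set.seed(1)
X <- matrix(rnorm(2000), 100, 20)
Y <- matrix(rnorm(2000), 100, 20)
tiled <- tiled_crossprod(X, Y, block_x = 8, block_y = 5)
isTRUE(all.equal(tiled, crossprod(X, Y)))
```

Because every tile owns a disjoint region of the output, the tiles are independent of one another, which is what makes it possible to distribute them across workers.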

Usage

shard_crossprod(
  X,
  Y,
  workers = NULL,
  block_x = "auto",
  block_y = "auto",
  backing = c("mmap", "shm"),
  materialize = c("auto", "never", "always"),
  materialize_max_bytes = 512 * 1024^2,
  diagnostics = TRUE
)

Arguments

`X`, `Y`	Double matrices with the same number of rows.
`workers`	Number of worker processes.
`block_x`, `block_y`	Tile sizes over `ncol(X)` and `ncol(Y)`. Use `"auto"` (default) to autotune on the current machine.
`backing`	Backing for shared inputs and output buffer (`"mmap"` or `"shm"`).
`materialize`	Whether to return the result as a standard R matrix: `"never"` (return buffer handle), `"always"`, or `"auto"` (materialize if estimated output size is below `materialize_max_bytes`).
`materialize_max_bytes`	Threshold for `"auto"` materialization.
`diagnostics`	Whether to collect shard_map diagnostics.
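
As a rough illustration of how `materialize` and `materialize_max_bytes` interact, the sketch below estimates the output size as 8 bytes per double element. `should_materialize` is a hypothetical helper written for this example; how shard actually estimates the output size is an assumption here.

```r
# Hypothetical helper illustrating the "auto" rule; not part of shard.
should_materialize <- function(p, v,
                               materialize = c("auto", "never", "always"),
                               materialize_max_bytes = 512 * 1024^2) {
  materialize <- match.arg(materialize)
  # Assumption: a p x v double matrix needs about 8 bytes per element.
  est_bytes <- as.numeric(p) * as.numeric(v) * 8
  switch(materialize,
    never  = FALSE,
    always = TRUE,
    auto   = est_bytes < materialize_max_bytes
  )
}

should_materialize(20, 20)        # estimated 3,200 bytes
should_materialize(10000, 10000)  # estimated 800,000,000 bytes
```

Under this estimate, the default threshold of `512 * 1024^2` bytes (512 MiB) corresponds to an output of about 67 million doubles.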

Details

This is intended as an ergonomic, one-call entry point for a common pattern: users shouldn't have to call share(), view_block(), buffer(), tiles2d(), and shard_map() by hand to get a tiled, shared-memory crossprod.

Value

A list with:

buffer: shard_buffer for the result (p x v, where p = ncol(X) and v = ncol(Y))
value: materialized matrix if materialization was requested or chosen by "auto", otherwise NULL
run: the underlying shard_result from shard_map
tile: chosen tile sizes

Examples


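# Two 100 x 20 double matrices with the same number of rows
X <- matrix(rnorm(2000), 100, 20)
Y <- matrix(rnorm(2000), 100, 20)
# Explicit tile sizes over ncol(X) and ncol(Y), computed with 2 worker processes
res <- shard_crossprod(X, Y, block_x = 50, block_y = 10, workers = 2)
# Shut down the worker pool once the computation is done
pool_stop()
# 20 x 20 result; materialized under the default "auto" because it is far below materialize_max_bytes
res$value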

shard documentation built on April 6, 2026, 1:07 a.m.