| BigDataStatMeth | R Documentation |
BigDataStatMeth provides statistical and linear algebra operations for matrices stored in HDF5 files. The package is designed for workflows in which matrices may be too large to be held entirely in memory, while still allowing users to work with familiar R functions.
The recommended user-facing interface is based on HDF5Matrix
objects and standard R methods. HDF5-backed matrices can be manipulated
using calls such as dim(), [, %*%,
crossprod(), tcrossprod(), scale(),
cor(), svd(), prcomp(), qr(),
chol(), and solve().
Core HDF5 matrix handling: hdf5_create_matrix(),
hdf5_matrix(), list_datasets(), is_open(),
close(), and hdf5_close_all().
Subsetting and conversion: [, [<-,
as.matrix(), and as.data.frame().
Dimension names: rownames(), colnames(), and
dimnames().
Element-wise arithmetic: +, -, *, and
/ for HDF5Matrix objects.
Matrix algebra: %*%, crossprod(),
tcrossprod(), cbind(), and rbind().
Aggregations and summaries: colSums(),
rowSums(), colMeans(), rowMeans(),
colVars(), rowVars(), colSds(),
rowSds(), colMins(), rowMins(),
colMaxs(), rowMaxs(), mean(),
var(), and sd().
Statistical transformations: scale(), sweep(),
and cor().
Matrix decompositions and factorizations: svd(),
prcomp(), qr(), chol(), solve(),
eigen(), and pseudoinverse().
Diagonal, split, reduce, and apply operations:
diag(), diag_op(), diag_scale(),
split_dataset(), reduce(), and
apply_function().
Most user workflows can be expressed through HDF5Matrix objects
and standard R methods. Some functions keep the bd* prefix
because they provide additional utilities that do not map directly to a
standard R generic, or because they expose workflows available in earlier
versions of the package. Examples include utilities for creating HDF5
groups, moving datasets, and writing HDF5-backed dimension names. These
functions remain part of the package API and are documented in their
corresponding help pages.
Block-wise operations can be configured with
hdf5matrix_options(), including options for parallel execution,
number of threads, block size, and HDF5 compression. Open HDF5 resources
can be closed explicitly with close() for individual objects or
hdf5_close_all() for all handles tracked by the package.
BigDataStatMeth is organized around a standard R interface backed by
a C++ computational infrastructure. The user-facing layer is based on
HDF5Matrix objects and S3 methods, allowing HDF5-backed
matrices to be used with familiar R functions.
Internally, a lightweight R6 layer connects these R methods with the C++ backend. The C++ infrastructure provides classes for managing HDF5 files, groups, and datasets, together with block-wise routines for linear algebra and statistical operations.
This design allows developers to implement new scalable methods from Rcpp-based code while reusing the package machinery for HDF5 file management, block iteration, compression handling, and numerical computation.
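As a rough illustration of what a block-wise routine does, the sketch below accumulates a cross-product from row blocks in plain base R. It is not the package's C++ implementation. The names `blockwise_crossprod()` and `read_rows()` are hypothetical, and an in-memory matrix stands in for the HDF5 dataset that would normally be read block by block from disk.

```r
# Illustrative sketch only: accumulate t(X) %*% X from row blocks so that
# only one block needs to be held in memory at a time.
# `read_block(start, end)` is a hypothetical reader returning rows start..end.
blockwise_crossprod <- function(read_block, n_rows, n_cols, block_size = 25L) {
  acc <- matrix(0, nrow = n_cols, ncol = n_cols)
  for (start in seq(1L, n_rows, by = block_size)) {
    end <- min(start + block_size - 1L, n_rows)
    block <- read_block(start, end)   # (end - start + 1) x n_cols matrix
    acc <- acc + crossprod(block)     # add this block's contribution
  }
  acc
}

# Stand-in for on-disk storage: an ordinary matrix read by row range.
set.seed(1)
X <- matrix(rnorm(100 * 20), nrow = 100, ncol = 20)
read_rows <- function(start, end) X[start:end, , drop = FALSE]

XtX_blocked <- blockwise_crossprod(read_rows, nrow(X), ncol(X), block_size = 30L)

# Compare against the in-memory result (up to floating-point tolerance).
all.equal(XtX_blocked, crossprod(X))
```

The block size trades memory use against the number of reads. The last block may be shorter than the others, which the `min()` call handles.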
See vignette("BigDataStatMeth") for a practical introduction to
HDF5-backed matrices and the main user-facing functionality.
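library(BigDataStatMeth)

# Create a temporary HDF5 file and a small in-memory matrix
h5file <- tempfile(fileext = ".h5")
set.seed(1)
X <- matrix(rnorm(100 * 20), nrow = 100, ncol = 20)

# Write X to the HDF5 file and obtain an HDF5-backed matrix
X_h5 <- hdf5_create_matrix(
  filename = h5file,
  dataset = "data/X",
  data = X,
  overwrite = TRUE
)

# Standard R methods work on the HDF5-backed matrix
dim(X_h5)
colMeans(X_h5)

# Cross-product t(X) %*% X
XtX_h5 <- crossprod(X_h5)
dim(XtX_h5)

# Release the open HDF5 resources
close(X_h5)
close(XtX_h5)
hdf5_close_all(verbose = FALSE)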