scale: Scale / normalize an HDF5Matrix

View source: R/S3_aggregations.R

scale    R Documentation

Description

Block-wise centering and scaling equivalent to base R scale(). The result is written to disk. On the block-wise streaming path the full matrix is never loaded into RAM; a matrix that fits in RAM may instead be preloaded (see paral).

Usage

## S3 method for class 'HDF5Matrix'
scale(
  x,
  center = TRUE,
  scale = TRUE,
  byrows = FALSE,
  wsize = NULL,
  result_path = NULL,
  compression = NULL,
  paral = NULL,
  threads = NULL,
  ...
)

Arguments

x

An HDF5Matrix object.

center

Logical. If TRUE (default), subtract column means before scaling. A numeric vector is not currently supported (see Details).

scale

Logical. If TRUE (default), divide by column standard deviations. A numeric vector is not currently supported (see Details).

byrows

Logical. If TRUE, normalize row-wise instead of column-wise. Default FALSE.

wsize

Integer or NULL. Block size for HDF5 reads (NULL = auto).

result_path

Output location. NULL (default) writes to "NORMALIZED/<group>/<dataset>" in the same file. A character string writes to that path in the same file. A named list of the form list(file=, path=) writes to the given path in a different file.

compression

Integer (0-9) or NULL. gzip compression level for the result datasets. NULL uses the global option set by hdf5matrix_options (default 6). Use 0 to disable compression (faster for benchmarks).

paral

Logical or NULL. Enable OpenMP parallelism. TRUE forces block-wise streaming (PATH 2) regardless of matrix size, so the thread count is respected. NULL or FALSE uses the preload path (PATH 1) when the matrix fits in RAM. Overrides the global option set by hdf5matrix_options. Default NULL.

threads

Integer or NULL. Number of OpenMP threads when paral = TRUE. Always capped by OMP_THREAD_LIMIT (CRAN compliance). NULL uses the system default. Overrides the global option set by hdf5matrix_options.

...

Ignored (for S3 compatibility).

Details

Passing a pre-computed numeric vector as center or scale is not currently supported. If a vector is supplied, it is coerced to a logical (TRUE if the vector has length greater than zero) and a warning is issued.

The returned HDF5Matrix carries scaled:center and scaled:scale attributes (numeric vectors), mirroring the behavior of base::scale().
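
To make the block-wise idea concrete, here is a minimal base-R sketch that centers and scales an in-memory matrix one block of rows at a time and compares the result with base::scale(). It is not the package's implementation. The helper name blockwise_scale and the block_rows parameter are hypothetical, and the n - 1 denominator is an assumption taken from the stated equivalence with base R scale().

```r
# Hypothetical helper (not part of the package): block-wise column scaling
# of an ordinary in-memory matrix, processing block_rows rows at a time.
blockwise_scale <- function(m, block_rows = 16L) {
  n <- nrow(m)
  starts <- seq(1L, n, by = block_rows)

  # Pass 1: accumulate column sums block by block to obtain the means.
  col_sum <- numeric(ncol(m))
  for (s in starts) {
    rows <- s:min(s + block_rows - 1L, n)
    col_sum <- col_sum + colSums(m[rows, , drop = FALSE])
  }
  center <- col_sum / n

  # Pass 2: accumulate squared deviations to obtain the sample
  # standard deviations (n - 1 denominator, as in base::scale()).
  col_ss <- numeric(ncol(m))
  for (s in starts) {
    rows <- s:min(s + block_rows - 1L, n)
    dev <- sweep(m[rows, , drop = FALSE], 2, center)
    col_ss <- col_ss + colSums(dev^2)
  }
  sdev <- sqrt(col_ss / (n - 1))

  # Pass 3: write the normalized values one block at a time.
  out <- matrix(0, n, ncol(m))
  for (s in starts) {
    rows <- s:min(s + block_rows - 1L, n)
    centered <- sweep(m[rows, , drop = FALSE], 2, center)
    out[rows, ] <- sweep(centered, 2, sdev, "/")
  }

  # Attach the same attributes that base::scale() returns.
  attr(out, "scaled:center") <- center
  attr(out, "scaled:scale") <- sdev
  out
}

set.seed(1)
m   <- matrix(rnorm(500), 50, 10)
ref <- scale(m)            # base R reference
res <- blockwise_scale(m)  # block-wise sketch

# Values and attributes agree with the base R result.
stopifnot(isTRUE(all.equal(res, ref, check.attributes = FALSE)))
stopifnot(isTRUE(all.equal(attr(res, "scaled:center"), attr(ref, "scaled:center"))))
stopifnot(isTRUE(all.equal(attr(res, "scaled:scale"), attr(ref, "scaled:scale"))))
```

The sketch makes three passes over the data so that no step needs more than one block in memory at once. An on-disk implementation may organize its passes differently.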

Performance settings:

Parallelization and thread count can be set globally via hdf5matrix_options or passed explicitly via paral and threads. Explicit parameters take priority over global options.

# Global configuration
hdf5matrix_options(paral = TRUE, threads = 4)
Xs <- scale(X)           # uses 4 threads

# Explicit per-call override
Xs <- scale(X, paral = TRUE, threads = 8)
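
The precedence rule above (explicit argument first, then global option) and the OMP_THREAD_LIMIT cap described for threads can be sketched in base R as follows. The helpers resolve_setting and cap_threads, the global_opts list, and the fallback value of 1 are all hypothetical stand-ins for illustration, not package functions or documented defaults.

```r
# Hypothetical helper: pick the first non-NULL value in priority order.
resolve_setting <- function(explicit, global, fallback) {
  if (!is.null(explicit)) return(explicit)  # per-call argument wins
  if (!is.null(global)) return(global)      # then the global option
  fallback                                  # then a placeholder default
}

# Hypothetical helper: cap a requested thread count by OMP_THREAD_LIMIT
# when that environment variable holds a valid integer.
cap_threads <- function(requested) {
  limit <- suppressWarnings(
    as.integer(Sys.getenv("OMP_THREAD_LIMIT", unset = NA))
  )
  if (is.na(limit)) requested else min(requested, limit)
}

# Stands in for values set through hdf5matrix_options().
global_opts <- list(paral = TRUE, threads = 4L)

from_global   <- resolve_setting(NULL, global_opts$threads, 1L)  # global value
from_explicit <- resolve_setting(8L, global_opts$threads, 1L)    # explicit value
capped        <- cap_threads(from_explicit)                      # honors the limit
```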

Value

An HDF5Matrix pointing to the normalized dataset on disk.

See Also

hdf5matrix_options for global performance settings.

Examples


tmp <- tempfile(fileext = ".h5")
X   <- hdf5_create_matrix(tmp, "data/M",
                           data = matrix(rnorm(500), 50, 10))
Xs  <- scale(X)                         # center=TRUE, scale=TRUE by cols
cat("scaled:center[1]:", attr(Xs, "scaled:center")[1], "\n")
X$close(); Xs$close(); unlink(tmp)
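
As a base-R point of comparison for byrows, the following sketch contrasts column-wise and row-wise normalization on an ordinary matrix. It assumes that byrows = TRUE follows the same sample-standard-deviation convention as base::scale(), which the documentation above does not state explicitly.

```r
# Base-R reference only; no HDF5 file is involved.
set.seed(42)
m <- matrix(rnorm(500), 50, 10)

by_cols <- scale(m)        # each column: mean 0, sd 1
by_rows <- t(scale(t(m)))  # each row: mean 0, sd 1

# Column-wise result: column means are numerically zero.
col_mean_range <- range(colMeans(by_cols))

# Row-wise result: row standard deviations are numerically one.
row_sd_range <- range(apply(by_rows, 1, sd))
```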



BigDataStatMeth documentation built on June 8, 2026, 5:07 p.m.