cor.HDF5Matrix: Correlation matrix for HDF5Matrix objects

cor.HDF5Matrix    R Documentation

Correlation matrix for HDF5Matrix objects

Description

Computes a Pearson or Spearman correlation matrix block by block, reading from and writing to the HDF5 file so that the full matrix is never loaded into RAM. Supports both the correlation of a matrix with itself, cor(X) (auto-correlation), and cross-correlation between two matrices, cor(X, Y).
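The block-wise idea can be sketched in base R on an ordinary in-memory matrix. This is only an illustration of the general technique, not the package's implementation: blockwise_cor and its block_size argument are hypothetical names, and a real on-disk version would read each column block from the HDF5 file instead of subsetting a matrix.

```r
# Block-wise Pearson correlation between the columns of a matrix.
# Hypothetical helper for illustration; not part of BigDataStatMeth.
blockwise_cor <- function(x, block_size = 4L) {
  n <- nrow(x)
  p <- ncol(x)
  starts <- seq(1L, p, by = block_size)
  out <- matrix(NA_real_, p, p)
  for (i in starts) {
    ii <- i:min(i + block_size - 1L, p)
    # Only this column block is standardised (mean 0, sd 1) at a time.
    zi <- scale(x[, ii, drop = FALSE])
    for (j in starts) {
      jj <- j:min(j + block_size - 1L, p)
      zj <- scale(x[, jj, drop = FALSE])
      # For standardised columns, cor = t(zi) %*% zj / (n - 1).
      out[ii, jj] <- crossprod(zi, zj) / (n - 1)
    }
  }
  out
}

set.seed(1)
m <- matrix(rnorm(500), 50, 10)
block_result <- blockwise_cor(m, block_size = 3L)

# Largest deviation from the in-memory reference result.
max_abs_diff <- max(abs(block_result - cor(m)))
```

Each output tile depends on only two column blocks, which is why a computation of this shape can keep memory use bounded regardless of the total matrix size.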

Usage

## S3 method for class 'HDF5Matrix'
cor(
  x,
  y = NULL,
  use = "everything",
  method = "pearson",
  trans_x = FALSE,
  trans_y = FALSE,
  compute_pvalues = TRUE,
  block_size = NULL,
  threads = NULL,
  result_path = NULL,
  compression = NULL,
  ...
)

Arguments

x

An HDF5Matrix object.

y

An HDF5Matrix for cross-correlation, or NULL (default) to compute cor(x, x).

use

Character string. Only "everything" (default) and "complete.obs" are currently supported.

method

"pearson" (default) or "spearman".

trans_x

Logical. If TRUE, correlate rows of x instead of columns (useful for sample-sample correlations in omics data). Default FALSE.

trans_y

Logical. If TRUE, correlate rows of y instead of columns. Default FALSE.

compute_pvalues

Logical. If TRUE, also compute p-values and store them on disk. Default TRUE.

block_size

Integer or NULL. Block size for HDF5 reads (NULL = auto).

threads

Integer or NULL. Number of OpenMP threads (NULL = auto).

result_path

Output location. NULL (default) writes to "CORR/<dataset>/correlation" in the same file as x. A character string specifies a custom output group in the same file. A named list, list(file=, group=), writes to a different file.

compression

Integer (0-9) or NULL. gzip compression level for the result datasets. NULL uses the global option set by hdf5matrix_options (default 6). Use 0 to disable compression (faster for benchmarks).

...

Ignored (for S3 compatibility).
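As a rough guide to what the method, trans_x and use arguments mean, the following base R sketch reproduces the same ideas with stats::cor on an in-memory matrix. It assumes the HDF5Matrix method follows the usual stats::cor semantics for these arguments, which this page does not state explicitly; none of the objects below come from the package.

```r
set.seed(42)
m <- matrix(rnorm(500), 50, 10)   # 50 observations, 10 variables

# method = "spearman": Pearson correlation computed on column-wise ranks.
spearman_direct   <- cor(m, method = "spearman")
spearman_via_rank <- cor(apply(m, 2, rank))

# trans_x = TRUE: correlate rows rather than columns, i.e. cor(t(m)).
row_cor <- cor(t(m))              # one row/column per observation

# use = "complete.obs": rows containing any NA are dropped first.
m_na <- m
m_na[1, 2] <- NA
complete_cor <- cor(m_na, use = "complete.obs")
manual_cor   <- cor(m_na[complete.cases(m_na), ])
```

With use = "everything", by contrast, stats::cor lets a missing value propagate into every correlation that involves the affected column.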

Value

An HDF5Matrix pointing to the correlation matrix on disk. Attributes attached to the result:

cor.method

The correlation method used.

cor.type

Either "single" (y = NULL) or "cross" (y supplied).

cor.n.vars

Number of variables (columns/rows correlated).

cor.n.obs

Number of observations used.

cor.pvalues.path

HDF5 path to the p-values dataset (present only when compute_pvalues = TRUE).
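This page does not say how the stored p-values are derived. One common choice for Pearson correlations is the two-sided t-test with n - 2 degrees of freedom, sketched below in base R. The helper cor_pvalues is hypothetical, and the package may use a different formula.

```r
# Two-sided p-values for a square Pearson correlation matrix r
# computed from n observations. Hypothetical helper for illustration.
cor_pvalues <- function(r, n) {
  # t statistic: r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom.
  t_stat <- r * sqrt((n - 2) / (1 - r^2))
  p <- 2 * pt(-abs(t_stat), df = n - 2)
  diag(p) <- 0   # a variable is perfectly correlated with itself
  p
}

set.seed(7)
m <- matrix(rnorm(500), 50, 10)
p_mat <- cor_pvalues(cor(m), n = nrow(m))

# Reference value for one pair from stats::cor.test.
p_ref <- cor.test(m[, 1], m[, 2])$p.value
```

The number of observations that such a test needs corresponds to what the cor.n.obs attribute reports.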

Examples


tmp <- tempfile(fileext = ".h5")
X   <- hdf5_create_matrix(tmp, "data/X",
                           data = matrix(rnorm(500), 50, 10))

# Auto-correlation: cor(X) gives a 10 x 10 matrix
C <- cor(X)
dim(C)
cat("method:", attr(C, "cor.method"), "\n")

# Spearman
Cs <- cor(X, method = "spearman")
dim(Cs)

# Sample-sample correlation (rows)
Sr <- cor(X, trans_x = TRUE)   # 50 x 50
dim(Sr)

X$close(); C$close(); Cs$close(); Sr$close()
unlink(tmp)



BigDataStatMeth documentation built on May 15, 2026, 1:07 a.m.