impute_snps: Impute missing SNP values in an HDF5Matrix

View source: R/S3_omics.R

impute_snps    R Documentation

Description

Fills missing entries in SNP data. Each missing value is replaced by the mean of the non-missing values in its column (by_cols = TRUE) or its row (by_cols = FALSE). Intended for diploid genotype matrices coded 0/1/2.
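To illustrate the mean imputation described above, here is a minimal base-R sketch on an in-memory matrix. The helper impute_mean() is hypothetical: it is not part of BigDataStatMeth, it does not touch HDF5, and it assumes missing genotypes are stored as NA.

```r
# Hypothetical in-memory sketch of mean imputation (not the package's code).
# Assumes missing genotypes are NA.
impute_mean <- function(m, by_cols = TRUE) {
  stopifnot(is.matrix(m))
  if (!by_cols) {
    # Row-wise imputation is column-wise imputation on the transpose
    return(t(impute_mean(t(m), by_cols = TRUE)))
  }
  for (j in seq_len(ncol(m))) {
    miss <- is.na(m[, j])
    # Skip columns with nothing missing or nothing observed
    if (any(miss) && !all(miss)) {
      # Replace NAs with the mean of the observed values in column j
      m[miss, j] <- mean(m[!miss, j])
    }
  }
  m
}

g <- matrix(c(0, NA, 2,
              1, 1, NA), nrow = 3)
impute_mean(g)                  # NA in column 1 -> 1, NA in column 2 -> 1
impute_mean(g, by_cols = FALSE) # NA in row 2 -> 1, NA in row 3 -> 2
```

Imputed values are generally not integers (a column holding 0, 1 and 1 yields 0.67), and in this sketch a column or row with no observed values is left unchanged.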

Usage

impute_snps(x, ...)

## S3 method for class 'HDF5Matrix'
impute_snps(
  x,
  out_group = NULL,
  out_dataset = NULL,
  by_cols = TRUE,
  threads = -1L,
  overwrite = FALSE,
  ...
)

Arguments

x

An HDF5Matrix containing 0/1/2-coded SNP data with missing values.

...

Ignored.

out_group

Character. HDF5 group for the output dataset. If NULL (default), the group of the input dataset is used.

out_dataset

Character. Name of the output dataset. If NULL (default), the input dataset name is used, so the data are imputed in place.

by_cols

Logical. If TRUE (default), each missing value is imputed from its column mean; if FALSE, from its row mean.

threads

Integer. Number of threads to use; -1 (default) selects the number automatically.

overwrite

Logical. If TRUE, overwrite an existing output dataset. Default FALSE.
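The NULL defaults for out_group and out_dataset can be sketched as follows. resolve_output() is a hypothetical helper written only to illustrate the rule stated in the argument descriptions; it is not part of the package, and it neither checks whether the target dataset exists nor applies overwrite.

```r
# Hypothetical helper: resolves the output location from the documented
# NULL defaults. Not part of the BigDataStatMeth API.
resolve_output <- function(in_group, in_dataset,
                           out_group = NULL, out_dataset = NULL) {
  group   <- if (is.null(out_group)) in_group else out_group
  dataset <- if (is.null(out_dataset)) in_dataset else out_dataset
  list(
    path     = paste(group, dataset, sep = "/"),
    # TRUE when the result would replace the input dataset
    in_place = identical(group, in_group) && identical(dataset, in_dataset)
  )
}

resolve_output("geno", "raw")
resolve_output("geno", "raw", out_group = "geno", out_dataset = "imputed")
```

Under this rule, the call in the Examples section writes its result to geno/imputed rather than over geno/raw.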

Value

An HDF5Matrix pointing to the imputed dataset.

Examples


tmp <- tempfile(fileext = ".h5")

# SNP data: 0/1/2 coded, 3 = missing (not NA)
snps <- matrix(sample(c(0L, 1L, 2L, 3L), 100 * 20,
                       replace = TRUE,
                       prob    = c(0.3, 0.3, 0.3, 0.1)),
               nrow = 100, ncol = 20)

X   <- hdf5_create_matrix(tmp, "geno/raw", data = snps)
imp <- impute_snps(X, out_group = "geno", out_dataset = "imputed")
dim(imp)

hdf5_close_all()
unlink(tmp)



BigDataStatMeth documentation built on May 15, 2026, 1:07 a.m.