filter_maf: Remove SNPs by Minor Allele Frequency from an HDF5Matrix

View source: R/S3_omics.R

filter_maf    R Documentation

Description

Removes SNPs (stored as columns or rows, see by_cols) whose Minor Allele Frequency (MAF) exceeds maf_threshold. Designed for diploid genotype matrices coded 0/1/2.

When out_group and out_dataset are NULL (the default), the result is written to the same group as the input dataset, under the input name with the suffix "_maf_filtered".
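
As a rough illustration of the filtering criterion described above, the following base-R sketch computes a per-SNP MAF on a small in-memory matrix. It does not call BigDataStatMeth and is not the package's implementation. It assumes SNPs are stored in columns, that genotypes count copies of one allele (0/1/2), and that MAF is the smaller of the two allele frequencies. The names geno, allele_freq, maf and removed are illustrative only.

```r
# Base-R sketch only; assumes SNPs in columns and 0/1/2 allele counts.
geno <- cbind(
  snp1 = c(0, 0, 0, 0),
  snp2 = c(0, 1, 0, 0),
  snp3 = c(2, 2, 1, 2),
  snp4 = c(1, 1, 1, 1)
)

# Frequency of the counted allele: allele copies / (2 * number of samples)
allele_freq <- colMeans(geno, na.rm = TRUE) / 2

# The minor allele is the rarer one, so MAF lies in [0, 0.5]
maf <- pmin(allele_freq, 1 - allele_freq)

# Rule as worded on this page: SNPs with MAF above the threshold are removed
maf_threshold <- 0.05
removed <- maf > maf_threshold
filtered <- geno[, !removed, drop = FALSE]
```

Here maf is 0, 0.125, 0.125 and 0.5 for the four SNPs, so only snp1 falls at or below the 0.05 threshold.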

Usage

filter_maf(x, ...)

## S3 method for class 'HDF5Matrix'
filter_maf(
  x,
  out_group = NULL,
  out_dataset = NULL,
  maf_threshold = 0.05,
  by_cols = FALSE,
  block_size = 100L,
  overwrite = FALSE,
  ...
)

Arguments

x

An HDF5Matrix containing SNP data.

...

Ignored.

out_group

Output group. NULL (default) = same group as input.

out_dataset

Output dataset name. NULL (default) = input name + "_maf_filtered".

maf_threshold

Numeric in [0, 0.5]. MAF threshold (default 0.05). SNPs with MAF above this value are removed.

by_cols

Logical. Process by columns (FALSE, default) or rows.

block_size

Integer. Block size for I/O. Default 100L.

overwrite

Logical. Overwrite existing output. Default FALSE.
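
The default output location described for out_group and out_dataset can be sketched in base R as below. default_maf_output is a hypothetical helper written for this illustration, not a BigDataStatMeth function, and it assumes the dataset path has the form "group/dataset".

```r
# Hypothetical helper, not part of BigDataStatMeth.
# Assumes `path` is an HDF5 dataset path of the form "group/dataset".
default_maf_output <- function(path, out_group = NULL, out_dataset = NULL) {
  # NULL out_group: reuse the input dataset's group
  if (is.null(out_group)) out_group <- dirname(path)
  # NULL out_dataset: input dataset name plus the documented suffix
  if (is.null(out_dataset)) out_dataset <- paste0(basename(path), "_maf_filtered")
  paste(out_group, out_dataset, sep = "/")
}

auto_path <- default_maf_output("geno/raw")
explicit_path <- default_maf_output("geno/raw", out_group = "geno",
                                    out_dataset = "maf_filtered")
```

With the input "geno/raw", auto_path is "geno/raw_maf_filtered" and explicit_path is "geno/maf_filtered".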

Value

HDF5Matrix pointing to the filtered dataset.

Examples


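fn <- tempfile(fileext = ".h5")

# Simulate 0/1/2 genotypes in a 20 x 10 matrix
set.seed(1)  # make the simulated genotypes reproducible
snps <- matrix(sample(c(0, 1, 2), 200, replace = TRUE,
                       prob = c(.6, .3, .1)), nrow = 20, ncol = 10)
X <- hdf5_create_matrix(fn, "geno/raw", data = snps)

# Filter with auto output path (adds "_maf_filtered" suffix)
out <- filter_maf(X, maf_threshold = 0.05)

# Filter with explicit output (maf_threshold keeps its default, 0.05)
out2 <- filter_maf(X, out_group = "geno",
                   out_dataset = "maf_filtered", overwrite = TRUE)

# Close open HDF5 handles, then delete the temporary file
hdf5_close_all()
unlink(fn)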



BigDataStatMeth documentation built on May 15, 2026, 1:07 a.m.