knn_imp: K-Nearest Neighbor Imputation for Numeric Matrices

Description

Impute missing values in a numeric matrix using full K-nearest neighbors (K-NN).

Usage

knn_imp(
  obj,
  k,
  colmax = 0.9,
  method = c("euclidean", "manhattan"),
  cores = 1,
  post_imp = TRUE,
  subset = NULL,
  dist_pow = 0,
  na_check = TRUE,
  .progress = FALSE
)

Arguments

obj

A numeric matrix with samples in rows and features in columns.

k

Integer. Number of nearest neighbors to use for K-NN imputation.

colmax

Numeric scalar between 0 and 1. Columns with a missing-data proportion greater than colmax are excluded from the main imputation method. Excluded columns are left unchanged unless post_imp = TRUE, in which case remaining missing values are replaced by column means when possible.

method

Character. Distance metric used to find nearest neighbors: either "euclidean" or "manhattan".

cores

Integer. Number of cores to use for K-NN imputation. Defaults to 1.

post_imp

Logical. If TRUE, replace missing values remaining after the main imputation method with column means when possible.

subset

Optional character or integer vector specifying columns to target for imputation. If NULL, all eligible columns are targeted.

dist_pow

Numeric. Power used to penalize more distant neighbors in the weighted average. Larger values give distant neighbors less weight; dist_pow = 0 (the default) gives an unweighted average of the nearest neighbors.

na_check

Logical. If TRUE, check whether the returned matrix still contains missing values.

.progress

Logical. If TRUE, show imputation progress.
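
As a rough illustration of how colmax and post_imp interact, the base R sketch below screens columns by their missing-data proportion and then applies a column-mean fallback to the excluded ones. It is not the package's internal code: the toy matrix and the names na_prop, excluded, and filled are made up for this example, and the main K-NN step is omitted entirely.

```r
# Toy matrix: samples in rows, features in columns (filled column-wise).
obj <- matrix(
  c(1, NA, 3, 4,     # column "a": 25% missing
    NA, NA, NA, 8,   # column "b": 75% missing
    2, 4, 6, 8),     # column "c": complete
  nrow = 4,
  dimnames = list(NULL, c("a", "b", "c"))
)

colmax <- 0.5

# Proportion of missing values per column.
na_prop <- colMeans(is.na(obj))

# Columns above the threshold are skipped by the main imputation step.
excluded <- names(na_prop)[na_prop > colmax]

# Fallback in the spirit of post_imp = TRUE: fill the excluded columns
# with their column mean, when a mean can be computed at all.
filled <- obj
for (j in excluded) {
  miss <- is.na(filled[, j])
  col_mean <- mean(filled[, j], na.rm = TRUE)
  if (!is.nan(col_mean)) {
    filled[miss, j] <- col_mean
  }
}

excluded
filled
```

In this sketch a column that is entirely missing has no computable mean and stays NA, which is one plausible reading of "when possible" above.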

Details

knn_imp() performs imputation column-wise, treating rows as observations and columns as features.

Nearest neighbors are found using brute-force K-NN.
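
To make "brute-force" concrete, here is a minimal base R sketch that computes the distance from one query vector to every candidate vector and keeps the k closest. The helper brute_knn() is hypothetical and not part of the package. It assumes complete (non-missing) vectors, so it says nothing about how knn_imp() handles missing coordinates when computing distances, nor about which dimension of obj supplies the neighbors.

```r
# Hypothetical helper: exhaustive nearest-neighbor search.
# `candidates` holds one candidate vector per row; `query` is a single vector.
brute_knn <- function(query, candidates, k,
                      method = c("euclidean", "manhattan")) {
  method <- match.arg(method)
  # Subtract the query from every candidate row.
  diffs <- sweep(candidates, 2, query)
  d <- switch(method,
    euclidean = sqrt(rowSums(diffs^2)),
    manhattan = rowSums(abs(diffs))
  )
  # Sort all distances and keep the k smallest.
  idx <- order(d)[seq_len(min(k, length(d)))]
  list(index = idx, distance = d[idx])
}

candidates <- rbind(c(3, 4), c(1, 1), c(6, 8), c(0, 3))
nn <- brute_knn(c(0, 0), candidates, k = 2)
nn$index     # rows of `candidates` closest to the query
nn$distance  # their Euclidean distances
```

Because every candidate is compared against the query, the cost grows with the number of candidates, which is why the performance notes below suggest narrowing the problem for large inputs.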

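When dist_pow > 0, imputed values are computed as distance-weighted averages of the neighbors' values. Each weight is the inverse distance raised to the power of dist_pow, so a neighbor at distance d receives weight (1/d)^dist_pow and closer neighbors contribute more.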
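
The weighting rule can be sketched in a few lines of base R. The function weighted_impute() and the example numbers are hypothetical, and the sketch assumes strictly positive distances; how the package treats a neighbor at distance zero is not described here.

```r
# Hypothetical helper: inverse-distance-weighted average of neighbor values.
weighted_impute <- function(values, distances, dist_pow = 0) {
  # Weight is the inverse distance raised to dist_pow;
  # dist_pow = 0 makes every weight equal to 1.
  w <- (1 / distances)^dist_pow
  sum(w * values) / sum(w)
}

values    <- c(10, 20, 40)  # observed values of three neighbors
distances <- c(1, 2, 4)     # their distances to the target

unweighted <- weighted_impute(values, distances, dist_pow = 0)
linear     <- weighted_impute(values, distances, dist_pow = 1)
squared    <- weighted_impute(values, distances, dist_pow = 2)

c(unweighted = unweighted, linear = linear, squared = squared)
```

With dist_pow = 0 the result equals mean(values); as dist_pow increases, the estimate moves toward the value of the closest neighbor.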

Value

A numeric matrix of the same dimensions as obj, with missing values imputed. The returned object has class slideimp_results.

K-NN performance optimization

  • Use subset when only specific columns need imputation.

  • Use grouped or sliding-window imputation for very large matrices.

References

Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, Botstein D, Altman RB (2001). Missing value estimation methods for DNA microarrays. Bioinformatics, 17(6), 520-525. doi:10.1093/bioinformatics/17.6.520

Examples

set.seed(123)
obj <- sim_mat(20, 20, perc_col_na = 1)$input
sum(is.na(obj))

# Select `k` with `tune_imp()`.
result <- knn_imp(obj, k = 10, .progress = FALSE)
result


slideimp documentation built on June 17, 2026, 1:08 a.m.