knn_imp: K-Nearest Neighbor Imputation for Numeric Matrices
In slideimp: Numeric Matrices K-NN and PCA Imputation

Description

Impute missing values in a numeric matrix using K-nearest neighbor (K-NN) imputation over the full matrix.

Usage

knn_imp(
  obj,
  k,
  colmax = 0.9,
  method = c("euclidean", "manhattan"),
  cores = 1,
  post_imp = TRUE,
  subset = NULL,
  dist_pow = 0,
  na_check = TRUE,
  .progress = FALSE
)

Arguments

`obj`	A numeric matrix with samples in rows and features in columns.
`k`	Integer. Number of nearest neighbors to use for K-NN imputation.
`colmax`	Numeric scalar between `0` and `1`. Columns whose proportion of missing values exceeds `colmax` are skipped by K-NN imputation. Skipped columns are left unchanged unless `post_imp = TRUE`, in which case their missing values are replaced by column means where possible (that is, when the column has at least one observed value).
`method`	Character. Distance metric used to find nearest neighbors: either `"euclidean"` or `"manhattan"`.
`cores`	Integer. Number of cores to use for K-NN imputation. Defaults to `1`.
`post_imp`	Logical. If `TRUE`, missing values that remain after K-NN imputation are replaced by column means where possible (that is, when the column has at least one observed value).
`subset`	Optional character or integer vector specifying columns to target for imputation. If `NULL`, all eligible columns are targeted.
`dist_pow`	Numeric. Power used to penalize more distant neighbors in the weighted average. `dist_pow = 0` gives an unweighted average of the nearest neighbors.
`na_check`	Logical. If `TRUE`, check whether the returned matrix still contains missing values.
`.progress`	Logical. If `TRUE`, show imputation progress.
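The interaction between `colmax` and `post_imp` can be illustrated in base R. This is a minimal sketch, not the slideimp implementation: the helper names `eligible_columns()` and `mean_fill()` are hypothetical, and the sketch assumes that "where possible" means the column has at least one observed value.

```r
# Hypothetical helpers illustrating `colmax` and `post_imp`; not slideimp internals.

# Columns whose missing-data proportion exceeds `colmax` are skipped by K-NN.
eligible_columns <- function(obj, colmax = 0.9) {
  miss_prop <- colMeans(is.na(obj))   # proportion of NAs in each column
  which(miss_prop <= colmax)
}

# Replace remaining NAs with the column mean, but only when the column has
# at least one observed value (an all-NA column has no mean to use).
mean_fill <- function(obj) {
  for (j in seq_len(ncol(obj))) {
    miss <- is.na(obj[, j])
    if (any(miss) && !all(miss)) {
      obj[miss, j] <- mean(obj[, j], na.rm = TRUE)
    }
  }
  obj
}

m <- matrix(c(1, NA, 3,      # column 1: one third missing
              NA, NA, NA,    # column 2: entirely missing
              4, 5, NA),     # column 3: one third missing
            nrow = 3)
eligible_columns(m, colmax = 0.9)  # column indices 1 and 3
mean_fill(m)                       # column 2 stays all NA
```

Column 2 exceeds `colmax`, so it is skipped, and because it has no observed values the mean fill cannot help it either; this is the case in which `na_check = TRUE` would be expected to flag leftover missing values.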

Details

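`knn_imp()` performs imputation column-wise, treating rows as observations and columns as features.

Nearest neighbors are found by brute-force search: the distance to every candidate is computed exhaustively rather than through an approximate or tree-based index.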
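A brute-force, column-wise K-NN search can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's algorithm: the helpers `col_dist()` and `impute_column_knn()` are hypothetical, neighbors are assumed to be other columns, and distances are assumed to use only rows observed in both columns, averaged over those rows so that pairs with different overlap stay comparable.

```r
# Hypothetical sketch: brute-force K-NN imputation of a single column `j`.
# Assumption: distances use only rows observed in both columns, averaged
# over those rows (slideimp may handle missing values differently).
col_dist <- function(x, y, method = c("euclidean", "manhattan")) {
  method <- match.arg(method)
  shared <- !is.na(x) & !is.na(y)   # rows observed in both columns
  if (!any(shared)) return(Inf)     # no overlap: treat as infinitely far
  d <- x[shared] - y[shared]
  if (method == "euclidean") sqrt(mean(d^2)) else mean(abs(d))
}

impute_column_knn <- function(obj, j, k, method = "euclidean") {
  x <- obj[, j]                     # original column, used for all distances
  out <- x                          # copy that receives imputed values
  for (i in which(is.na(x))) {
    # Candidate neighbors: every other column observed at row i.
    cand <- setdiff(which(!is.na(obj[i, ])), j)
    if (length(cand) == 0) next
    # Brute force: distance from column j to every candidate column.
    dists <- vapply(cand, function(cc) col_dist(x, obj[, cc], method),
                    numeric(1))
    nn <- cand[order(dists)[seq_len(min(k, length(cand)))]]
    out[i] <- mean(obj[i, nn])      # unweighted average, as with dist_pow = 0
  }
  out
}

m2 <- cbind(a = c(1, 2, 3, 4),
            b = c(1.1, 2.1, NA, 4.1),
            c = c(10, 20, 30, 40))
impute_column_knn(m2, j = 2, k = 1)   # 1.1 2.1 3.0 4.1 (nearest column is `a`)
```

With `k = 2` the distant column `c` is also averaged in and the imputed value becomes 16.5, which illustrates why `k` is worth tuning.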

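When `dist_pow > 0`, each imputed value is a distance-weighted average of the neighbors' values, with weights equal to the inverse distance raised to the power `dist_pow`. With `dist_pow = 0`, every weight is 1 and the average is unweighted.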
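The weighting rule can be written as imputed value = sum(w_i * x_i) / sum(w_i), with w_i = 1 / d_i^dist_pow, where x_i and d_i are the value and distance of neighbor i. The helper `weighted_knn_value()` below is a hypothetical sketch of that formula, not a slideimp function; it does not handle a zero distance with `dist_pow > 0` (which would give an infinite weight), and how the package treats that case is not stated here.

```r
# Hypothetical sketch of the inverse-distance weighted average.
weighted_knn_value <- function(values, dists, dist_pow = 0) {
  w <- 1 / dists^dist_pow      # dist_pow = 0 gives w = 1 for every neighbor
  sum(w * values) / sum(w)     # normalize so the weights sum to 1
}

vals  <- c(10, 20, 40)         # neighbor values
dists <- c(1, 2, 4)            # neighbor distances
weighted_knn_value(vals, dists, dist_pow = 0)  # 23.33333, the plain mean
weighted_knn_value(vals, dists, dist_pow = 1)  # 17.14286, pulled toward the closest neighbor
```

Raising `dist_pow` shifts the result further toward the closest neighbor's value.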

Value

A numeric matrix with the same dimensions as `obj`, with missing values imputed. The returned object has class `slideimp_results`.

K-NN performance optimization

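Use `subset` when only specific columns need imputation, so that neighbor searches are run only for those columns.
Use grouped or sliding-window imputation for very large matrices, where a brute-force search across all columns becomes expensive.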

References

Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, Botstein D, Altman RB (2001). Missing value estimation methods for DNA microarrays. Bioinformatics, 17(6), 520-525. doi:10.1093/bioinformatics/17.6.520

Examples

library(slideimp)
set.seed(123)
obj <- sim_mat(20, 20, perc_col_na = 1)$input
sum(is.na(obj))

# In practice, choose `k` with `tune_imp()`; `k = 10` is used here for illustration.
result <- knn_imp(obj, k = 10, .progress = FALSE)
result

slideimp documentation built on June 17, 2026, 1:08 a.m.