kNNImpute: kNN Impute

Description Usage Arguments Details References Examples

View source: R/kNN.R

Description

Imputation using k-nearest neighbors. For each record, identify missinng features. For each missing feature find the k nearest neighbors which have that feature. Impute the missing value using the imputation function on the k-length vector of values found from the neighbors.

Usage

1
  kNNImpute(x, k, x.dist = NULL, impute.fn, verbose = T)

Arguments

x

a data frame or matrix where each row represents a different record

k

the number of neighbors to use for imputation

x.dist

an optional, pre-computed distance matrix to be used for kNN

impute.fn

the imputation function to run on the length k vector of values for a missing feature. Defaults to a weighted mean of the neighboring values weighted by the distance of the neighbors

verbose

if TRUE print status updates

Details

The default impute.fn weighs the k values by their respective distances. First the smallest k distances are extracted into the variable smallest.distances Then, the corresponding values are extracted to knn.values. Finally, knn.weights normalizes the distances by the max distance, and are subtracted by 1. The result is the weighted mean of the values of the nearest neighbors and their weight based on their distance. It is implemented as follows:

1
2
3
4
5
6
7
impute.fn = function(values,
  distances, k) { ranks = order(distances)
  smallest.distances = distances[ranks][1:k] #values
  corresponding to smallest distances knn.values =
  values[ranks][1:k] knn.weights = 1 - (smallest.distances
  / max(distances)) weighted.mean(knn.values, knn.weights)
  }

A simple mean can be implemented as follows:

1
2
impute.fn = function(values, distances, k)
  { ranks = order(distances) mean(distances[ranks][1:k]) }

References

Missing value estimation methods for DNA microarrays. Troyanskaya et al.

Examples

1
2
3
4
x = matrix(rnorm(100),10,10)
  x.missing = x > 1
  x[x.missing] = NA
  kNNImpute(x, 3)

jeffwong/imputation documentation built on May 19, 2019, 4:02 a.m.