Description Usage Arguments Details References Examples
Imputation using k-nearest neighbors. For each record, identify missinng features. For each missing feature find the k nearest neighbors which have that feature. Impute the missing value using the imputation function on the k-length vector of values found from the neighbors.
1 |
x |
a data frame or matrix where each row represents a different record |
k |
the number of neighbors to use for imputation |
x.dist |
an optional, pre-computed distance matrix to be used for kNN |
impute.fn |
the imputation function to run on the length k vector of values for a missing feature. Defaults to a weighted mean of the neighboring values weighted by the distance of the neighbors |
verbose |
if TRUE print status updates |
The default impute.fn weighs the k values by their respective distances. First the smallest k distances are extracted into the variable smallest.distances Then, the corresponding values are extracted to knn.values. Finally, knn.weights normalizes the distances by the max distance, and are subtracted by 1. The result is the weighted mean of the values of the nearest neighbors and their weight based on their distance. It is implemented as follows:
1 2 3 4 5 6 7 | impute.fn = function(values,
distances, k) { ranks = order(distances)
smallest.distances = distances[ranks][1:k] #values
corresponding to smallest distances knn.values =
values[ranks][1:k] knn.weights = 1 - (smallest.distances
/ max(distances)) weighted.mean(knn.values, knn.weights)
}
|
A simple mean can be implemented as follows:
1 2 |
Missing value estimation methods for DNA microarrays. Troyanskaya et al.
1 2 3 4 |
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.