Imputes missing values in a data matrix using the K-nearest neighbor algorithm.


`data`
a data matrix

`k`
number of neighbors to use

`distance`
distance metric to use, either "euclidean" or "correlation"

`rm.na`
should NA values be imputed?

`rm.nan`
should NaN values be imputed?

`rm.inf`
should Inf values be imputed?
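The three logical flags above select which kinds of non-finite entries count as missing. The following is a minimal sketch of how such flags could be turned into a mask of cells to impute. The helper name `missing_mask`, its default values, and the choice to treat `rm.na` as covering NA but not NaN are assumptions made for illustration, not the package's documented behavior.

```r
# Hypothetical helper (not part of the package): build a logical mask of
# the cells that would be imputed under a given combination of flags.
missing_mask <- function(x, rm.na = TRUE, rm.nan = TRUE, rm.inf = TRUE) {
  mask <- matrix(FALSE, nrow(x), ncol(x))
  # is.na() is TRUE for both NA and NaN, so NaN is excluded here
  # to keep the two flags separate (an assumption of this sketch).
  if (rm.na)  mask <- mask | (is.na(x) & !is.nan(x))
  if (rm.nan) mask <- mask | is.nan(x)
  # is.infinite() is TRUE for both Inf and -Inf.
  if (rm.inf) mask <- mask | is.infinite(x)
  mask
}

x <- matrix(c(1, NA, NaN, Inf, 5, -Inf), nrow = 2)
missing_mask(x)                  # flags NA, NaN, Inf and -Inf
missing_mask(x, rm.inf = FALSE)  # flags only NA and NaN
```

In base R, `is.na()` returns TRUE for NaN as well as NA, which is why the sketch separates the two cases explicitly.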

Uses the K-nearest neighbor algorithm described in Troyanskaya et al. (2001) to impute missing values in a data matrix. Imputation is row-wise: for each row with missing values, the neighbors are the rows closest to it in distance. Two distance metrics are available, Euclidean (the default) and a correlation 'metric'. If the correlation metric is selected, each row is first normalized to mean zero and standard deviation one before neighbors are selected. The inverse transformation is then applied so that the imputed data matrix is returned on the original scale.
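The row-wise procedure described above can be sketched in base R. This is an illustrative sketch under stated assumptions, not the package's implementation: the function name `knn_impute_sketch` is hypothetical, distances are computed only over the columns observed in the target row, and the imputed value is the unweighted mean of the neighbors' values (Troyanskaya et al. describe a similarity-weighted average).

```r
# Hypothetical sketch of row-wise KNN imputation (not the package's code).
knn_impute_sketch <- function(x, k = 2, distance = c("euclidean", "correlation")) {
  distance <- match.arg(distance)
  z <- x
  if (distance == "correlation") {
    # Row-normalize to mean 0 and sd 1; recycling runs down columns,
    # so mu[i] and s[i] line up with row i.
    mu <- rowMeans(x, na.rm = TRUE)
    s  <- apply(x, 1, sd, na.rm = TRUE)
    z  <- (x - mu) / s
  }
  out <- z
  for (i in which(rowSums(is.na(z)) > 0)) {
    obs <- !is.na(z[i, ])
    # Root-mean-square distance from row i to every row, using only the
    # columns observed in row i (assumption of this sketch).
    d <- apply(z, 1, function(r) sqrt(mean((r[obs] - z[i, obs])^2, na.rm = TRUE)))
    d[i] <- Inf  # a row is never its own neighbor
    for (j in which(!obs)) {
      # Candidate neighbors must have column j observed and a usable distance.
      cand <- which(!is.na(z[, j]) & is.finite(d))
      nn <- cand[order(d[cand])][seq_len(min(k, length(cand)))]
      out[i, j] <- mean(z[nn, j])  # unweighted mean of the neighbors
    }
  }
  # Undo the row normalization so values return on the original scale.
  if (distance == "correlation") out <- out * s + mu
  out
}

x <- rbind(c(1,   2,   3,   4),
           c(1.1, 2.1, NA,  4.1),
           c(0.9, 1.9, 2.9, 3.9),
           c(10,  20,  30,  40))
knn_impute_sketch(x, k = 2)[2, 3]  # mean of rows 1 and 3 in column 3: 2.95
knn_impute_sketch(x, k = 2, distance = "correlation")
```

With the Euclidean metric, rows 1 and 3 are the two nearest neighbors of row 2, so the missing entry is filled with the mean of 3 and 2.9. With the correlation option, row 4 becomes as close as row 1 after normalization, because its values follow the same pattern on a different scale.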

A data matrix with missing values imputed.

Guy Brock

O. Troyanskaya, M. Cantor, G. Sherlock, P. Brown, T. Hastie, R. Tibshirani, D. Botstein, and R. B. Altman. Missing value estimation methods for DNA microarrays. Bioinformatics, 17(6):520-525, 2001.

G.N. Brock, J.R. Shaffer, R.E. Blakesley, M.J. Lotz, and G.C. Tseng. Which missing value imputation method to use in expression profiles: a comparative study and two selection schemes. BMC Bioinformatics, 9:12, 2008.

See the package vignette for an illustration of usage.

