# Impute missing values

### Description

Imputes missing values in a data matrix using the K-nearest neighbor algorithm.

### Usage


### Arguments

| Argument | Description |
| --- | --- |
| `data` | a data matrix |
| `k` | number of neighbors to use |
| `distance` | distance metric to use, one of "euclidean" or "correlation" |
| `rm.na` | should NA values be imputed? |
| `rm.nan` | should NaN values be imputed? |
| `rm.inf` | should Inf values be imputed? |
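
As an illustration of how the three `rm.*` flags could interact, the sketch below builds a logical mask of the entries that would be treated as missing. The helper name `missing_mask` and the default values of the flags are assumptions made for this example, not taken from the package.

```r
# Hypothetical helper (not part of the package): flag the entries that
# would be treated as missing under a given combination of rm.* flags.
# The TRUE defaults are an assumption for illustration only.
missing_mask <- function(data, rm.na = TRUE, rm.nan = TRUE, rm.inf = TRUE) {
  mask <- matrix(FALSE, nrow(data), ncol(data))
  # is.na() is TRUE for both NA and NaN, so separate the two explicitly.
  if (rm.na)  mask <- mask | (is.na(data) & !is.nan(data))
  if (rm.nan) mask <- mask | is.nan(data)
  # is.infinite() is TRUE for both Inf and -Inf.
  if (rm.inf) mask <- mask | is.infinite(data)
  mask
}

x <- matrix(c(1, NA, NaN, Inf, 5, 6), nrow = 2)
missing_mask(x)                  # flags the NA, NaN and Inf entries
missing_mask(x, rm.inf = FALSE)  # leaves the Inf entry untouched
```

Keeping the three checks separate matters in R because `is.na()` alone does not distinguish `NA` from `NaN`.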

### Details

Imputes missing values in a data matrix with the K-nearest neighbor algorithm described in Troyanskaya et al. (2001). Imputation is row-wise: for each row with missing values, the neighbors are the rows closest to it in distance. Two distance metrics are available, Euclidean (the default) or a correlation 'metric'. If the latter is selected, each row is first normalized to mean zero and standard deviation one before neighbors are selected, and the inverse transformation is applied to 'un'-normalize the values before the imputed data matrix is returned.
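
The row-wise procedure can be sketched in base R as follows. This is a minimal illustration and not the package's implementation: the function name `knn_impute_sketch` is hypothetical, only `NA`/`NaN` entries are treated as missing, candidate neighbors are restricted to fully observed rows, and the imputed value is the unweighted mean over the neighbors (a simplification; Troyanskaya et al. describe a weighted average).

```r
# Hypothetical sketch of row-wise KNN imputation (base R only).
knn_impute_sketch <- function(data, k = 2,
                              distance = c("euclidean", "correlation")) {
  distance <- match.arg(distance)

  if (distance == "correlation") {
    # Row-normalize to mean 0 and sd 1, impute on that scale,
    # then apply the inverse transformation.
    mu <- rowMeans(data, na.rm = TRUE)
    s  <- apply(data, 1, sd, na.rm = TRUE)
    z  <- knn_impute_sketch((data - mu) / s, k, "euclidean")
    return(z * s + mu)
  }

  imputed  <- data
  complete <- which(rowSums(is.na(data)) == 0)  # candidate neighbor rows
  for (i in which(rowSums(is.na(data)) > 0)) {
    miss <- is.na(data[i, ])
    # Euclidean distance to each candidate, over the columns observed in row i.
    diffs <- sweep(data[complete, !miss, drop = FALSE], 2, data[i, !miss])
    d  <- sqrt(rowSums(diffs^2))
    nn <- complete[order(d)][seq_len(min(k, length(complete)))]
    # Unweighted mean of the neighbors' values in the missing columns.
    imputed[i, miss] <- colMeans(data[nn, miss, drop = FALSE])
  }
  imputed
}

x <- rbind(c(1.0, 2.0, 3.0),
           c(1.0, 2.0, NA),
           c(1.1, 2.1, 3.2),
           c(9.0, 9.0, 9.0))
knn_impute_sketch(x, k = 2)  # row 2 borrows from rows 1 and 3
```

In this toy matrix the two rows nearest to row 2 are rows 1 and 3, so the missing entry is filled with the mean of their third-column values. Note that the correlation branch of the sketch would need extra care for rows with zero variance, since their standard deviation is zero.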

### Value

A data matrix with missing values imputed.

### Author(s)

Guy Brock

### References

O. Troyanskaya, M. Cantor, G. Sherlock, P. Brown, T. Hastie, R. Tibshirani, D. Botstein, and R. B. Altman. Missing value estimation methods for DNA microarrays. Bioinformatics, 17(6):520-525, 2001.

G.N. Brock, J.R. Shaffer, R.E. Blakesley, M.J. Lotz, and G.C. Tseng. Which missing value imputation method to use in expression profiles: a comparative study and two selection schemes. BMC Bioinformatics, 9:12, 2008.

### See Also

See the package vignette for an illustration of usage.

### Examples
