kNN | R Documentation |

k-Nearest Neighbour Imputation based on a variation of the Gower Distance for numerical, categorical, ordered and semi-continuous variables.

kNN( data, variable = colnames(data), metric = NULL, k = 5, dist_var = colnames(data), weights = NULL, numFun = median, catFun = maxCat, makeNA = NULL, NAcond = NULL, impNA = TRUE, donorcond = NULL, mixed = vector(), mixed.constant = NULL, trace = FALSE, imp_var = TRUE, imp_suffix = "imp", addRF = FALSE, onlyRF = FALSE, addRandom = FALSE, useImputedDist = TRUE, weightDist = FALSE, methodStand = "range", ordFun = medianSamp )

`data` |
data.frame or matrix |

`variable` |
variables where missing values should be imputed |

`metric` |
metric to be used for calculating the distances between |

`k` |
number of Nearest Neighbours used |

`dist_var` |
names of variables to be used for distance calculation |

`weights` |
weights for the variables for distance calculation.
If |

`numFun` |
function for aggregating the k Nearest Neighbours in the case of a numerical variable |

`catFun` |
function for aggregating the k Nearest Neighbours in the case of a categorical variable |

`makeNA` |
list of length equal to the number of variables, with values, that should be converted to NA for each variable |

`NAcond` |
list of length equal to the number of variables, with a condition for imputing an NA |

`impNA` |
TRUE/FALSE whether NA should be imputed |

`donorcond` |
list of length equal to the number of variables, with a donor condition as character string, e.g. a list element can be ">5" or c(">5","<10"). If the list element for a variable is NULL, no condition will be applied for this variable. |

`mixed` |
names of mixed variables |

`mixed.constant` |
vector with length equal to the number of semi-continuous variables specifying the point of the semi-continuous distribution with non-zero probability |

`trace` |
TRUE/FALSE if additional information about the imputation process should be printed |

`imp_var` |
TRUE/FALSE if a TRUE/FALSE variable should be created for each imputed variable to show the imputation status |

`imp_suffix` |
suffix for the TRUE/FALSE variables showing the imputation status |

`addRF` |
TRUE/FALSE each variable will be modelled using random forest regression ( |

`onlyRF` |
TRUE/FALSE if TRUE only additional distance variables created from random forest regression will be used as distance variables. |

`addRandom` |
TRUE/FALSE if an additional random variable should be added for distance calculation |

`useImputedDist` |
TRUE/FALSE if an imputed value should be used for distance calculation for imputing another variable. Be aware that this results in a dependency on the ordering of the variables. |

`weightDist` |
TRUE/FALSE if the distances of the k nearest neighbours should be used as weights in the aggregation step |

`methodStand` |
either "range" or "iqr" to be used in the standardization of numeric variables in the Gower distance |

`ordFun` |
function for aggregating the k Nearest Neighbours in the case of an ordered factor variable |
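
The way the distance and aggregation arguments fit together can be sketched in base R. This is a simplified illustration of a Gower-type distance with range standardization (cf. `methodStand = "range"`), not the code used by `kNN` itself: the helper `gower_sketch` and the `toy` data are hypothetical, ordered and semi-continuous variables are not handled, and skipping variables with an NA in either row is an assumption of this sketch.

```r
# Simplified Gower-type distance between rows i and j of a data frame.
# Numeric columns: absolute difference divided by the column range.
# Factor/character columns: 0 if the values agree, 1 otherwise.
# Columns with an NA in either row are skipped (assumption of this sketch).
gower_sketch <- function(data, i, j, weights = rep(1, ncol(data))) {
  contrib <- rep(NA_real_, ncol(data))
  for (v in seq_along(data)) {
    x <- data[[v]]
    if (is.na(x[i]) || is.na(x[j])) next
    if (is.numeric(x)) {
      r <- diff(range(x, na.rm = TRUE))
      contrib[v] <- if (r > 0) abs(x[i] - x[j]) / r else 0
    } else {
      contrib[v] <- as.numeric(as.character(x[i]) != as.character(x[j]))
    }
  }
  used <- !is.na(contrib)
  sum(weights[used] * contrib[used]) / sum(weights[used])
}

# Hypothetical toy data: row 5 has a missing numeric value
toy <- data.frame(
  height = c(150, 160, 170, 180, NA),
  colour = factor(c("red", "blue", "red", "blue", "red"))
)

# Distances from row 5 to each candidate donor row
d <- sapply(1:4, function(j) gower_sketch(toy, 5, j))

# Take the k = 2 nearest donors and aggregate with the median (cf. numFun)
donors <- order(d)[1:2]
imputed_height <- median(toy$height[donors])
```

In this sketch a categorical variable would be aggregated with a mode-like function instead of the median, which is the role `catFun` plays in the argument list above.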

the imputed data set.
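
As a rough illustration of the returned object, the following sketch assumes the VIM package is installed and uses its `sleep` data set. With the defaults `imp_var = TRUE` and `imp_suffix = "imp"`, each imputed variable is expected to be accompanied by a logical status column such as `NonD_imp`; the object name `imputed` is chosen here for illustration only.

```r
library(VIM)
data(sleep, package = "VIM")

# Impute only two variables; all other columns are left untouched
imputed <- kNN(sleep, variable = c("NonD", "Dream"), k = 5)

# Columns added by the imputation (the status indicators)
setdiff(names(imputed), names(sleep))

# How many values of NonD were imputed
table(imputed$NonD_imp)
```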

Alexander Kowarik, Statistik Austria

A. Kowarik, M. Templ (2016) Imputation with R package VIM. *Journal of Statistical Software*, 74(7), 1-16.

data(sleep)
kNN(sleep)
library(laeken)
kNN(sleep, numFun = weightedMean, weightDist = TRUE)
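
A further sketch, again assuming the VIM package and its `sleep` data set are available, restricts the imputation to one variable, limits the distance calculation to a chosen set of variables via `dist_var`, and suppresses the status columns with `imp_var = FALSE`. The choice of distance variables and the name `imp_sleep` are illustrative only.

```r
library(VIM)
data(sleep, package = "VIM")

# Impute only Sleep, using four selected variables for the distances
imp_sleep <- kNN(
  sleep,
  variable = "Sleep",
  dist_var = c("BodyWgt", "BrainWgt", "Span", "Gest"),
  k = 3,
  imp_var = FALSE
)

# Missing values in Sleep before and after imputation
sum(is.na(sleep$Sleep))
sum(is.na(imp_sleep$Sleep))
```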

