kNN | R Documentation |
k-Nearest Neighbour Imputation based on a variation of the Gower Distance for numerical, categorical, ordered and semi-continuous variables.
kNN(
  data,
  variable = colnames(data),
  metric = NULL,
  k = 5,
  dist_var = colnames(data),
  weights = NULL,
  numFun = median,
  catFun = maxCat,
  makeNA = NULL,
  NAcond = NULL,
  impNA = TRUE,
  donorcond = NULL,
  mixed = vector(),
  mixed.constant = NULL,
  trace = FALSE,
  imp_var = TRUE,
  imp_suffix = "imp",
  addRF = FALSE,
  onlyRF = FALSE,
  addRandom = FALSE,
  useImputedDist = TRUE,
  weightDist = FALSE,
  methodStand = "range",
  ordFun = medianSamp
)
data |
data.frame or matrix |
variable |
variables where missing values should be imputed |
metric |
metric to be used for calculating the distances between |
k |
number of Nearest Neighbours used |
dist_var |
names or variables to be used for distance calculation |
weights |
weights for the variables for distance calculation.
If |
numFun |
function for aggregating the k Nearest Neighbours in the case of a numerical variable |
catFun |
function for aggregating the k Nearest Neighbours in the case of a categorical variable |
makeNA |
list of length equal to the number of variables, with values, that should be converted to NA for each variable |
NAcond |
list of length equal to the number of variables, with a condition for imputing an NA |
impNA |
TRUE/FALSE whether NA should be imputed |
donorcond |
list of length equal to the number of variables, with a donor condition as a character string, e.g. a list element can be ">5" or c(">5","<10"). If the list element for a variable is NULL, no condition will be applied for this variable. |
mixed |
names of mixed variables |
mixed.constant |
vector with length equal to the number of semi-continuous variables specifying the point of the semi-continuous distribution with non-zero probability |
trace |
TRUE/FALSE if additional information about the imputation process should be printed |
imp_var |
TRUE/FALSE if a TRUE/FALSE variable should be created for each imputed variable to show the imputation status |
imp_suffix |
suffix for the TRUE/FALSE variables showing the imputation status |
addRF |
TRUE/FALSE each variable will be modelled using random forest regression ( |
onlyRF |
TRUE/FALSE if TRUE only additional distance variables created from random forest regression will be used as distance variables. |
addRandom |
TRUE/FALSE if an additional random variable should be added for distance calculation |
useImputedDist |
TRUE/FALSE if an imputed value should be used for distance calculation for imputing another variable. Be aware that this results in a dependency on the ordering of the variables. |
weightDist |
TRUE/FALSE if the distances of the k nearest neighbours should be used as weights in the aggregation step |
methodStand |
either "range" or "iqr" to be used in the standardization of numeric variables in the Gower distance |
ordFun |
function for aggregating the k Nearest Neighbours in the case of an ordered factor variable |
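The roles of k, numFun and methodStand can be illustrated with a minimal base R sketch for numeric variables only. This is not the code used by kNN(): it ignores categorical, ordered and semi-continuous variables, weights and donor conditions, and it assumes that each numeric difference is divided by the range (or the interquartile range) of the variable before averaging. The data set toy and the helpers scale_of, gower_num and impute_one are hypothetical names introduced for this illustration.

```r
# Toy data (hypothetical values): the first row has a missing y.
toy <- data.frame(
  x1 = c(1.0, 1.2, 5.0, 0.9, 4.8, 1.1),
  x2 = c(10, 11, 50, 9, 52, 12),
  y  = c(NA, 2.0, 9.0, 2.2, 8.5, 1.8)
)

# Scale factor for one variable: full range or interquartile range.
scale_of <- function(x, method = c("range", "iqr")) {
  method <- match.arg(method)
  if (method == "range") diff(range(x, na.rm = TRUE)) else IQR(x, na.rm = TRUE)
}

# Gower-style distance from row i to every row (numeric variables only):
# the mean over variables of |difference| / scale.
gower_num <- function(data, i, vars, method = "range") {
  parts <- sapply(vars, function(v) {
    abs(data[[v]] - data[[v]][i]) / scale_of(data[[v]], method)
  })
  rowMeans(parts)
}

# Impute the target value of row i from its k nearest donors.
impute_one <- function(data, i, target, vars, k = 3, agg = median) {
  d <- gower_num(data, i, vars)
  d[i] <- Inf                      # a row cannot donate to itself
  d[is.na(data[[target]])] <- Inf  # donors need an observed target value
  donors <- order(d)[seq_len(k)]   # indices of the k smallest distances
  agg(data[[target]][donors])      # aggregate the donor values (cf. numFun)
}

imputed_y <- impute_one(toy, i = 1, target = "y", vars = c("x1", "x2"), k = 3)
```

Here agg plays the part of numFun: swapping median for mean changes only the aggregation step, not the choice of donors.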
the imputed data set.
Alexander Kowarik, Statistik Austria
A. Kowarik, M. Templ (2016) Imputation with R package VIM. Journal of Statistical Software, 74(7), 1-16.
Other imputation methods: hotdeck(), impPCA(), irmi(), matchImpute(), medianSamp(), rangerImpute(), regressionImp(), sampleCat()
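library(VIM)
data(sleep)
# impute all variables with the default settings
kNN(sleep)
library(laeken)
# aggregate numeric donors with a distance-weighted mean
kNN(sleep, numFun = weightedMean, weightDist = TRUE)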
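A further sketch shows how variable, dist_var, k and imp_var interact. It assumes the sleep data shipped with VIM, with the columns NonD, Dream, BodyWgt and BrainWgt. The choice of variables and of k = 3 is arbitrary and only serves the illustration.

```r
library(VIM)

data(sleep, package = "VIM")

# Impute only NonD and Dream; measure distances on body and brain weight alone.
imp <- kNN(
  sleep,
  variable = c("NonD", "Dream"),
  dist_var = c("BodyWgt", "BrainWgt"),
  k = 3
)

# With imp_var = TRUE (the default), logical indicator columns are appended,
# named <variable>_<imp_suffix>, here NonD_imp and Dream_imp.
table(imp$NonD_imp)

# The imputed variables should no longer contain missing values.
anyNA(imp$NonD)
anyNA(imp$Dream)
```

Restricting dist_var to fully observed variables keeps the donor search independent of the order in which variables are imputed, which otherwise matters when useImputedDist = TRUE.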