# dataFiller: Missing Observations Filling Function In knnGarden: Multi-distance based k-Nearest Neighbors

## Description

Fills in the missing observations in a dataset by exploring similarities between cases.

## Usage

```r
dataFiller(data, NAstring = NA)
```

## Arguments

- `data`: a dataset that contains missing observations in some cases.
- `NAstring`: a character or string that denotes missing values in the input dataset.
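If a dataset marks missing entries with a placeholder such as `"?"`, one option is to recode that placeholder to `NA` before filling. The sketch below is only an illustration in base R: `recode_missing` and the `"?"` marker are hypothetical and not part of knnGarden, and the helper round-trips each column through character, which is an assumption suited to small examples.

```r
# Hypothetical helper (not part of knnGarden): turn a placeholder string into NA
# and let R re-infer each column's type afterwards.
recode_missing <- function(df, marker) {
  df[] <- lapply(df, function(col) {
    col <- as.character(col)
    col[!is.na(col) & col == marker] <- NA   # leave existing NAs untouched
    type.convert(col, as.is = TRUE)          # e.g. "5.1" becomes numeric 5.1
  })
  df
}

raw <- data.frame(a = c("5.1", "?", "4.7"),
                  b = c("1.4", "1.3", "?"),
                  stringsAsFactors = FALSE)
clean <- recode_missing(raw, "?")
str(clean)
```

After recoding, `clean` holds numeric columns with ordinary `NA` values.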

## Details

Each case with missing observations is filled using the 10 cases most similar to it: every missing value is replaced by the median of that column over those 10 neighbors (for non-numeric columns, the most frequent value is used instead). Missing values in the same column among the 10 neighbors are dropped before the median is computed. Similarity is measured by the Euclidean distance between standardized cases.
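The procedure described above can be sketched in base R for an all-numeric data frame. This is an illustration under stated assumptions, not the package's implementation: `fill_by_neighbors` is a hypothetical name, standardization uses `scale()` (z-scores) and distances use `dist()`, and `k` is exposed as a parameter instead of being fixed at 10.

```r
# Hypothetical helper (not part of knnGarden): fill NAs in numeric columns
# with the median of the k nearest rows.
fill_by_neighbors <- function(df, k = 10) {
  stopifnot(all(vapply(df, is.numeric, logical(1))))
  z <- scale(as.matrix(df))   # standardize columns; NAs are ignored for mean/sd
  d <- as.matrix(dist(z))     # Euclidean distance; NA coordinates are skipped pairwise
  filled <- df
  for (r in which(!complete.cases(df))) {
    nn <- order(d[r, ])       # row indices, nearest first
    nn <- head(nn[nn != r], k)  # drop the row itself, keep k neighbors
    for (j in which(is.na(df[r, ]))) {
      # neighbors that are also missing in column j are dropped by na.rm
      filled[r, j] <- median(df[nn, j], na.rm = TRUE)
    }
  }
  filled
}

df <- data.frame(x = c(1, 2, 3, 10, 11),
                 y = c(1.0, 2.1, NA, 10.2, 11.1))
filled <- fill_by_neighbors(df, k = 2)
filled
```

In this toy data the third row is closest to rows 2 and 1 on `x`, so its missing `y` is filled with the median of 2.1 and 1.0.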

## Value

A complete data set with missing observations filled will be returned.

## Note

The cases with missing values in the input dataset are printed to the screen rather than returned. The return value is only the complete data set with the missing observations filled.
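When the printed cases are not wanted on the console, or should be kept for later inspection, base R's `capture.output()` can collect them while the return value is still assigned. The sketch below uses a stand-in function, `noisy_fill`, which is hypothetical and only mimics the print-then-return behavior described in this note (its zero fill is a placeholder, not the package's method).

```r
# Stand-in (hypothetical) that prints the incomplete rows and returns filled data
noisy_fill <- function(data) {
  print(data[!complete.cases(data), , drop = FALSE])
  data[is.na(data)] <- 0   # placeholder fill, for illustration only
  data
}

df <- data.frame(x = c(1, NA, 3))

# The assignment happens in the calling environment; printed lines land in log_lines
log_lines <- capture.output(filled <- noisy_fill(df))
```

The same pattern should apply to any function that prints as a side effect and returns its result.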

## Author(s)

Boxian Wei (the ideas are inspired by Luis Torgo, with thanks)

## References

Luis Torgo (2003) Data Mining with R: learning by case studies. LIACC-FEP, University of Porto

## See Also

`knnMCN`, `knnVCN`

## Examples

```r
## Define data
library(knnGarden)
data(iris)
v1 <- c(iris[1:4, 3], NA, iris[6:10, 3])
v2 <- iris[101:110, 4]
v3 <- iris[101:110, 1]
v4 <- c(iris[11:18, 3], NA, iris[20, 3])
data1 <- data.frame(v1, v2, v3, v4)

## Call function
data2 <- dataFiller(data1)

## The function is currently defined as
function (data, NAstring = NA)
{
    ## Median for numeric columns, most frequent value otherwise
    central.value <- function(x) {
        if (is.numeric(x))
            median(x, na.rm = TRUE)
        else if (is.factor(x))
            levels(x)[which.max(table(x))]
        else {
            f <- as.factor(x)
            levels(f)[which.max(table(f))]
        }
    }
    ## Pairwise dissimilarities between standardized cases
    ## (daisy() comes from the cluster package)
    dist.mtx <- as.matrix(daisy(data, stand = TRUE))
    ShowMissing <- data[which(!complete.cases(data)), ]
    ## For each incomplete case, fill its NA columns from the 10 nearest cases;
    ## position 1 of the sorted distances is normally the case itself
    for (r in which(!complete.cases(data)))
        data[r, which(is.na(data[r, ]))] <- apply(
            data.frame(data[c(as.integer(names(sort(dist.mtx[r, ])[2:11]))),
                            which(is.na(data[r, ]))]),
            2, central.value)
    cat("the missing case(s) in the orignal dataset ", "\n\n")
    print(ShowMissing)
    cat("\n\n")
    return(data)
}
```