knnImp: Fill in NA values with the values of the nearest neighbours


Description

Fills in all NA values in a data set using the k nearest neighbours of each case that has NA values. Numeric variables are filled in with the median of the neighbours' values, and factors with the most frequent value among the neighbours.

Usage

knnImp(data, k = 10, scale = TRUE, distData = NULL)

Arguments

data

A data frame with the data set

k

The number of nearest neighbours to use (defaults to 10)

scale

Boolean indicating whether the data should be scaled before finding the nearest neighbours (defaults to TRUE)

distData

Optionally, you may specify here a data frame containing the data set that should be used to find the neighbours. This is useful when filling in NA values on a test set, where you should use only information from the training set. This defaults to NULL, which means that the neighbours will be searched for in data

Details

This function uses the k-nearest neighbours to fill in the unknown (NA) values in a data set. For each case with any NA value it will search for its k most similar cases and use the values of these cases to fill in the unknowns.

To fill in each NA, the function uses either the median of the neighbours' values (for numeric variables) or their most frequent value (for factors).
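The procedure described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the names knn_impute_sketch and most_frequent are hypothetical, and the sketch assumes that distances are Euclidean over scaled numeric columns only, and that only complete cases are considered as candidate neighbours.

```r
# Hypothetical helper: most frequent level of a factor (ties go to the first level).
most_frequent <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}

# Illustrative sketch only; NOT the code used by knnImp().
# Assumes df is a plain data.frame with at least one numeric column.
knn_impute_sketch <- function(df, k = 3) {
  num_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  # Scale numeric columns so that no single variable dominates the distance.
  scaled <- scale(as.matrix(df[num_cols]))
  # Assumption: only rows without any NA may serve as neighbours.
  complete_rows <- which(complete.cases(df))
  stopifnot(length(complete_rows) >= k)

  for (i in which(!complete.cases(df))) {
    # Use only the numeric columns that are observed in row i for the distance.
    obs <- num_cols[vapply(df[num_cols], function(x) !is.na(x[i]), logical(1))]
    diffs <- sweep(scaled[complete_rows, obs, drop = FALSE], 2, scaled[i, obs])
    dists <- sqrt(rowSums(diffs^2))
    nn <- complete_rows[order(dists)[seq_len(k)]]

    # Fill each missing cell from the k neighbours: median or most frequent value.
    missing_cols <- names(df)[vapply(df, function(x) is.na(x[i]), logical(1))]
    for (col in missing_cols) {
      vals <- df[nn, col]
      df[i, col] <- if (is.numeric(vals)) median(vals) else most_frequent(vals)
    }
  }
  df
}

# Toy data (made up for illustration): rows 5 and 6 contain NA values.
toy <- data.frame(
  x = c(1, 2, 3, 10, 11, 2.5),
  y = c(1.1, 2.1, 2.9, 10.2, NA, NA),
  g = factor(c("a", "a", "a", "b", "b", NA))
)
filled <- knn_impute_sketch(toy, k = 3)
```

In this sketch the median makes the filled-in numeric value robust to a single distant neighbour, which matters when k is small relative to how spread out the complete cases are.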

Value

A data frame without NA values

Author(s)

Luis Torgo ltorgo@dcc.fc.up.pt

References

Torgo, L. (2014) An Infra-Structure for Performance Estimation and Experimental Comparison of Predictive Models in R. arXiv:1412.0436 [cs.MS] http://arxiv.org/abs/1412.0436

See Also

na.omit

Examples

## Not run: 
data(algae,package="DMwR")
cleanAlgae <- knnImp(algae)
summary(cleanAlgae)

## End(Not run)

Example output

    season       size       speed         mxPH            mnO2       
 autumn:40   large :45   high  :84   Min.   :5.600   Min.   : 1.500  
 spring:53   medium:84   low   :33   1st Qu.:7.700   1st Qu.: 7.775  
 summer:45   small :71   medium:83   Median :8.055   Median : 9.800  
 winter:62                           Mean   :8.010   Mean   : 9.134  
                                     3rd Qu.:8.400   3rd Qu.:10.800  
                                     Max.   :9.700   Max.   :13.400  
       Cl               NO3              NH4                oPO4       
 Min.   :  0.222   Min.   : 0.050   Min.   :    5.00   Min.   :  1.00  
 1st Qu.: 10.352   1st Qu.: 1.312   1st Qu.:   38.03   1st Qu.: 15.37  
 Median : 32.178   Median : 2.675   Median :  103.17   Median : 40.15  
 Mean   : 42.368   Mean   : 3.274   Mean   :  496.97   Mean   : 73.39  
 3rd Qu.: 57.750   3rd Qu.: 4.421   3rd Qu.:  225.65   3rd Qu.: 98.69  
 Max.   :391.500   Max.   :45.650   Max.   :24064.00   Max.   :564.60  
      PO4             Chla               a1              a2        
 Min.   :  1.0   Min.   :  0.200   Min.   : 0.00   Min.   : 0.000  
 1st Qu.: 40.5   1st Qu.:  1.962   1st Qu.: 1.50   1st Qu.: 0.000  
 Median :103.3   Median :  5.155   Median : 6.95   Median : 3.000  
 Mean   :137.6   Mean   : 13.256   Mean   :16.92   Mean   : 7.458  
 3rd Qu.:213.2   3rd Qu.: 17.200   3rd Qu.:24.80   3rd Qu.:11.375  
 Max.   :771.6   Max.   :110.456   Max.   :89.80   Max.   :72.600  
       a3               a4               a5               a6        
 Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000  
 1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.: 0.000  
 Median : 1.550   Median : 0.000   Median : 1.900   Median : 0.000  
 Mean   : 4.309   Mean   : 1.992   Mean   : 5.064   Mean   : 5.964  
 3rd Qu.: 4.925   3rd Qu.: 2.400   3rd Qu.: 7.500   3rd Qu.: 6.925  
 Max.   :42.800   Max.   :44.600   Max.   :44.400   Max.   :77.600  
       a7        
 Min.   : 0.000  
 1st Qu.: 0.000  
 Median : 1.000  
 Mean   : 2.495  
 3rd Qu.: 2.400  
 Max.   :31.600  

performanceEstimation documentation built on May 2, 2019, 6:01 a.m.