Function that fills in all NA values using the k nearest neighbours of each case with NA values. By default it uses the values of the neighbours and obtains a weighted (by the distance to the case) average of their values to fill in the unknowns. If meth='median' it uses the median/most frequent value instead.
knnImputation(data, k = 10, scale = T, meth = "weighAvg",
              distData = NULL)
data | A data frame with the data set.
k | The number of nearest neighbours to use (defaults to 10).
scale | Boolean indicating whether the data should be scaled before finding the nearest neighbours (defaults to T).
meth | String indicating the method used to calculate the value to fill in each NA. Available values are 'median' or 'weighAvg' (the default).
distData | Optionally, a data frame containing the data set that should be used to find the neighbours. This is useful when filling in NA values on a test set, where you should use only information from the training set. This defaults to NULL, which means that the neighbours will be searched in data.
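The distData argument can be illustrated with a small train/test split. This is only a sketch: it assumes the DMwR package (which provides knnImputation and the algae data set) is installed, and the 150/50 row split and the names train, test and testFilled are arbitrary choices made for illustration.

```r
# Runs only when DMwR is available; otherwise the block is skipped.
if (requireNamespace("DMwR", quietly = TRUE)) {
  library(DMwR)
  data(algae)

  # Hypothetical split: first 150 rows for training, the rest for testing
  train <- algae[1:150, ]
  test  <- algae[151:200, ]

  # Fill the NAs of the test set, searching for neighbours in the training set
  testFilled <- knnImputation(test, k = 10, distData = train)
}
```

Passing the training set through distData keeps information from the test cases out of the neighbour search.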
This function uses the k-nearest neighbours to fill in the unknown (NA) values in a data set. For each case with any NA value it will search for its k most similar cases and use the values of these cases to fill in the unknowns.
If meth='median' the function will use either the median (in
case of numeric variables) or the most frequent value (in case of
factors) of the neighbours to fill in the NAs. If
meth='weighAvg' the function will use a weighted average of the
values of the neighbours. The weights are given by exp(-dist(k,x)),
where dist(k,x) is the Euclidean distance between the case with
NAs (x) and the neighbour k.
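The two fill-in rules can be sketched in base R on toy data. This is a simplified illustration and not the package's implementation: it handles numeric variables only, skips scaling, and the data and all variable names (complete, target, weighted_fill, median_fill) are made up for the example.

```r
# Toy data: four complete cases; y is the variable that is NA in the target case
complete <- data.frame(x1 = c(1, 2, 4, 8),
                       x2 = c(1, 3, 4, 9),
                       y  = c(10, 20, 40, 80))
target <- c(x1 = 2, x2 = 2)   # known attributes of the case with the NA

k <- 3

# Euclidean distance from the target to every complete case (known attributes only)
diffs <- sweep(as.matrix(complete[, c("x1", "x2")]), 2, target)
d <- sqrt(rowSums(diffs^2))

# Indices of the k nearest neighbours
nn <- order(d)[seq_len(k)]

# meth = 'weighAvg': average of the neighbours' values, weighted by exp(-distance)
w <- exp(-d[nn])
weighted_fill <- sum(w * complete$y[nn]) / sum(w)

# meth = 'median': median of the neighbours' values
median_fill <- median(complete$y[nn])
```

Because the weights decay exponentially with distance, the closest neighbours dominate the weighted average, whereas the median treats all k neighbours equally.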
The function returns a data frame without NA values.
Luis Torgo ltorgo@dcc.fc.up.pt
Torgo, L. (2010) Data Mining using R: learning with case studies, CRC Press (ISBN: 9781439810187).
http://www.dcc.fc.up.pt/~ltorgo/DataMiningWithR
centralImputation, centralValue, complete.cases, na.omit
library(DMwR)   # provides knnImputation and the algae data set
data(algae)
cleanAlgae <- knnImputation(algae)
summary(cleanAlgae)
Loading required package: lattice
Loading required package: grid
    season       size       speed         mxPH            mnO2       
 autumn:40   large :45   high  :84   Min.   :5.600   Min.   : 1.500  
 spring:53   medium:84   low   :33   1st Qu.:7.700   1st Qu.: 7.775  
 summer:45   small :71   medium:83   Median :8.055   Median : 9.800  
 winter:62                           Mean   :8.011   Mean   : 9.129  
                                     3rd Qu.:8.400   3rd Qu.:10.800  
                                     Max.   :9.700   Max.   :13.400  
       Cl               NO3              NH4                oPO4       
 Min.   :  0.222   Min.   : 0.050   Min.   :    5.00   Min.   :  1.00  
 1st Qu.: 10.542   1st Qu.: 1.312   1st Qu.:   38.78   1st Qu.: 15.37  
 Median : 32.178   Median : 2.675   Median :  103.17   Median : 40.15  
 Mean   : 42.661   Mean   : 3.277   Mean   :  498.62   Mean   : 73.60  
 3rd Qu.: 57.775   3rd Qu.: 4.421   3rd Qu.:  227.89   3rd Qu.:100.50  
 Max.   :391.500   Max.   :45.650   Max.   :24064.00   Max.   :564.60  
      PO4             Chla             a1              a2        
 Min.   :  1.0   Min.   :  0.2   Min.   : 0.00   Min.   : 0.000  
 1st Qu.: 40.5   1st Qu.:  2.0   1st Qu.: 1.50   1st Qu.: 0.000  
 Median :103.3   Median :  5.2   Median : 6.95   Median : 3.000  
 Mean   :137.7   Mean   : 13.4   Mean   :16.92   Mean   : 7.458  
 3rd Qu.:214.0   3rd Qu.: 17.2   3rd Qu.:24.80   3rd Qu.:11.375  
 Max.   :771.6   Max.   :110.5   Max.   :89.80   Max.   :72.600  
       a3               a4               a5               a6        
 Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000  
 1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.: 0.000  
 Median : 1.550   Median : 0.000   Median : 1.900   Median : 0.000  
 Mean   : 4.309   Mean   : 1.992   Mean   : 5.064   Mean   : 5.964  
 3rd Qu.: 4.925   3rd Qu.: 2.400   3rd Qu.: 7.500   3rd Qu.: 6.925  
 Max.   :42.800   Max.   :44.600   Max.   :44.400   Max.   :77.600  
       a7        
 Min.   : 0.000  
 1st Qu.: 0.000  
 Median : 1.000  
 Mean   : 2.495  
 3rd Qu.: 2.400  
 Max.   :31.600  