Function that fills in all NA values using the k nearest neighbours of each case with NA values. By default it uses the values of the neighbours and obtains a weighted average of their values (weighted by the distance to the case) to fill in the unknowns. If meth='median' it uses the median (or, for factors, the most frequent value) instead.
knnImputation(data, k = 10, scale = T, meth = "weighAvg",
              distData = NULL)
data: A data frame with the data set.
k: The number of nearest neighbours to use (defaults to 10).
scale: Logical indicating whether the data should be scaled before finding the nearest neighbours (defaults to T).
meth: String indicating the method used to calculate the value to fill in each NA. Available values are 'median' or 'weighAvg' (the default).
distData: Optionally, a data frame containing the data set that should be used to find the neighbours. This is useful when filling in NA values on a test set, where you should use only information from the training set. This defaults to NULL, which means that the neighbours will be searched in data.
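The test-set scenario described for distData can be sketched as follows. This is an illustration only: it assumes the DMwR package (which provides knnImputation and the algae data) is installed, and the 150/50 row split and the restriction to the first 11 (predictor) columns are arbitrary choices made for the example, not part of the original documentation.

```r
# Sketch only: runs the example if the DMwR package is available.
if (requireNamespace("DMwR", quietly = TRUE)) {
  library(DMwR)
  data(algae)

  # Hypothetical split: first 150 rows as training set, the rest as test set.
  # Both data frames must have the same columns.
  train <- algae[1:150, 1:11]
  test  <- algae[151:200, 1:11]

  # Fill NAs in the test set, searching for neighbours in the training set.
  cleanTest <- knnImputation(test, k = 10, distData = train)

  # Check whether any NA values remain in the imputed test set.
  print(anyNA(cleanTest))
}
```

Only the rows of the first argument are returned, so cleanTest has the same number of rows as test.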
This function uses the k-nearest neighbours to fill in the unknown (NA) values in a data set. For each case with any NA value it will search for its k most similar cases and use the values of these cases to fill in the unknowns.
If meth='median' the function will use either the median (in the case of numeric variables) or the most frequent value (in the case of factors) of the neighbours to fill in the NAs. If meth='weighAvg' the function will use a weighted average of the values of the neighbours. The weights are given by exp(-dist(k,x)), where dist(k,x) is the Euclidean distance between the case with NAs (x) and the neighbour k.
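The two fill-in rules can be illustrated for a single missing value with a minimal base R sketch. It is not the package's implementation: the toy data, the variable names and the helper most_frequent are hypothetical, and scaling is skipped because the toy columns are already on comparable scales.

```r
# Toy data (hypothetical): four complete cases and one target case whose y is NA.
train  <- data.frame(x1 = c(1, 2, 3, 10),
                     x2 = c(1, 2, 3, 10),
                     y  = c(10, 20, 30, 100))
target <- c(x1 = 2.5, x2 = 2.5)
k <- 3

# Euclidean distance from the target to each candidate, using the known columns.
X <- as.matrix(train[, names(target)])
d <- sqrt(rowSums(sweep(X, 2, target)^2))

# Indices of the k nearest neighbours.
nn <- order(d)[seq_len(k)]

# meth = 'weighAvg': weights exp(-dist), so closer neighbours count more.
w <- exp(-d[nn])
weighted_fill <- weighted.mean(train$y[nn], w)

# meth = 'median': plain median of the neighbours' values.
median_fill <- median(train$y[nn])

# For a factor, the most frequent level among the neighbours would be used instead.
most_frequent <- function(f) names(which.max(table(f)))
```

With these toy values the distant fourth case is never selected, the median fill is 20, and the weighted fill lands between 20 and 30 because the two closest neighbours carry most of the weight.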
A data frame without NA values
Luis Torgo ltorgo@dcc.fc.up.pt
Torgo, L. (2010) Data Mining using R: learning with case studies, CRC Press (ISBN: 9781439810187).
http://www.dcc.fc.up.pt/~ltorgo/DataMiningWithR
centralImputation, centralValue, complete.cases, na.omit
library(DMwR)  # provides knnImputation() and the algae data set
data(algae)
cleanAlgae <- knnImputation(algae)
summary(cleanAlgae)
Loading required package: lattice
Loading required package: grid
season size speed mxPH mnO2
autumn:40 large :45 high :84 Min. :5.600 Min. : 1.500
spring:53 medium:84 low :33 1st Qu.:7.700 1st Qu.: 7.775
summer:45 small :71 medium:83 Median :8.055 Median : 9.800
winter:62 Mean :8.011 Mean : 9.129
3rd Qu.:8.400 3rd Qu.:10.800
Max. :9.700 Max. :13.400
Cl NO3 NH4 oPO4
Min. : 0.222 Min. : 0.050 Min. : 5.00 Min. : 1.00
1st Qu.: 10.542 1st Qu.: 1.312 1st Qu.: 38.78 1st Qu.: 15.37
Median : 32.178 Median : 2.675 Median : 103.17 Median : 40.15
Mean : 42.661 Mean : 3.277 Mean : 498.62 Mean : 73.60
3rd Qu.: 57.775 3rd Qu.: 4.421 3rd Qu.: 227.89 3rd Qu.:100.50
Max. :391.500 Max. :45.650 Max. :24064.00 Max. :564.60
PO4 Chla a1 a2
Min. : 1.0 Min. : 0.2 Min. : 0.00 Min. : 0.000
1st Qu.: 40.5 1st Qu.: 2.0 1st Qu.: 1.50 1st Qu.: 0.000
Median :103.3 Median : 5.2 Median : 6.95 Median : 3.000
Mean :137.7 Mean : 13.4 Mean :16.92 Mean : 7.458
3rd Qu.:214.0 3rd Qu.: 17.2 3rd Qu.:24.80 3rd Qu.:11.375
Max. :771.6 Max. :110.5 Max. :89.80 Max. :72.600
a3 a4 a5 a6
Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000
1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000
Median : 1.550 Median : 0.000 Median : 1.900 Median : 0.000
Mean : 4.309 Mean : 1.992 Mean : 5.064 Mean : 5.964
3rd Qu.: 4.925 3rd Qu.: 2.400 3rd Qu.: 7.500 3rd Qu.: 6.925
Max. :42.800 Max. :44.600 Max. :44.400 Max. :77.600
a7
Min. : 0.000
1st Qu.: 0.000
Median : 1.000
Mean : 2.495
3rd Qu.: 2.400
Max. :31.600