imputation: Missing data imputation (e.g. substitution by value or...

Description Usage Arguments Details Value Note Author(s) References See Also Examples

View source: R/preprocess.R

Description

Missing data imputation (e.g. substitution by value or hotdeck method).

Usage

1
imputation(imethod = "value", D, Attribute = NULL, Missing = NA, Value = 1)

Arguments

imethod

imputation method type:

  • value – substitutes missing data by Value (with single element or several elements);

  • hotdeck – searches first the most similar example (i.e. using a k-nearest neighbor method – knn) in the dataset and replaces the missing data by the value found in such example;

D

dataset with missing data (data.frame)

Attribute

if NULL then all attributes (data columns) with missing data are replaced. Else, Attribute is the attribute number (numeric) or name (character).

Missing

missing data symbol

Value

the substitution value (if imethod=value) or number of neighbors (k of knn).

Details

Check the references.

Value

A data.frame without missing data.

Note

See also http://hdl.handle.net/1822/36210 and http://www3.dsi.uminho.pt/pcortez/rminer.html

Author(s)

Paulo Cortez http://www3.dsi.uminho.pt/pcortez

References

See Also

fit and delevels.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
d=matrix(ncol=5,nrow=5)
d[1,]=c(5,4,3,2,1)
d[2,]=c(4,3,4,3,4)
d[3,]=c(1,1,1,1,1)
d[4,]=c(4,NA,3,4,4)
d[5,]=c(5,NA,NA,2,1)
d=data.frame(d); d[,3]=factor(d[,3])
print(d)
print(imputation("value",d,3,Value="3"))
print(imputation("value",d,2,Value=median(na.omit(d[,2]))))
print(imputation("value",d,2,Value=c(1,2)))
print(imputation("hotdeck",d,"X2",Value=1))
print(imputation("hotdeck",d,Value=1))

## Not run: 
# hotdeck 1-nearest neighbor substitution on a real dataset:
d=read.table(
   file="http://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data",
   sep=",",na.strings="?")
print(summary(d))
d2=imputation("hotdeck",d,Value=1)
print(summary(d2))
par(mfrow=c(2,1))
hist(d$V26)
hist(d2$V26)
par(mfrow=c(1,1)) # reset mfrow

## End(Not run)

rminer documentation built on May 1, 2019, 7:48 p.m.