imputation | R Documentation |
Missing data imputation (e.g. substitution by value or hotdeck method).
imputation(imethod = "value", D, Attribute = NULL, Missing = NA, Value = 1)
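imethod | imputation method type: "value" substitutes the missing data by Value; "hotdeck" replaces the missing data by the value found in the most similar example of the dataset (nearest neighbor search) |
D | dataset with missing data (data.frame) |
Attribute | if NULL, all attributes (data columns) with missing data are imputed; otherwise the attribute number (numeric) or name (character) |
Missing | missing data symbol |
Value | the substitution value (if imethod = "value") or the number of nearest neighbors (if imethod = "hotdeck") |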
Check the references.
A data.frame without missing data.
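As a rough illustration of what substitution by value amounts to, the base R sketch below replaces the missing entries of a single column. The helper impute_value_sketch and its arguments are hypothetical names introduced here for illustration; this is not rminer code and is not claimed to mirror the package internals.

```r
# Hypothetical helper (not part of rminer): replace the missing entries of one
# column of a data.frame by a single value or by one value per missing entry.
impute_value_sketch <- function(D, attribute, value, missing = NA) {
  x <- D[[attribute]]  # attribute may be a column number or a column name
  # locate the missing entries; NA needs is.na(), other symbols use ==
  miss <- if (is.na(missing)) is.na(x) else (!is.na(x) & x == missing)
  if (is.factor(x)) {
    # make sure the substitution value is a valid level of the factor
    levels(x) <- union(levels(x), as.character(value))
  }
  x[miss] <- value
  D[[attribute]] <- x
  D
}

d <- data.frame(X1 = c(5, 4, 1, 4, 5), X2 = c(4, 3, 1, NA, NA))
d1 <- impute_value_sketch(d, "X2", median(d$X2, na.rm = TRUE)) # single value
d2 <- impute_value_sketch(d, 2, c(1, 2))                       # one value per NA
print(d1)
print(d2)
```

The missing argument of the sketch plays the role of a missing data symbol other than NA (for instance "?"), under the assumption that such a symbol is stored literally in the column.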
See also http://hdl.handle.net/1822/36210 and http://www3.dsi.uminho.pt/pcortez/rminer.html
Paulo Cortez http://www3.dsi.uminho.pt/pcortez/
M. Brown and J. Kros.
Data mining and the impact of missing data.
In Industrial Management & Data Systems, 103(8):611-621, 2003.
This tutorial shows additional code examples:
P. Cortez.
A tutorial on using the rminer R package for data mining tasks.
Teaching Report, Department of Information Systems, ALGORITMI Research Centre, Engineering School, University of Minho, Guimaraes,
Portugal, July 2015.
http://hdl.handle.net/1822/36210
fit and delevels.
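# build a 5x5 example dataset with missing values (NA) in columns 2 and 3
d=matrix(ncol=5,nrow=5)
d[1,]=c(5,4,3,2,1)
d[2,]=c(4,3,4,3,4)
d[3,]=c(1,1,1,1,1)
d[4,]=c(4,NA,3,4,4)
d[5,]=c(5,NA,NA,2,1)
d=data.frame(d); d[,3]=factor(d[,3]) # columns are named X1..X5; X3 becomes a factor
print(d)
# substitution by value:
print(imputation("value",d,3,Value="3")) # factor column 3: NA replaced by level "3"
print(imputation("value",d,2,Value=median(na.omit(d[,2])))) # NA replaced by the column median
print(imputation("value",d,2,Value=c(1,2))) # several substitution values
# hotdeck substitution, with Value = number of nearest neighbors:
print(imputation("hotdeck",d,"X2",Value=1)) # only attribute X2
print(imputation("hotdeck",d,Value=1)) # all attributes with missing data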
## Not run:
# hotdeck 1-nearest neighbor substitution on a real dataset:
require(kknn)
d=read.table(
file="http://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data",
sep=",",na.strings="?",stringsAsFactors=TRUE)
print(summary(d))
d2=imputation("hotdeck",d,Value=1)
print(summary(d2))
par(mfrow=c(2,1))
hist(d$V26)
hist(d2$V26)
par(mfrow=c(1,1)) # reset mfrow
## End(Not run)
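To make the idea of a 1-nearest neighbor hot deck concrete without any extra package, here is a minimal base R sketch. It is a simplified stand-in, not the method used by imputation: the helper hotdeck_1nn_sketch is a hypothetical name, it handles only one numeric column given by name, and it assumes plain Euclidean distance computed on the remaining complete numeric columns.

```r
# Hypothetical helper (not rminer's code): 1-nearest-neighbor hot deck for one
# numeric column, using Euclidean distance on the other complete numeric columns.
hotdeck_1nn_sketch <- function(D, attribute) {
  target <- D[[attribute]]  # attribute is a column name in this sketch
  miss <- is.na(target)
  # predictors: numeric columns other than the target, with no missing values
  others <- D[, setdiff(names(D), attribute), drop = FALSE]
  ok <- vapply(others, function(col) is.numeric(col) && !anyNA(col), logical(1))
  X <- as.matrix(others[, ok, drop = FALSE])
  donors <- which(!miss)  # rows that can donate an observed value
  for (i in which(miss)) {
    # squared Euclidean distance from row i to every donor row
    dist2 <- rowSums(sweep(X[donors, , drop = FALSE], 2, X[i, ])^2)
    # copy the target value of the closest donor
    target[i] <- target[donors[which.min(dist2)]]
  }
  D[[attribute]] <- target
  D
}

d <- data.frame(X1 = c(5, 4, 1, 4, 5), X2 = c(4, 3, 1, NA, NA),
                X4 = c(2, 3, 1, 4, 2), X5 = c(1, 4, 1, 4, 1))
d_hd <- hotdeck_1nn_sketch(d, "X2")
print(d_hd)
```

In this toy dataset, row 4 is closest to row 2 and row 5 is identical to row 1 on the predictor columns, so the two missing X2 entries receive the values observed in those rows.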