imputation: Missing data imputation (e.g. substitution by value or...

View source: R/preprocess.R

imputation    R Documentation

Missing data imputation (e.g. substitution by value or hotdeck method).

Description

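Replaces missing data in a data.frame, either by substituting a given value or by a hotdeck method that copies the value found in the most similar example (located with a k-nearest neighbor search).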

Usage

imputation(imethod = "value", D, Attribute = NULL, Missing = NA, Value = 1)

Arguments

imethod

imputation method type:

  • value – replaces missing data with Value (a single element or several elements);

  • hotdeck – first finds the most similar example in the dataset (using a k-nearest neighbor method, knn) and replaces the missing data with the value found in that example.

D

dataset with missing data (data.frame)

Attribute

if NULL, missing data are replaced in all attributes (data columns) that contain them. Otherwise, Attribute is the attribute number (numeric) or name (character).

Missing

symbol that denotes missing data (default NA)

Value

the substitution value (if imethod="value") or the number of neighbors (k of knn, if imethod="hotdeck").

Details

See the references for details on the imputation methods.
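
The two methods can be illustrated with a minimal base R sketch. This is not rminer's internal code: impute_value() and impute_hotdeck1() are hypothetical helper names, the hotdeck variant is fixed at k = 1, it accepts only a column name, and it uses plain Euclidean distance over the other (numeric) columns instead of the kknn package that the examples load.

```r
# Illustrative sketch only: base R, not rminer's implementation.
# impute_value() and impute_hotdeck1() are hypothetical helper names.

# Value substitution: overwrite the missing entries of one column.
impute_value <- function(D, attribute, value) {
  miss <- is.na(D[[attribute]])
  D[[attribute]][miss] <- value
  D
}

# Hotdeck with k = 1: copy the value from the closest donor row,
# using squared Euclidean distance over the other numeric columns.
impute_hotdeck1 <- function(D, attribute) {
  others <- setdiff(names(D), attribute)
  X <- as.matrix(D[, others, drop = FALSE])
  miss <- which(is.na(D[[attribute]]))
  donors <- which(!is.na(D[[attribute]]))
  for (i in miss) {
    # difference between each donor row and row i, column by column
    diffs <- sweep(X[donors, , drop = FALSE], 2, X[i, ])
    dist2 <- rowSums(diffs^2, na.rm = TRUE)
    D[[attribute]][i] <- D[[attribute]][donors[which.min(dist2)]]
  }
  D
}

toy <- data.frame(a = c(5, 4, 1, 4), b = c(4, 3, 1, NA), c = c(2, 3, 1, 1))
by_value <- impute_value(toy, "b", median(toy$b, na.rm = TRUE))
by_hotdeck <- impute_hotdeck1(toy, "b")
print(by_value)
print(by_hotdeck)
```

In this toy data the two methods disagree: value substitution fills the gap with the column median, while the hotdeck sketch copies the entry from the row nearest in the other columns.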

Value

A data.frame without missing data.
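
One way to confirm that a result contains no missing data is to count the missing entries per column before and after imputation. The helper below is a hypothetical base R sketch, not part of rminer. Its missing argument only mirrors the idea of a configurable missing data symbol and is an assumption of this example.

```r
# Hypothetical helper (not part of rminer): count missing entries per column.
count_missing <- function(D, missing = NA) {
  vapply(D, function(col) {
    if (is.na(missing)) sum(is.na(col)) else sum(col == missing, na.rm = TRUE)
  }, integer(1))
}

before <- data.frame(x = c(1, NA, 3), y = c("a", "?", "?"))
print(count_missing(before))                 # counts of NA per column
print(count_missing(before, missing = "?"))  # counts of "?" per column
```

A data.frame with no missing data yields a vector of zeros.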

Note

See also http://hdl.handle.net/1822/36210 and http://www3.dsi.uminho.pt/pcortez/rminer.html

Author(s)

Paulo Cortez http://www3.dsi.uminho.pt/pcortez/

References

  • M. Brown and J. Kros.
    Data mining and the impact of missing data.
    In Industrial Management & Data Systems, 103(8):611-621, 2003.

  • This tutorial shows additional code examples:
    P. Cortez.
    A tutorial on using the rminer R package for data mining tasks.
    Teaching Report, Department of Information Systems, ALGORITMI Research Centre, Engineering School, University of Minho, Guimaraes, Portugal, July 2015.
    http://hdl.handle.net/1822/36210

See Also

fit and delevels.

Examples

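library(rminer)
# 5x5 toy dataset with missing values (NA) in columns 2 and 3:
d=matrix(ncol=5,nrow=5)
d[1,]=c(5,4,3,2,1)
d[2,]=c(4,3,4,3,4)
d[3,]=c(1,1,1,1,1)
d[4,]=c(4,NA,3,4,4)
d[5,]=c(5,NA,NA,2,1)
d=data.frame(d); d[,3]=factor(d[,3]) # column 3 becomes a factor
print(d)
# value substitution on attribute 3, using the factor level "3":
print(imputation("value",d,3,Value="3"))
# value substitution on attribute 2, using the median of its known values:
print(imputation("value",d,2,Value=median(na.omit(d[,2]))))
# value substitution on attribute 2 with several elements
# (here, two values for its two missing entries):
print(imputation("value",d,2,Value=c(1,2)))
# hotdeck 1-nearest neighbor substitution on attribute "X2":
print(imputation("hotdeck",d,"X2",Value=1))
# hotdeck 1-nearest neighbor substitution on all attributes with missing data:
print(imputation("hotdeck",d,Value=1))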

## Not run: 
# hotdeck 1-nearest neighbor substitution on a real dataset:
require(kknn)
d=read.table(
   file="http://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data",
   sep=",",na.strings="?",stringsAsFactors=TRUE)
print(summary(d))
d2=imputation("hotdeck",d,Value=1)
print(summary(d2))
par(mfrow=c(2,1))
hist(d$V26)
hist(d2$V26)
par(mfrow=c(1,1)) # reset mfrow

## End(Not run)


rminer documentation built on Oct. 29, 2024, 9:06 a.m.
