Imputation in supervised classification


Description

This function imputes missing values in datasets for supervised classification using the mean, median, or knn (k-nearest-neighbor) method. For nominal attributes, the mode is used.
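To make the mean, median, and mode rules concrete, here is a minimal base-R sketch. It is not the package's implementation: `simple_impute` and its `nominal` argument are hypothetical names introduced for illustration, the fill ignores class labels, and each column is assumed to have at least one observed value.

```r
# Hypothetical helper, not part of the package: fills NAs column by column.
simple_impute <- function(x, method = c("mean", "median"), nominal = integer(0)) {
  method <- match.arg(method)
  for (j in seq_len(ncol(x))) {
    miss <- is.na(x[, j])
    if (!any(miss)) next
    observed <- x[!miss, j]
    if (j %in% nominal) {
      # Mode: the most frequent observed value (ties go to the smallest value).
      counts <- table(observed)
      fill <- as.numeric(names(counts)[which.max(counts)])
    } else if (method == "mean") {
      fill <- mean(observed)
    } else {
      fill <- median(observed)
    }
    x[miss, j] <- fill
  }
  x
}

# Toy data: column 1 is continuous, column 2 is a numerically coded nominal attribute.
toy <- cbind(a = c(1, 2, NA, 5), b = c(1, 1, 2, NA))
filled <- simple_impute(toy, "median", nominal = 2)
```

In this toy case the gap in `a` receives the median of the observed values and the gap in `b` receives the most frequent code.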

Usage

ce.impute(data, method = c("mean", "median", "knn"), atr,
          nomatr = rep(0, 0), k1 = 10)

Arguments

data

the name of the dataset

method

the name of the method to be used

atr

a vector identifying the attributes where imputations will be performed

nomatr

a vector identifying the nominal attributes

k1

the number of neighbors to be used for the knn imputation
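The role of the neighbor count can be illustrated with a small base-R sketch of k-nearest-neighbor imputation. This is an assumption-laden illustration, not the code behind `ce.impute`: `knn_impute_sketch` is a hypothetical name, only numeric attributes are handled, distances are unscaled Euclidean distances, and at least one complete row is assumed to exist.

```r
# Hypothetical helper, not the package's code: k-nearest-neighbor imputation
# on a numeric matrix, using the complete rows as donors.
knn_impute_sketch <- function(x, k = 10) {
  complete <- x[stats::complete.cases(x), , drop = FALSE]
  k <- min(k, nrow(complete))  # cannot use more neighbors than there are donors
  for (i in which(!stats::complete.cases(x))) {
    miss <- is.na(x[i, ])
    obs <- !miss
    # Euclidean distance to every donor, using only the columns observed in row i.
    diffs <- sweep(complete[, obs, drop = FALSE], 2, x[i, obs])
    d <- sqrt(rowSums(diffs^2))
    nearest <- order(d)[seq_len(k)]
    # Fill each gap with the mean of that column among the k nearest donors.
    x[i, miss] <- colMeans(complete[nearest, miss, drop = FALSE])
  }
  x
}

# Toy data: the last row is missing its second value.
toy <- rbind(c(1, 10), c(2, 20), c(9, 90), c(1.5, NA))
filled <- knn_impute_sketch(toy, k = 2)
```

A larger `k` averages over more donors, which smooths the imputed values but can pull in rows that are less similar to the incomplete one.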

Value

Returns a matrix without missing values.
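A quick way to see which attributes need imputation, and to confirm afterwards that nothing is missing, is to count `NA` values per column. The sketch below uses a toy matrix and a plain median fill as a stand-in for the real call; the object names are illustrative only.

```r
# Toy matrix standing in for a real dataset; the last column plays the class label.
x <- cbind(a = c(1, NA, 3), b = c(NA, NA, 6), class = c(1, 2, 1))

# Missing values per column; columns with a nonzero count are candidates for `atr`.
na_counts <- colSums(is.na(x))
atr_candidates <- which(na_counts > 0)

# Stand-in for the imputation step: fill each candidate column with its median.
for (j in atr_candidates) {
  x[is.na(x[, j]), j] <- median(x[, j], na.rm = TRUE)
}

# After imputation this should be FALSE.
still_missing <- anyNA(x)
```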

Note

A description of all the imputations carried out can be stored in a report that is then saved to the current workspace. To produce the report, uncomment the lines at the end of the function's code. The name of the report object starts with Imput.rep.

Author(s)

Caroline Rodriguez

References

Acuna, E. and Rodriguez, C. (2004). The treatment of missing values and its effect in the classifier accuracy. In D. Banks, L. House, F.R. McMorris, P. Arabie, W. Gaul (Eds.), Classification, Clustering and Data Mining Applications. Springer-Verlag, Berlin-Heidelberg, 639-648.

See Also

clean

Examples

data(hepatitis)
#-------- Median imputation -----------
# ce.impute(hepatitis, "median", 1:19)
#-------- knn imputation --------------
hepa.imputed <- ce.impute(hepatitis, "knn", k1 = 10)
