Expectation Maximization (EM) for imputation of missing values.

Description

Missing values are iteratively updated via an EM algorithm.

Usage

imputeEM(data, impute.ncomps = 2, pca.ncomps = 2, CV = TRUE, Init = "mean",
         scale = TRUE, iters = 25, tol = .Machine$double.eps^0.25)

Arguments

data

a dataset with missing values.

impute.ncomps

integer corresponding to the minimum number of components to test.

pca.ncomps

minimum number of components to use in the imputation.

CV

Use cross-validation in determining the optimal number of components to retain for the final imputation.

Init

For continuous variables, initialize missing values with either the mean or the median.

scale

Scale variables to unit variance.

iters

Maximum number of EM iterations.

tol

the threshold for assessing convergence.
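
The default for tol in the Usage signature is the fourth root of the double-precision machine epsilon. A quick way to see its magnitude is to evaluate it directly; the variable name tol_default below is only illustrative and is not part of the package.

```r
# Evaluate the default convergence threshold shown in the Usage signature.
# On IEEE 754 platforms .Machine$double.eps is 2^-52, so this is 2^-13.
tol_default <- .Machine$double.eps^0.25
print(tol_default)
```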

Details

A completed data frame is returned that mirrors a model.matrix. NAs are replaced with the values to which the EM algorithm converges. If data contains no NAs, it is returned unaltered.
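
To illustrate the general idea of iterative, PCA-based EM imputation for continuous data, the following base R code is a minimal sketch only. It is not the imputeEM implementation: the function em_pca_impute, its arguments, and its convergence rule are assumptions made for illustration, and it omits scaling, cross-validation, and factor handling.

```r
# Hypothetical helper for illustration; NOT the mvdalab implementation.
em_pca_impute <- function(X, ncomp = 2, iters = 25,
                          tol = .Machine$double.eps^0.25) {
  X <- as.matrix(X)
  miss <- is.na(X)
  n_iter <- 0
  # Nothing to impute: return the input unaltered.
  if (!any(miss)) return(list(completed = X, iterations = n_iter))

  # Initialise every missing cell with its column mean.
  col_means <- colMeans(X, na.rm = TRUE)
  X[miss] <- col_means[col(X)[miss]]

  k <- seq_len(ncomp)
  for (i in seq_len(iters)) {
    n_iter <- i
    old <- X[miss]

    # Fit a rank-ncomp PCA model to the currently completed data.
    mu <- colMeans(X)
    s <- svd(sweep(X, 2, mu))
    fitted <- s$u[, k, drop = FALSE] %*%
      diag(s$d[k], ncomp) %*%
      t(s$v[, k, drop = FALSE])
    fitted <- sweep(fitted, 2, mu, "+")

    # Replace only the missing cells with the model reconstruction.
    X[miss] <- fitted[miss]

    # Assumed stopping rule: relative squared change of the imputed cells.
    change <- sum((X[miss] - old)^2) / max(sum(old^2), 1e-12)
    if (change < tol) break
  }
  list(completed = X, iterations = n_iter)
}

# Usage on the numeric columns of iris with 10% of cells removed at random.
set.seed(1)
X <- as.matrix(iris[, 1:4])
X_na <- X
X_na[sample(length(X_na), 60)] <- NA
fit <- em_pca_impute(X_na, ncomp = 2)
```

Observed cells are never modified in this sketch; only the cells that were NA in the input are updated at each iteration.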

Value

imputeEM returns a list containing the following components:

Imputed.DataFrames

A list of imputed data frames across impute.ncomps

Imputed.Continous

A list of imputed values, at each EM iteration, across impute.ncomps

CV.Results

Cross-validation results across impute.ncomps

ncomps

impute.ncomps

Author(s)

Nelson Lee Afanador (nelson.afanador@mvdalab.com), Thanh Tran (thanh.tran@mvdalab.com)

References

B. Walczak, D.L. Massart. Dealing with missing data, Part I. Chemom. Intell. Lab. Syst. 58 (2001), 15-27.

Examples

library(mvdalab)
# Introduce 25% missing values into iris, then impute them via EM
dat <- introNAs(iris, percent = 25)
imputeEM(dat)