ximputia: Missing Data Imputation using PCA and the Iterative Algorithm...

View source: R/ximputia.R

ximputiaR Documentation

Missing Data Imputation using PCA and the Iterative Algorithm (IA)

Description

Imputation of missing data contained in a matrix X using PCA and the so-called "iterative algorithm" (IA).

Missing data are estimated from iterative calculations of PCA scores and loadings. Intial estimates are given to the missing data, first scores and loadings matrices (T and P) are calculated, giving new estimates (from T P'). The process is repeated until convergence or a maximal number of iterations (possibly equal to 1).

Depending on argument start, the initial estimates for the missing data x_ij are calculated either by NIPALS (version allowing missing data) or the means of columns j.

IA is described by Walczak & Massart 2001 (section 2 p.16) and Folch-Fortuny et al. 2016 (section 2.3.3).

It can also be considered as an expectation-maximmization (EM) algorithm. The "EM-Wold" algorithm for PCA cross-vlaidation presented by Bro et al. 2008 (p. 1245) uses such an approcah, with a single iteration (and another algorithm than NIPALS).

Usage


ximputia(X, ncomp, algo = NULL,
  start = c("nipals", "means"),
  tol = .Machine$double.eps^0.5, 
  maxit = 10000,
  gs = TRUE,
  print = TRUE, ...)

Arguments

X

A n x p matrix or data frame with missing data to be imputed.

ncomp

The number of components (latent variables) of the PCA model used for imputation.

algo

Algorithm (e.g. pca_eigen) used for fitting the PCA model. Default to NULL (see pca).

start

Method used for the initialestimate. Possible values are "nipals" (default) or "means".

tol

Tolerance for testing convergence of the IA algorithm.

maxit

Maximum number of iterations for the IA algorithm.

gs

See pca_nipalsna.

print

Logical. If codeTRUE, fitting information are printed.

...

Optionnal arguments to pass through function algo.

Value

A list of outputs (see examples).

References

Bro, R., Kjeldahl, K., Smilde, A.K., Kiers, H.A.L., 2008. Cross-validation of component models: A critical look at current methods. Anal Bioanal Chem 390, 1241-1251. https://doi.org/10.1007/s00216-007-1790-1

de La Fuente, R.L.-N. de la, García‐Muñoz, S., Biegler, L.T., 2010. An efficient nonlinear programming strategy for PCA models with incomplete data sets. Journal of Chemometrics 24, 301-311. https://doi.org/10.1002/cem.1306

Folch-Fortuny, A., Arteaga, F., Ferrer, A., 2016. Missing Data Imputation Toolbox for MATLAB. Chemometrics and Intelligent Laboratory Systems 154, 93-100. https://doi.org/10.1016/j.chemolab.2016.03.019

Walczak, B., Massart, D.L., 2001. Dealing with missing data: Part I. Chemometrics and Intelligent Laboratory Systems 58, 15-27. https://doi.org/10.1016/S0169-7439(01)00131-9

Examples


data(datoctane)
X <- datoctane$X
## removing outliers
zX <- X[-c(25:26, 36:39), ]
n <- nrow(zX)
p <- ncol(zX)
N <- n * p
plotsp(zX)

############################ NAs simulated in a row of X

zX <- X
## Row i
i <- 18
## 20pct of NAs in row i
s <- sample(1:p, size = round(p / 5))
zX[i, s] <- NA

## Nipals alone

fm <- ximputia(zX, ncomp = 5, maxit = 1)   
names(fm)
Xfit <- fm$X
plot(1:p, X[i, ], type = "l")
points(s, Xfit[i, s], col = "red")

## With iterations

fm <- ximputia(zX, ncomp = 5)              
fm$niter
fm$conv
Xfit <- fm$X

oldpar <- par(mfrow = c(1, 1))
par(mfrow = c(1, 2))
plot(1:p, X[i, ], type = "l")
points(s, Xfit[i, s], col = "red")
#sum(X[i, ] - Xfit[i, ] != 0)
plot(fm$tol, type = "b")
par(oldpar)

plot(Xfit[i, s], X[i, s])
abline(0, 1)

############################ NAs simulated in X

zX <- X
## 20pct of NAs in matrix X
K <- 5
s <- sort(sample(1:N, size = round(N / K)))
zX[s] <- NA
ncomp <- 5

fm <- ximputia(zX, ncomp, maxit = 1)               ## Nipals alone
#fm <- ximputia(zX, ncomp)                         ## With iterations
#fm <- ximputia(zX, ncomp, start = "means")        ## Initial = means
fm$niter
fm$conv
if(!is.na(fm$tol[1])) plot(fm$tol)
## SSR
sum((X[s] - fm$fit)^2)


mlesnoff/rnirs documentation built on April 24, 2023, 4:17 a.m.