ximputia: Missing Data Imputation using PCA and the Iterative Algorithm...
In mlesnoff/rnirs: Dimension reduction, Regression and Discrimination for Chemometrics

ximputia

R Documentation

Missing Data Imputation using PCA and the Iterative Algorithm (IA)

Description

Imputation of missing data contained in a matrix X using PCA and the so-called "iterative algorithm" (IA).

Missing data are estimated by iteratively recomputing PCA scores and loadings. Initial estimates are assigned to the missing data; the scores and loadings matrices (T and P) are then calculated, giving new estimates of the missing data (from T P'). The process is repeated until convergence or until a maximal number of iterations (possibly equal to 1) is reached.
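
As a compact way to read the update described above, one iteration can be written as follows. The notation is introduced here for illustration only and is an assumption, not taken from the package: M denotes the set of missing cells and the superscript (k) the iteration counter. Centering is left out for brevity; if the PCA is centered, the column means would be added back to T P'.

```latex
% M: index set of the missing cells; k: iteration counter (illustrative notation).
% T^{(k)}, P^{(k)}: scores and loadings of the ncomp-component PCA fitted on X^{(k)}.
\hat{X}^{(k)} = T^{(k)} {P^{(k)}}^{\top},
\qquad
x_{ij}^{(k+1)} =
\begin{cases}
  x_{ij}, & (i,j) \notin M \\
  \hat{x}_{ij}^{(k)}, & (i,j) \in M
\end{cases}
```

Observed cells are never modified; only the cells in M are overwritten at each pass.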

Depending on argument start, the initial estimate for each missing value x_ij is calculated either by NIPALS (the version allowing missing data) or as the mean of column j.
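
The procedure can be sketched in base R as below. This is a minimal illustration under stated assumptions, not the rnirs implementation: the function name `impute_ia_sketch` is hypothetical, the start is by column means only, the PCA is a centered PCA computed with `svd()`, and convergence is tested on the change in the imputed cells (the criterion actually used by `ximputia` may differ).

```r
# Hypothetical helper (not part of rnirs): iterative PCA imputation in base R.
impute_ia_sketch <- function(X, ncomp, tol = sqrt(.Machine$double.eps),
                             maxit = 1000) {
  X <- as.matrix(X)
  miss <- is.na(X)
  # Initial estimates: mean of the observed values of each column
  cmeans <- colMeans(X, na.rm = TRUE)
  X[miss] <- cmeans[col(X)[miss]]
  conv <- FALSE
  niter <- 0
  while (niter < maxit) {
    niter <- niter + 1
    # Centered PCA of the current completed matrix, computed by SVD
    mu <- colMeans(X)
    Xc <- sweep(X, 2, mu)
    s <- svd(Xc, nu = ncomp, nv = ncomp)
    # Rank-ncomp reconstruction T P', with the column means added back
    fit <- s$u %*% (s$d[seq_len(ncomp)] * t(s$v))
    fit <- sweep(fit, 2, mu, "+")
    # Assumed convergence measure: change in the imputed cells
    delta <- sqrt(sum((fit[miss] - X[miss])^2))
    X[miss] <- fit[miss]
    if (delta < tol) {
      conv <- TRUE
      break
    }
  }
  list(X = X, niter = niter, conv = conv)
}

# Usage on simulated low-rank data with about 10 pct of NAs
set.seed(1)
n <- 30
p <- 8
X0 <- matrix(rnorm(n * 2), n, 2) %*% t(matrix(rnorm(p * 2), p, 2))
zX0 <- X0
s0 <- sample(seq_len(n * p), size = round(n * p / 10))
zX0[s0] <- NA
res <- impute_ia_sketch(zX0, ncomp = 2)
res$niter
res$conv
```

Setting `maxit = 1` in this sketch returns the result of a single PCA pass after the initial fill, which mirrors the role of `maxit = 1` in the examples further down.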

IA is described by Walczak & Massart 2001 (section 2 p.16) and Folch-Fortuny et al. 2016 (section 2.3.3).

It can also be considered as an expectation-maximization (EM) algorithm. The "EM-Wold" algorithm for PCA cross-validation presented by Bro et al. 2008 (p. 1245) uses such an approach, with a single iteration (and an algorithm other than NIPALS).

Usage


ximputia(X, ncomp, algo = NULL,
  start = c("nipals", "means"),
  tol = .Machine$double.eps^0.5, 
  maxit = 10000,
  gs = TRUE,
  print = TRUE, ...)

Arguments

`X`	An `n x p` matrix or data frame with missing data to be imputed.
`ncomp`	The number of components (latent variables) of the PCA model used for imputation.
`algo`	Algorithm (e.g. `pca_eigen`) used for fitting the PCA model. Defaults to `NULL` (see `pca`).
`start`	Method used for the initial estimate. Possible values are `"nipals"` (default) or `"means"`.
`tol`	Tolerance for testing convergence of the IA algorithm.
`maxit`	Maximum number of iterations for the IA algorithm.
`gs`	See `pca_nipalsna`.
`print`	Logical. If `TRUE`, fitting information is printed.
`...`	Optional arguments to pass to function `algo`.

Value

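A list of outputs (see examples), including the imputed matrix (`X`) and the elements `fit`, `niter`, `conv` and `tol` used in the examples.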

References

Bro, R., Kjeldahl, K., Smilde, A.K., Kiers, H.A.L., 2008. Cross-validation of component models: A critical look at current methods. Anal Bioanal Chem 390, 1241-1251. https://doi.org/10.1007/s00216-007-1790-1

López-Negrete de la Fuente, R., García-Muñoz, S., Biegler, L.T., 2010. An efficient nonlinear programming strategy for PCA models with incomplete data sets. Journal of Chemometrics 24, 301-311. https://doi.org/10.1002/cem.1306

Folch-Fortuny, A., Arteaga, F., Ferrer, A., 2016. Missing Data Imputation Toolbox for MATLAB. Chemometrics and Intelligent Laboratory Systems 154, 93-100. https://doi.org/10.1016/j.chemolab.2016.03.019

Walczak, B., Massart, D.L., 2001. Dealing with missing data: Part I. Chemometrics and Intelligent Laboratory Systems 58, 15-27. https://doi.org/10.1016/S0169-7439(01)00131-9

Examples


data(datoctane)
X <- datoctane$X
## removing outliers
zX <- X[-c(25:26, 36:39), ]
n <- nrow(zX)
p <- ncol(zX)
N <- n * p
plotsp(zX)

############################ NAs simulated in a row of X

zX <- X
## Row i
i <- 18
## 20pct of NAs in row i
s <- sample(1:p, size = round(p / 5))
zX[i, s] <- NA

## Nipals alone

fm <- ximputia(zX, ncomp = 5, maxit = 1)   
names(fm)
Xfit <- fm$X
plot(1:p, X[i, ], type = "l")
points(s, Xfit[i, s], col = "red")

## With iterations

fm <- ximputia(zX, ncomp = 5)              
fm$niter
fm$conv
Xfit <- fm$X

oldpar <- par(mfrow = c(1, 1))
par(mfrow = c(1, 2))
plot(1:p, X[i, ], type = "l")
points(s, Xfit[i, s], col = "red")
#sum(X[i, ] - Xfit[i, ] != 0)
plot(fm$tol, type = "b")
par(oldpar)

plot(Xfit[i, s], X[i, s])
abline(0, 1)

############################ NAs simulated in X

zX <- X
## 20pct of NAs in matrix X
K <- 5
s <- sort(sample(1:N, size = round(N / K)))
zX[s] <- NA
ncomp <- 5

fm <- ximputia(zX, ncomp, maxit = 1)               ## Nipals alone
#fm <- ximputia(zX, ncomp)                         ## With iterations
#fm <- ximputia(zX, ncomp, start = "means")        ## Initial = means
fm$niter
fm$conv
if(!is.na(fm$tol[1])) plot(fm$tol)
## SSR
sum((X[s] - fm$fit)^2)

mlesnoff/rnirs documentation built on April 24, 2023, 4:17 a.m.
