pca_imp: Impute Numeric Matrix with PCA Imputation

Description

Impute missing values in a numeric matrix using (regularized) iterative PCA.
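The core idea can be sketched in base R. This is a minimal illustration, not the slideimp code. The helper name iterative_pca_sketch is hypothetical, all rows are weighted equally, and the noise variance used for regularization is a simplified estimate (the mean of the discarded squared singular values) rather than missMDA's. The loop starts from column-mean imputation. Each iteration fits a rank-ncp PCA to the completed matrix and overwrites only the missing cells with the fitted values.

```r
# Hypothetical sketch of (regularized) iterative PCA imputation.
# Assumes equal row weights and a simplified noise-variance estimate.
iterative_pca_sketch <- function(X, ncp = 2, scale = TRUE, regularized = TRUE,
                                 coeff.ridge = 1, threshold = 1e-6,
                                 maxiter = 1000, miniter = 5) {
  X <- as.matrix(X)
  miss <- is.na(X)
  ncp <- min(ncp, ncol(X), nrow(X) - 1)
  # Start from column-mean imputation
  Xhat <- X
  Xhat[miss] <- colMeans(X, na.rm = TRUE)[col(X)[miss]]
  old <- Inf
  for (iter in seq_len(maxiter)) {
    # Center and optionally scale the currently completed matrix
    mu <- colMeans(Xhat)
    Z <- sweep(Xhat, 2, mu)
    sds <- if (scale) sqrt(colMeans(Z^2)) else rep(1, ncol(Z))
    Z <- sweep(Z, 2, sds, "/")
    s <- svd(Z)
    d <- s$d
    k <- seq_len(ncp)
    # Regularized variant: shrink the leading singular values by a noise estimate
    sigma2 <- 0
    if (regularized && ncp < length(d)) {
      sigma2 <- min(coeff.ridge * mean(d[-k]^2), d[ncp + 1]^2)
    }
    d_shrunk <- (d[k]^2 - sigma2) / d[k]
    # Rank-ncp reconstruction U_k diag(d_shrunk) V_k^T
    fit <- s$u[, k, drop = FALSE] %*% (d_shrunk * t(s$v[, k, drop = FALSE]))
    # Undo the scaling and centering
    fit <- sweep(sweep(fit, 2, sds, "*"), 2, mu, "+")
    # Update only the missing cells; observed cells stay fixed
    Xhat[miss] <- fit[miss]
    # Stop once the fit error on the observed cells stops changing
    objective <- sum((X[!miss] - fit[!miss])^2)
    crit <- abs(1 - objective / old)
    old <- objective
    if (iter >= miniter && is.finite(crit) && crit < threshold) break
  }
  Xhat
}
```

With regularized = FALSE, sigma2 stays 0 and the loop reduces to the plain EM variant.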

Usage

pca_imp(
  obj,
  ncp = 2,
  scale = TRUE,
  method = c("regularized", "EM"),
  coeff.ridge = 1,
  row.w = NULL,
  threshold = 1e-06,
  seed = NULL,
  nb.init = 1,
  maxiter = 1000,
  miniter = 5,
  colmax = 0.9,
  post_imp = TRUE,
  na_check = TRUE
)

Arguments

obj

A numeric matrix with samples in rows and features in columns.

ncp

Integer. Number of components used to predict the missing entries.

scale

Logical. If TRUE (default), variables are scaled to have unit variance.

method

Character. Either "regularized" (default), for regularized iterative PCA, or "EM", for unregularized iterative PCA (the EM algorithm).

coeff.ridge

Numeric. Ridge regularization coefficient (default is 1). Only used if method = "regularized". Values < 1 regularize less (closer to EM); values > 1 regularize more (closer to mean imputation).
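To illustrate how coeff.ridge moves the fit between EM and heavier shrinkage, the sketch below applies the singular-value shrinkage of regularized iterative PCA, d_s -> (d_s^2 - sigma2) / d_s, to a toy vector of singular values. The noise estimate sigma2 (coeff.ridge times the mean of the discarded squared singular values, capped at d[ncp + 1]^2) is a simplification assumed here; slideimp's exact formula is not given on this page, and the helper name is hypothetical.

```r
# Hypothetical helper: shrink the first ncp singular values as
# d_s -> (d_s^2 - sigma2) / d_s. The noise estimate sigma2 is a simplified
# assumption: coeff.ridge * mean(discarded d^2), capped at d[ncp + 1]^2.
shrink_singular_values <- function(d, ncp, coeff.ridge = 1) {
  k <- seq_len(ncp)
  sigma2 <- min(coeff.ridge * mean(d[-k]^2), d[ncp + 1]^2)
  (d[k]^2 - sigma2) / d[k]
}

d <- c(10, 5, 2, 1, 1)
shrink_singular_values(d, ncp = 2, coeff.ridge = 0)   # 10 5: no shrinkage, as in EM
shrink_singular_values(d, ncp = 2, coeff.ridge = 1)   # 9.8 4.6
shrink_singular_values(d, ncp = 2, coeff.ridge = 10)  # 9.6 4.2: sigma2 capped at d[3]^2 = 4
```

A larger coeff.ridge pulls the leading components toward zero, so in the centered data the fitted values move toward the column means, in line with the description of values > 1.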

row.w

Row weights (internally normalized to sum to 1). Can be one of:

  • NULL (default): All rows weighted equally.

  • A numeric vector: Custom positive weights of length nrow(obj).

  • "n_miss": Rows with more missing values receive lower weight.
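The sketch below shows one way the three row.w options could be turned into normalized weights, and how such weights enter a weighted column mean computed over observed cells only (the form missMDA uses for centering). The "n_miss" rule here, a weight proportional to each row's fraction of observed values, is an assumption; this page only says that rows with more missing values receive lower weight. Both helpers are hypothetical.

```r
# Hypothetical helper: build normalized row weights from the row.w options.
# The "n_miss" formula (fraction of observed cells per row) is an assumption.
make_row_weights <- function(obj, row.w = NULL) {
  n <- nrow(obj)
  w <- if (is.null(row.w)) {
    rep(1, n)
  } else if (identical(row.w, "n_miss")) {
    rowMeans(!is.na(obj))
  } else {
    stopifnot(is.numeric(row.w), length(row.w) == n, all(row.w > 0))
    row.w
  }
  w / sum(w)  # normalize so the weights sum to 1
}

# Hypothetical helper: weighted column means over observed cells only
weighted_col_means <- function(obj, w) {
  obs <- !is.na(obj)
  colSums(ifelse(obs, obj, 0) * w) / colSums(obs * w)
}

m <- matrix(c(1, 2, NA, 4, 5, 6), nrow = 3)
w <- make_row_weights(m, "n_miss")  # 0.4 0.4 0.2
weighted_col_means(m, w)            # 1.5 4.8
```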

threshold

Numeric. The threshold for assessing convergence.

seed

Integer. Seed for the random number generator, used for the random initializations (nb.init > 1).

nb.init

Integer. Number of random initializations. The first initialization is always mean imputation.
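A sketch of how starting values might be generated for each initialization. Initialization 1 is column-mean imputation, as documented above. For the random starts, drawing from N(column mean, column sd^2) is an assumption that roughly mirrors drawing standard normal values in the standardized space. In missMDA::imputePCA, each start is run to convergence and the run whose reconstruction best fits the observed cells is kept. This page does not say whether pca_imp selects runs the same way. The helper name is hypothetical.

```r
# Hypothetical helper: starting values for initialization number `init`.
# init == 1: column-mean imputation; init > 1: random draws from
# N(column mean, column sd^2) (an assumption).
init_missing <- function(obj, init = 1) {
  miss <- is.na(obj)
  mu <- colMeans(obj, na.rm = TRUE)
  cols <- col(obj)[miss]
  start <- obj
  if (init == 1) {
    start[miss] <- mu[cols]
  } else {
    sds <- apply(obj, 2, sd, na.rm = TRUE)
    start[miss] <- rnorm(sum(miss), mean = mu[cols], sd = sds[cols])
  }
  start
}
```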

maxiter

Integer. Maximum number of iterations for the algorithm.

miniter

Integer. Minimum number of iterations for the algorithm.
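The threshold, miniter and maxiter arguments interact in the stopping rule. The sketch below assumes the rule of the original missMDA code: stop once the relative change in the fit error on observed cells drops below threshold, provided at least miniter iterations have run, and stop unconditionally at maxiter. The helper name is hypothetical, and slideimp may differ in detail.

```r
# Hypothetical helper: decide whether to stop after iteration `iter`,
# given the current and previous objective (fit error on observed cells).
should_stop <- function(objective, old, iter, threshold = 1e-6,
                        miniter = 5, maxiter = 1000) {
  crit <- abs(1 - objective / old)  # relative change in the objective
  converged <- is.finite(crit) && crit < threshold && iter >= miniter
  converged || iter >= maxiter
}

should_stop(1, 1, iter = 3)     # FALSE: converged but fewer than miniter iterations
should_stop(1, 1, iter = 5)     # TRUE
should_stop(1, 2, iter = 1000)  # TRUE: maxiter reached
```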

colmax

Numeric between 0 and 1. Columns whose proportion of missing values exceeds this threshold are not imputed.
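Identifying the columns that colmax would exclude is a one-liner. This sketch assumes a strict inequality ("above" the threshold), and the helper name is hypothetical.

```r
# Hypothetical helper: indices of columns whose missing rate exceeds colmax
cols_to_skip <- function(obj, colmax = 0.9) {
  which(colMeans(is.na(obj)) > colmax)
}

# Missing rates per column: 0, 0.75, 1
m <- cbind(c(1, 2, 3, 4), c(NA, NA, NA, 4), c(NA, NA, NA, NA))
cols_to_skip(m, colmax = 0.9)  # 3
cols_to_skip(m, colmax = 0.5)  # 2 3
```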

post_imp

Logical. Whether to fill any values that are still missing after PCA imputation with column means.
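A minimal sketch of the post_imp fallback, assuming the means are taken over the non-missing values of the returned matrix; the page does not say which means are used. A column with no observed values would yield NaN here. The helper name is hypothetical.

```r
# Hypothetical helper: fill leftover NA cells with column means
fill_with_col_means <- function(imputed) {
  miss <- is.na(imputed)
  if (any(miss)) {
    mu <- colMeans(imputed, na.rm = TRUE)
    imputed[miss] <- mu[col(imputed)[miss]]
  }
  imputed
}

m <- matrix(c(1, NA, 3, 4, 5, NA), nrow = 3)
fill_with_col_means(m)  # column 1 gets 2, column 2 gets 4.5
```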

na_check

Logical. Whether to check the result for leftover NA values (for internal use).

Details

This algorithm is based on the original missMDA::imputePCA function and is optimized for tall or wide numeric matrices.

Value

A numeric matrix of the same dimensions as obj with missing values imputed.

Author(s)

Francois Husson and Julie Josse (original missMDA implementation).

References

Josse, J. & Husson, F. (2013). Handling missing values in exploratory multivariate data analysis methods. Journal de la SFdS, 153(2), pp. 79-99.

Josse, J. & Husson, F. (2016). missMDA: A Package for Handling Missing Values in Multivariate Data Analysis. Journal of Statistical Software, 70(1), pp. 1-31. doi:10.18637/jss.v070.i01

Examples

library(slideimp)
obj <- sim_mat(10, 10)$input
sum(is.na(obj))
obj[1:4, 1:4]
# Run 5 initializations: the first uses mean imputation, the other 4 are random.
pca_imp(obj, ncp = 2, nb.init = 5)


slideimp documentation built on April 17, 2026, 1:07 a.m.