| pca_imp | R Documentation |
Impute missing values in a numeric matrix using regularized or expectation-maximization (EM) PCA imputation. Supports warm-start LOBPCG with both the previous eigenblock and search direction.
pca_imp(
obj,
ncp = 2,
scale = TRUE,
method = c("regularized", "EM"),
coeff.ridge = 1,
row.w = NULL,
threshold = 1e-06,
seed = NULL,
nb.init = 1,
maxiter = 1000,
miniter = 5,
solver = c("auto", "exact", "lobpcg"),
lobpcg_control = NULL,
colmax = 0.9,
post_imp = TRUE,
na_check = TRUE,
clamp = NULL,
.progress = FALSE
)
obj |
A numeric matrix with samples in rows and features in columns. |
ncp |
Integer. Number of principal components used to predict missing entries. |
scale |
Logical. If |
method |
Character. PCA imputation method: either |
coeff.ridge |
Numeric. Ridge regularization, used only when
|
row.w |
Row weights, normalized to sum to |
threshold |
Numeric. Convergence threshold. |
seed |
Integer, numeric, or |
nb.init |
Integer. Number of random initializations. The first initialization is always mean imputation. |
maxiter |
Integer. Maximum number of iterations. |
miniter |
Integer. Minimum number of iterations. |
solver |
Character. Eigensolver: |
lobpcg_control |
A list of LOBPCG eigensolver control options, usually
created by |
colmax |
Numeric scalar between |
post_imp |
Logical. If |
na_check |
Logical. If |
clamp |
Optional numeric vector |
.progress |
Logical. If |
This function reimplements the PCA imputation method from the missMDA
package by Francois Husson and Julie Josse, based on Josse and Husson (2016).
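For readers new to the method, the core iteration can be sketched in base R. This is a simplified illustration, not the code used by pca_imp() or missMDA: it omits regularization, row weights, scaling, and multiple initializations, it uses an exact SVD, and the function name em_pca_sketch and its stopping rule are hypothetical.

```r
# Minimal sketch of iterative (EM-style) PCA imputation in base R.
# Assumptions: no ridge shrinkage, no scaling, exact SVD, and a simple
# squared-change stopping rule. Not the package's implementation.
em_pca_sketch <- function(x, ncp = 2, threshold = 1e-6, maxiter = 1000) {
  miss <- is.na(x)
  # Initialize missing cells with the mean of the observed values per column.
  col_means <- colMeans(x, na.rm = TRUE)
  x_hat <- x
  x_hat[miss] <- col_means[col(x)[miss]]
  for (iter in seq_len(maxiter)) {
    # Center, build the rank-ncp reconstruction, and add the means back.
    mu <- colMeans(x_hat)
    s <- svd(sweep(x_hat, 2, mu), nu = ncp, nv = ncp)
    fit <- s$u %*% (s$d[seq_len(ncp)] * t(s$v))
    fit <- sweep(fit, 2, mu, "+")
    # Update only the missing cells; observed cells are never changed.
    delta <- sum((fit[miss] - x_hat[miss])^2)
    x_hat[miss] <- fit[miss]
    if (delta < threshold) break
  }
  x_hat
}

# Toy usage: rank-2 data with 10% of the cells removed at random.
set.seed(1)
truth <- matrix(rnorm(40 * 2), 40, 2) %*% matrix(rnorm(2 * 8), 2, 8)
x <- truth
x[sample(length(x), 32)] <- NA
imputed <- em_pca_sketch(x, ncp = 2)
```

The regularized method additionally shrinks the retained singular values before reconstruction; that step is left out here to keep the sketch short.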
A numeric matrix of the same dimensions as obj, with missing
values imputed. The returned object has class slideimp_results.
Speed comes from three levers: solver (through LOBPCG with warm-start),
threshold, and scale. Tune these first, then accuracy parameters
(ncp, coeff.ridge) on a representative subset.
Exact vs. LOBPCG with warm-start. Whether "lobpcg" beats "exact"
depends on size and low-rankness: "lobpcg" is preferred for large, approximately
low-rank matrices with small ncp, and "exact" for small matrices
(including slide_imp() windows), where it is faster and more robust.
Separately, the warm-start makes each successive solve cheap: pca_imp()
warm-starts LOBPCG with the previous eigenblock and search direction, so once
imputed values stabilize, later solves converge in a few iterations. The
payoff therefore grows with the number of EM iterations, independent of
low-rankness. solver = "auto" (default) probes both and is a safe start.
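The warm-start effect can be illustrated with a toy base-R experiment. The sketch below uses plain orthogonal (subspace) iteration rather than LOBPCG, and it reuses only the previous eigenblock, not a search direction, so it shows the general idea and not what pca_imp() does internally. The helper subspace_iter and the test matrices are hypothetical.

```r
# Orthogonal (subspace) iteration for the leading eigenvectors of a symmetric
# matrix `a`, started from the block `v0`. Returns the final block and the
# number of iterations used. A stand-in for LOBPCG, for illustration only.
subspace_iter <- function(a, v0, tol = 1e-8, maxiter = 5000) {
  v <- qr.Q(qr(v0))
  for (iter in seq_len(maxiter)) {
    v_new <- qr.Q(qr(a %*% v))
    # Change in the projector onto the subspace (invariant to sign/rotation).
    change <- norm(v_new %*% t(v_new) - v %*% t(v), "F")
    v <- v_new
    if (change < tol) break
  }
  list(vectors = v, iterations = iter)
}

set.seed(42)
n <- 60
k <- 2
# Symmetric matrix with two dominant eigenvalues (10 and 8).
q <- qr.Q(qr(matrix(rnorm(n * n), n, n)))
a1 <- q %*% diag(c(10, 8, seq(1, 0.1, length.out = n - 2))) %*% t(q)
# A slightly perturbed matrix, mimicking one EM update of the imputed cells.
e <- matrix(rnorm(n * n), n, n)
a2 <- a1 + 1e-4 * (e + t(e)) / 2

first <- subspace_iter(a1, matrix(rnorm(n * k), n, k))
cold <- subspace_iter(a2, matrix(rnorm(n * k), n, k)) # random start
warm <- subspace_iter(a2, first$vectors)              # previous eigenblock
c(cold = cold$iterations, warm = warm$iterations)
```

Because the perturbed matrix has nearly the same leading subspace, the warm-started solve begins close to its answer and needs fewer iterations than the random start.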
Threshold. The default 1e-6 is conservative; 1e-5 is often faster
with very similar values.
Scale. For columns on a common scale (e.g., DNAm beta values in
[0, 1]), scale = FALSE can be faster and more accurate.
Parallel and BLAS. When running in parallel via tune_imp() or group_imp() with a
multithreaded BLAS, set pin_blas = TRUE to avoid thread oversubscription.
On Windows, the stock BLAS can be slow. Advanced users can swap in
OpenBLAS.
See Speeding up PCA imputation for the full workflow.
Josse J, Husson F (2013). Handling missing values in exploratory multivariate data analysis methods. Journal de la SFdS, 153(2), 79-99.
Josse J, Husson F (2016). missMDA: A Package for Handling Missing Values in Multivariate Data Analysis. Journal of Statistical Software, 70(1), 1-31. doi:10.18637/jss.v070.i01
set.seed(123)
obj <- sim_mat(10, 10)$input
sum(is.na(obj))
obj[1:4, 1:4]
# Use 5 initializations of the missing values. The first is always mean
# imputation; the others are random. Select `ncp` with `tune_imp()`.
pca_imp(obj, ncp = 2, nb.init = 5, seed = 123)
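The speed-related arguments described above can be combined in a single call. The following is a usage sketch only: it assumes the package is installed and is named slideimp (inferred from the slideimp_results class), and the particular settings are illustrative, not recommendations for every data set.

```r
library(slideimp) # assumed package name

set.seed(123)
obj <- sim_mat(10, 10)$input

# Small matrix: use the exact solver, a looser convergence threshold,
# and no column scaling.
fast <- pca_imp(
  obj,
  ncp = 2,
  solver = "exact",
  threshold = 1e-5,
  scale = FALSE,
  seed = 123
)
dim(fast)
```

Comparing the result with a run at the default settings on a representative subset is a reasonable way to check that the looser threshold does not change the imputed values materially.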