| group_imp | R Documentation |
Perform K-NN or PCA imputation independently on feature groups (e.g., by chromosomes, flanking probes, or clustering-based groups).
group_imp(
  obj,
  group,
  subset = NULL,
  allow_unmapped = FALSE,
  k = NULL,
  ncp = NULL,
  method = NULL,
  cores = 1,
  .progress = TRUE,
  min_group_size = NULL,
  colmax = NULL,
  post_imp = NULL,
  dist_pow = NULL,
  tree = NULL,
  max_cache = NULL,
  scale = NULL,
  coeff.ridge = NULL,
  threshold = NULL,
  row.w = NULL,
  seed = NULL,
  nb.init = NULL,
  maxiter = NULL,
  miniter = NULL,
  pin_blas = FALSE,
  na_check = TRUE,
  on_infeasible = c("error", "skip", "mean")
)
obj |
A numeric matrix with samples in rows and features in columns. |
group |
Specification of how features should be grouped for imputation. Accepts three formats:
|
subset |
Character vector of feature names to impute (default
|
allow_unmapped |
Logical. If |
k |
Integer. Number of nearest neighbors for imputation. 10 is a good starting point. |
ncp |
Integer. Number of components used to predict the missing entries. |
method |
For K-NN imputation: distance metric to use ( |
cores |
Number of OpenMP cores used for K-NN imputation only. For PCA, or for mirai-based parallelism, use mirai::daemons() instead. |
.progress |
Logical. Show imputation progress (default TRUE). |
min_group_size |
Integer or NULL. |
colmax |
Numeric between 0 and 1. Column-wise missing-data rate above which imputation of that column is skipped. |
post_imp |
Logical. Whether to impute remaining missing values (those that failed imputation) using column means. |
dist_pow |
Numeric. Amount of penalization applied to more distant nearest neighbors in the weighted average. |
tree |
Logical. |
max_cache |
Numeric. Maximum allowed cache size in GB (default |
scale |
Logical. If |
coeff.ridge |
Numeric. Ridge regularization coefficient (default is 1).
Only used if |
threshold |
Numeric. The threshold for assessing convergence. |
row.w |
Row weights (internally normalized to sum to 1). Can be one of:
|
seed |
Numeric or NULL. |
nb.init |
Integer. Number of random initializations. The first initialization is always mean imputation. |
maxiter |
Integer. Maximum number of iterations for the algorithm. |
miniter |
Integer. Minimum number of iterations for the algorithm. |
pin_blas |
Logical. If |
na_check |
Logical. Check for leftover NA values. |
on_infeasible |
Character, one of "error", "skip", or "mean". |
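The PCA-related arguments above (ncp, threshold, maxiter, miniter, and the mean-imputation start mentioned under nb.init) describe an iterative PCA imputation. The base R function iterative_pca_impute() below is a hypothetical, minimal sketch of that general scheme for one group of features. It is not slideimp's code. It makes several simplifying assumptions: a plain truncated SVD with no ridge shrinkage (coeff.ridge), no row weights (row.w), no scaling (scale), a single initialization, and a relative-change convergence criterion.

```r
# Hypothetical sketch of iterative PCA imputation for one feature group.
# Assumptions: plain truncated SVD, no ridge shrinkage, no row weights,
# no scaling, one mean-imputation start, relative-change stopping rule.
iterative_pca_impute <- function(x, ncp, threshold = 1e-6,
                                 maxiter = 1000L, miniter = 5L) {
  miss <- is.na(x)
  # Initialization: replace each missing cell with its column mean.
  x_hat <- x
  x_hat[miss] <- colMeans(x, na.rm = TRUE)[col(x)[miss]]
  old_fit <- x_hat
  for (iter in seq_len(maxiter)) {
    # Center columns and keep only the first ncp singular components.
    mu <- colMeans(x_hat)
    s <- svd(sweep(x_hat, 2, mu), nu = ncp, nv = ncp)
    fit <- s$u %*% diag(s$d[seq_len(ncp)], nrow = ncp) %*% t(s$v)
    fit <- sweep(fit, 2, mu, "+")
    # Overwrite only the missing cells; observed values stay fixed.
    x_hat[miss] <- fit[miss]
    # Assumed criterion: relative squared change between successive fits.
    change <- sum((fit - old_fit)^2) / max(sum(old_fit^2), .Machine$double.eps)
    old_fit <- fit
    if (iter >= miniter && change < threshold) break
  }
  x_hat
}
```

Only missing cells are overwritten on each pass, so observed entries pass through unchanged.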
Performs K-NN or PCA imputation on groups of features independently. Each neighbor search or decomposition runs on a smaller block of columns, so this can substantially reduce imputation time for large datasets.
Specify k and related arguments to use K-NN, or ncp and related
arguments for PCA imputation. If both k and ncp are NULL,
group$parameters must supply either k or ncp for every group.
Group-wise parameters (in group$parameters) take priority; global
arguments (k, ncp, method, etc.) fill in any gaps. All groups
must use the same imputation method. Per-group k is capped at
group_size - 1 and ncp at min(nrow(group) - 2L, ncol(group) - 1L), with a warning when capping occurs.
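To make the group-wise flow and the k cap concrete, the base R sketch below splits the columns by group, caps k at group_size - 1 with a warning, and imputes each block independently. Both knn_impute_block() and group_knn() are hypothetical helpers. The neighbor distance (root-mean-square difference over rows observed in both columns) and the 1 / d^dist_pow weighting are assumptions, not slideimp's documented formulas. The groups argument here is a simplified vector with one entry per column, and the ncp cap is not shown.

```r
# Hypothetical sketch, not slideimp's implementation: K-NN imputation where
# neighbors are other columns (features) of the same group.
knn_impute_block <- function(x, k, dist_pow = 0) {
  out <- x
  for (j in which(colSums(is.na(x)) > 0)) {
    # Assumed distance: RMS difference over rows observed in both columns.
    d <- vapply(seq_len(ncol(x)), function(m) {
      ok <- !is.na(x[, j]) & !is.na(x[, m])
      if (m == j || !any(ok)) Inf else sqrt(mean((x[ok, j] - x[ok, m])^2))
    }, numeric(1))
    nn <- order(d)[seq_len(k)]
    nn <- nn[is.finite(d[nn])]
    for (i in which(is.na(x[, j]))) {
      vals <- x[i, nn]
      keep <- !is.na(vals)
      if (!any(keep)) next  # no neighbor observed in this row: leave NA
      # Assumed weighting: 1 / d^dist_pow; dist_pow = 0 gives a plain mean.
      w <- 1 / pmax(d[nn][keep], 1e-12)^dist_pow
      out[i, j] <- sum(w * vals[keep]) / sum(w)
    }
  }
  out
}

group_knn <- function(obj, groups, k, dist_pow = 0) {
  # `groups` has one entry per column of `obj` (assumed, simplified format).
  for (g in unique(groups)) {
    cols <- which(groups == g)
    # Cap k at group_size - 1 and warn when the cap applies.
    k_g <- as.integer(min(k, length(cols) - 1L))
    if (k_g < k) warning(sprintf("group '%s': k capped at %d", g, k_g))
    if (k_g < 1L) next  # a single-feature group has no neighbors to use
    obj[, cols] <- knn_impute_block(obj[, cols, drop = FALSE], k_g, dist_pow)
  }
  obj
}
```

Because each block only sees its own columns, a missing value can be filled only from features in the same group. This is what keeps the search space small.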
Chromosomal grouping to break down the search space.
Flanking-probe groups for spatially local imputation.
Column-clustering to form correlation-based groups.
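One way to build correlation-based groups is base R hierarchical clustering on a correlation distance. The sketch below, cluster_groups(), is hypothetical. The 1 - |r| distance and average linkage are illustrative choices. The feature column mirrors to_test$col_group$feature in the examples, but the group column name is an assumption. Check the table with prep_groups() before passing it to group_imp().

```r
# Hypothetical sketch: correlation-based feature groups built with base R
# hierarchical clustering, returned as a feature-to-group table.
cluster_groups <- function(obj, n_groups) {
  # Assumed distance: 1 - |r|, so strongly correlated or anti-correlated
  # features end up in the same cluster.
  r <- cor(obj, use = "pairwise.complete.obs")
  hc <- hclust(as.dist(1 - abs(r)), method = "average")
  cl <- cutree(hc, k = n_groups)
  data.frame(feature = colnames(obj), group = paste0("cluster_", cl))
}
```

Pairwise-complete correlations tolerate scattered missing values. Features with no overlapping observations would produce NA distances and need handling first.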
A numeric matrix of the same dimensions as obj with missing
values imputed.
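Some entries can stay missing after the grouped pass, for example in columns skipped because of colmax. The post_imp and na_check arguments describe a column-mean fallback and a check for leftovers. The base R sketch below, fill_leftovers(), is a hypothetical illustration of that final step. Raising a warning on leftovers is an assumption; this page does not say whether slideimp warns, errors, or stays silent.

```r
# Hypothetical sketch: column-mean fallback for cells a grouped imputation
# left missing, followed by a check for any NA values that still remain.
fill_leftovers <- function(imputed, post_imp = TRUE, na_check = TRUE) {
  if (post_imp) {
    idx <- which(is.na(imputed))
    # Each leftover cell gets the mean of the observed values in its column.
    imputed[idx] <- colMeans(imputed, na.rm = TRUE)[col(imputed)[idx]]
  }
  # An all-NA column yields NaN means, which is.na() still flags here.
  if (na_check && anyNA(imputed)) {
    warning(sum(is.na(imputed)), " missing values remain")
  }
  imputed
}
```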
K-NN: use the cores argument (requires OpenMP). If
mirai::daemons() are active, cores is automatically set to 1
to avoid nested parallelism.
PCA: use mirai::daemons() instead of cores.
On macOS, OpenMP is typically unavailable and cores falls back to
Use mirai::daemons() for parallelization instead.
On Linux with OpenBLAS or MKL, set pin_blas = TRUE when running
parallel PCA to prevent BLAS threads and mirai workers from competing
for cores.
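Groups are imputed independently, so each group's column block can be sent to its own worker and written back by column index. The sketch below, parallel_by_group(), is hypothetical. It uses base R's parallel::mclapply as a stand-in for the mirai daemons that group_imp() actually relies on. mclapply is fork-based, so the sketch runs serially on Windows. impute_fun is any function that returns a block of the same shape it receives.

```r
# Hypothetical sketch: dispatch independent column groups to workers with
# parallel::mclapply (a stand-in for mirai) and reassemble by column index.
parallel_by_group <- function(obj, groups, impute_fun, cores = 1L) {
  idx <- split(seq_len(ncol(obj)), groups)
  # Forking is unavailable on Windows, so fall back to a single core there.
  if (.Platform$OS.type == "windows") cores <- 1L
  blocks <- parallel::mclapply(
    idx,
    function(cols) impute_fun(obj[, cols, drop = FALSE]),
    mc.cores = cores
  )
  # Write each imputed block back into its original columns.
  for (g in names(idx)) obj[, idx[[g]]] <- blocks[[g]]
  obj
}
```

Each worker receives only its own block, and no worker depends on another's result. The same independence is what lets group_imp() parallelize across groups.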
A character string can be passed to group to name a supported
Illumina platform (e.g., "EPICv2", "EPICv2_deduped"), which
fetches the manifest automatically. This requires the
slideimp.extra package (available on GitHub; see its README for
installation instructions). Supported platforms are listed in
slideimp.extra::slideimp_arrays.
prep_groups()
# Generate example data with missing values
set.seed(1234)
to_test <- sim_mat(10, 20, perc_total_na = 0.05, perc_col_na = 1)
obj <- to_test$input
group <- to_test$col_group # metadata that maps `colnames(obj)` to groups
head(group)
# Simple grouped K-NN imputation
results <- group_imp(obj, group = group, k = 2)
# Impute only a subset of features
subset_features <- sample(to_test$col_group$feature, size = 10)
knn_subset <- group_imp(obj, group = group, subset = subset_features, k = 2)
# Use prep_groups() to inspect and tweak per-group parameters
prepped <- prep_groups(colnames(obj), group)
prepped$parameters <- lapply(seq_len(nrow(prepped)), \(i) list(k = 2))
prepped$parameters[[2]]$k <- 4
knn_grouped <- group_imp(obj, group = prepped, cores = 2)
# PCA imputation with mirai parallelism
mirai::daemons(2)
pca_grouped <- group_imp(obj, group = group, ncp = 2)
mirai::daemons(0)
pca_grouped