| tune_imp | R Documentation |
Tunes hyperparameters for the built-in imputation methods slide_imp(),
knn_imp(), and pca_imp(), or for user-supplied custom functions, by repeated
cross-validation: known values are masked, imputed, and compared with the
originals. For group_imp(), tune knn_imp() or pca_imp() on a single group.
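The cross-validation idea can be sketched in a few lines of base R. This is a simplified illustration, not the package's implementation: mini_tune() and fill_const() are hypothetical names, the masking ignores the rowmax/colmax limits, and the output is a flat data.frame rather than a nested slideimp_tune object.

```r
# Hypothetical helper, NOT the package's implementation: a bare-bones
# repeated-masking loop illustrating the idea behind tune_imp().
mini_tune <- function(obj, parameters, .f, num_na, n_reps = 1) {
  observed <- which(!is.na(obj))  # only known values can be scored
  out <- list()
  for (p in seq_len(nrow(parameters))) {
    for (r in seq_len(n_reps)) {
      # Randomly choose observed cells to hide
      idx <- observed[sample.int(length(observed), num_na)]
      masked <- obj
      masked[idx] <- NA
      # Call .f(masked, <parameter columns of row p>)
      args <- c(list(masked), as.list(parameters[p, , drop = FALSE]))
      imputed <- do.call(.f, args)
      out[[length(out) + 1]] <- data.frame(
        param_set = p, rep_id = r,
        truth = obj[idx], estimate = imputed[idx]
      )
    }
  }
  do.call(rbind, out)
}

# Usage with a constant-fill imputer (illustrative only)
fill_const <- function(obj, val = 0) {
  obj[is.na(obj)] <- val
  obj
}
set.seed(1)
m <- matrix(rnorm(200), nrow = 10)
res <- mini_tune(m, data.frame(val = c(0, 1)), fill_const, num_na = 5, n_reps = 2)
head(res)
```

Each parameter set is scored on its own random masks here; how tune_imp() shares or draws masks across parameter sets is not stated above.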
tune_imp(
  obj,
  parameters = NULL,
  .f,
  na_loc = NULL,
  num_na = NULL,
  n_reps = 1,
  n_cols = NULL,
  n_rows = 2,
  rowmax = 0.9,
  colmax = 0.9,
  na_col_subset = NULL,
  max_attempts = 100,
  .progress = TRUE,
  cores = 1,
  location = NULL,
  pin_blas = FALSE
)
| obj | A numeric matrix with samples in rows and features in columns. |
| parameters | A data.frame specifying parameter combinations to tune, where each column is a parameter accepted by .f (excluding obj) and each row is one combination. |
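| .f | Either a character string naming a built-in method ("slide_imp", "knn_imp", or "pca_imp") or a custom imputation function. |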
| na_loc | Optional. Pre-defined missing-value locations, used instead of random NA injection. In the examples, this is a list of two-column (row, column) index matrices. |
| num_na | Integer. Total number of missing values to inject per repetition. If supplied, … |
| n_reps | Integer. Number of repetitions for random NA injection (default 1). |
| n_cols | Integer. The number of columns to receive injected NA values. |
| n_rows | Integer. The target number of … (default 2). |
| rowmax, colmax | Numbers between 0 and 1 (default 0.9 for both). NA injection cannot create rows or columns whose proportion of missing values exceeds these thresholds. |
| na_col_subset | Optional integer or character vector restricting which columns of obj can receive injected NA values. |
| max_attempts | Integer. Maximum number of resampling attempts per repetition before giving up due to row-budget exhaustion (default 100). |
| .progress | Logical. Show a progress bar during tuning (default TRUE). |
| cores | Number of cores used to parallelize K-NN and sliding-window K-NN imputation with OpenMP (default 1). For other methods, use mirai::daemons() instead. |
| location | Required only for … |
| pin_blas | Logical. Pin BLAS threads to 1 during parallel tuning (default FALSE). |
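To make the rowmax/colmax, na_col_subset, and max_attempts arguments concrete, here is a rough base-R sketch of constrained random NA injection. It is an assumption-laden illustration, not the package's algorithm: inject_na() is a hypothetical name, it accepts only integer column indices for cols, and it fills cells greedily in random order, restarting up to max_attempts times.

```r
# Hypothetical sketch of constrained random NA injection, NOT the package's
# algorithm. A candidate cell is accepted only if masking it keeps its row and
# column at or below the rowmax / colmax proportions of missing values.
inject_na <- function(obj, num_na, rowmax = 0.9, colmax = 0.9,
                      cols = seq_len(ncol(obj)), max_attempts = 100) {
  stopifnot(num_na >= 1)
  n <- nrow(obj)
  p <- ncol(obj)
  base_miss <- is.na(obj)
  for (attempt in seq_len(max_attempts)) {
    row_na <- rowSums(base_miss)
    col_na <- colSums(base_miss)
    picked <- integer(0)
    # Candidates: observed cells in the allowed columns, in random order
    cand <- which(!base_miss & (col(obj) %in% cols))
    cand <- cand[sample.int(length(cand))]
    for (i in cand) {
      ri <- (i - 1) %% n + 1   # row of linear index i
      ci <- (i - 1) %/% n + 1  # column of linear index i
      if ((row_na[ri] + 1) / p <= rowmax && (col_na[ci] + 1) / n <= colmax) {
        picked <- c(picked, i)
        row_na[ri] <- row_na[ri] + 1
        col_na[ci] <- col_na[ci] + 1
        if (length(picked) == num_na) {
          obj[picked] <- NA
          return(list(obj = obj, idx = picked))
        }
      }
    }
  }
  stop("could not place ", num_na, " NAs in ", max_attempts, " attempts")
}

set.seed(42)
m <- matrix(rnorm(100), nrow = 10)
res <- inject_na(m, num_na = 20, rowmax = 0.3, colmax = 0.3, cols = 1:8)
```

With a 10 x 10 matrix and both thresholds at 0.3, no row or column may end up with more than 3 missing values, and columns 9 and 10 are never touched.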
The function supports tuning for built-in methods ("slide_imp",
"knn_imp", "pca_imp") or custom functions provided via .f.
When .f is a character string, the columns in parameters are validated
against the chosen method's requirements:
- "knn_imp": requires k in parameters
- "pca_imp": requires ncp in parameters
- "slide_imp": requires window_size, overlap_size, and min_window_n,
  plus exactly one of k or ncp
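The validation rules above can be expressed as a small check. This is a hypothetical helper (check_params() is not part of the package, and the parameter values used below are arbitrary); it only mirrors the documented requirements.

```r
# Hypothetical validator mirroring the documented rules; not the
# package's own check. Returns TRUE invisibly or stops with a message.
check_params <- function(parameters, method) {
  cols <- names(parameters)
  need <- switch(method,
    knn_imp = "k",
    pca_imp = "ncp",
    slide_imp = c("window_size", "overlap_size", "min_window_n"),
    stop("unknown method: ", method)
  )
  missing_cols <- setdiff(need, cols)
  if (length(missing_cols) > 0) {
    stop(method, " requires: ", paste(missing_cols, collapse = ", "))
  }
  # slide_imp needs exactly one of k (K-NN) or ncp (PCA)
  if (method == "slide_imp" && sum(c("k", "ncp") %in% cols) != 1) {
    stop("slide_imp requires exactly one of k or ncp")
  }
  invisible(TRUE)
}

check_params(data.frame(k = c(2, 4)), "knn_imp")
check_params(
  data.frame(window_size = 100, overlap_size = 10, min_window_n = 20, k = 5),
  "slide_imp"
)
```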
When .f is a custom function, the columns in parameters must
correspond to the arguments of .f (excluding the obj argument). The
custom function must accept obj (a numeric matrix) as its first argument
and return a numeric matrix of identical dimensions.
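A custom function that meets this contract might look like the sketch below. col_center_imp() and its trim argument are hypothetical, chosen only to show a tunable parameter; it fills each column's missing values with that column's (optionally trimmed) mean and returns a matrix of the same size.

```r
# Hypothetical custom imputer following the documented contract:
# a numeric matrix `obj` first, tunable arguments after, same-size output.
col_center_imp <- function(obj, trim = 0) {
  for (j in seq_len(ncol(obj))) {
    miss <- is.na(obj[, j])
    # Skip columns that are complete or entirely missing
    if (any(miss) && !all(miss)) {
      obj[miss, j] <- mean(obj[!miss, j], trim = trim)
    }
  }
  obj
}

m <- matrix(c(1, NA, 3, 10, 20, NA), nrow = 3)
out <- col_center_imp(m)
```

Following the pattern of the custom-function example, it could then be tuned with tune_imp(obj, data.frame(trim = c(0, 0.1)), .f = col_center_imp, num_na = 10).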
Tuning results can be evaluated using the yardstick package or
compute_metrics().
A data.frame of class slideimp_tune containing:
- ...: All columns originally provided in parameters.
- param_set: An integer ID for the unique parameter combination.
- rep_id: An integer indicating the repetition index.
- result: A nested list-column where each element is a data.frame
  containing truth (original values) and estimate (imputed values).
- error: A character column containing the error message if the
  iteration failed, otherwise NA.
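compute_metrics() is the package's route for scoring results. Purely to illustrate how the nested structure above can be consumed, here is a base-R sketch that drops failed iterations and averages RMSE per parameter set. rmse_by_set() is a hypothetical name, and toy is a hand-built stand-in with made-up values, not real tuning output.

```r
# Sketch: per-parameter-set RMSE from a data.frame shaped like the
# documented return value. Illustrative only; use compute_metrics() in practice.
rmse_by_set <- function(res) {
  res <- res[is.na(res$error), , drop = FALSE]  # keep successful iterations
  per_rep <- vapply(res$result, function(d) {
    sqrt(mean((d$truth - d$estimate)^2))
  }, numeric(1))
  aggregate(list(rmse = per_rep), by = list(param_set = res$param_set),
            FUN = mean)
}

# Hand-built stand-in with made-up values (not real output)
toy <- data.frame(k = c(2, 4), param_set = 1:2, rep_id = 1L,
                  error = NA_character_)
toy$result <- list(
  data.frame(truth = c(1, 2), estimate = c(1, 4)),
  data.frame(truth = c(1, 2), estimate = c(2, 3))
)
scores <- rmse_by_set(toy)
scores
```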
- K-NN: use the cores argument (requires OpenMP). If mirai::daemons()
  are active, cores is automatically set to 1 to avoid nested parallelism.
- PCA: use mirai::daemons() instead of cores.
On macOS, OpenMP is typically unavailable and cores falls back to 1.
Use mirai::daemons() for parallelization instead.
On Linux with OpenBLAS or MKL, set pin_blas = TRUE when running
parallel PCA to prevent BLAS threads and mirai workers from competing
for cores.
# Set up example data. In real analyses, increase `num_na` (e.g., 500)
# and `n_reps` (e.g., 10-30).
obj <- sim_mat(10, 50)$input
# 1. Tune K-NN imputation with random NA injection
params_knn <- data.frame(k = c(2, 4))
results <- tune_imp(obj, params_knn, .f = "knn_imp", n_reps = 1, num_na = 10)
compute_metrics(results)
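# 2. Tune with fixed NA positions. Each list element is a two-column
# matrix of (row, column) indices: rows 1-3 of column 1, then rows 2-4
# of column 2.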
na_positions <- list(
matrix(c(1, 2, 3, 1, 1, 1), ncol = 2),
matrix(c(2, 3, 4, 2, 2, 2), ncol = 2)
)
results_fixed <- tune_imp(
obj,
data.frame(k = 2),
.f = "knn_imp",
na_loc = na_positions
)
# 3. Custom imputation function
custom_fill <- function(obj, val = 0) {
obj[is.na(obj)] <- val
obj
}
tune_imp(obj, data.frame(val = c(0, 1)), .f = custom_fill, num_na = 10)
# 4. Parallel tuning (requires mirai package)
mirai::daemons(2)
parameters_custom <- data.frame(mean = c(0, 1), sd = c(1, 1))
# Define a simple custom function for illustration
custom_imp <- function(obj, mean, sd) {
na_pos <- is.na(obj)
obj[na_pos] <- stats::rnorm(sum(na_pos), mean = mean, sd = sd)
obj
}
results_p <- tune_imp(
obj, parameters_custom, .f = custom_imp, n_reps = 1, num_na = 10
)
mirai::daemons(0) # Close workers