| slide_imp | R Documentation |
Performs sliding window K-NN or PCA imputation of large numeric matrices column-wise. This
method assumes that columns are meaningfully sorted by location.
slide_imp(
  obj,
  location,
  window_size,
  overlap_size = 0,
  flank = FALSE,
  min_window_n,
  subset = NULL,
  dry_run = FALSE,
  k = NULL,
  cores = 1,
  dist_pow = 0,
  max_cache = 4,
  ncp = NULL,
  scale = TRUE,
  coeff.ridge = 1,
  seed = NULL,
  row.w = NULL,
  nb.init = 1,
  maxiter = 1000,
  miniter = 5,
  method = NULL,
  .progress = TRUE,
  colmax = 0.9,
  post_imp = TRUE,
  na_check = TRUE,
  on_infeasible = c("skip", "error", "mean")
)
obj |
A numeric matrix with samples in rows and features in columns. |
location |
A sorted numeric vector of length |
window_size |
Window width in the same units as |
overlap_size |
Overlap between consecutive windows in the same units
as |
flank |
Logical. If |
min_window_n |
Minimum number of columns a window must contain to be
imputed. Windows smaller than this are not imputed. |
subset |
Character or integer. Vector of column names or column indices specifying which columns to impute. |
dry_run |
Logical. If |
k |
Integer. Number of nearest neighbors for imputation. 10 is a good starting point. |
cores |
Integer. Number of cores for K-NN parallelization (OpenMP). On macOS, OpenMP may need additional compiler configuration. |
dist_pow |
Numeric. Strength of the penalty applied to more distant nearest
neighbors in the weighted average. |
max_cache |
Numeric. Maximum allowed cache size in GB (default 4). |
ncp |
Integer. Number of components used to predict the missing entries. |
scale |
Logical. If |
coeff.ridge |
Numeric. Ridge regularization coefficient (default is 1).
Only used if |
seed |
Integer. Random number generator seed. |
row.w |
Row weights (internally normalized to sum to 1). Can be one of:
|
nb.init |
Integer. Number of random initializations. The first initialization is always mean imputation. |
maxiter |
Integer. Maximum number of iterations for the algorithm. |
miniter |
Integer. Minimum number of iterations for the algorithm. |
method |
For K-NN imputation: distance metric to use ( |
.progress |
Logical. Show a progress bar (default = TRUE). |
colmax |
Numeric between 0 and 1. Columns whose proportion of missing values exceeds this threshold are not imputed. |
post_imp |
Logical. Whether to impute any values that are still missing after imputation (those that failed imputation) using column means. |
na_check |
Boolean. Check for leftover |
on_infeasible |
Character, one of "skip", "error", or "mean". |
The sliding window approach divides the input matrix into smaller segments
based on location values and applies imputation to each window
independently. Values in overlapping areas are averaged across windows to
produce the final imputed result.
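The averaging step can be illustrated with a minimal base R sketch. This is not the package's implementation: the two windows are hypothetical, and `impute_window()` is a stand-in imputer (row mean within the window) used only so the example runs without K-NN or PCA.

```r
# Toy matrix: 4 samples x 6 features, with one missing value in column 4.
set.seed(1)
x <- matrix(rnorm(24), nrow = 4)
x[2, 4] <- NA

# Two hypothetical windows (column indices) that overlap on columns 3 and 4.
windows <- list(1:4, 3:6)

# Stand-in imputer (assumption, not K-NN or PCA): fill each NA with the mean
# of the observed values in the same row within the window.
impute_window <- function(m) {
  for (i in seq_len(nrow(m))) {
    miss <- is.na(m[i, ])
    if (any(miss)) m[i, miss] <- mean(m[i, !miss])
  }
  m
}

# Accumulate per-window estimates and count how many windows cover each cell.
est_sum <- matrix(0, nrow(x), ncol(x))
est_n <- matrix(0, nrow(x), ncol(x))
for (w in windows) {
  est_sum[, w] <- est_sum[, w] + impute_window(x[, w, drop = FALSE])
  est_n[, w] <- est_n[, w] + 1
}

# Only originally missing cells take the estimate averaged across windows.
result <- x
na_idx <- is.na(x)
result[na_idx] <- (est_sum / est_n)[na_idx]
```

Here the missing cell sits in both windows, so its final value is the mean of the two per-window estimates; observed cells are left unchanged.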
Two windowing modes are supported:
flank = FALSE (default): Greedily partitions the
location vector into windows of width window_size with the requested
overlap_size between consecutive windows.
flank = TRUE: Creates one window per feature
in subset that exactly flanks that specific feature using the supplied
window_size.
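The two modes can be sketched in base R as follows. Both helper functions are hypothetical and encode assumptions rather than the package's exact rules: `greedy_windows()` assumes each window spans `[start, start + window_size]` and that the next window starts `overlap_size` before the previous end, and `flank_window()` assumes half of `window_size` on each side of the target. Use `dry_run = TRUE` to see the windows the package actually computes.

```r
location <- 1:100
window_size <- 50
overlap_size <- 10

# Greedy partition (assumed rule): each window covers
# [start, start + window_size]; the next window starts overlap_size
# before the previous window's end.
greedy_windows <- function(location, window_size, overlap_size = 0) {
  stopifnot(!is.unsorted(location), overlap_size < window_size)
  out <- list()
  start <- location[1]
  last <- location[length(location)]
  repeat {
    end <- start + window_size
    out[[length(out) + 1]] <- which(location >= start & location <= end)
    if (end >= last) break
    start <- end - overlap_size
  }
  out
}

# Flanking window (assumed rule): all columns whose location lies within
# window_size / 2 of the target column's location.
flank_window <- function(location, target, window_size) {
  centre <- location[target]
  which(abs(location - centre) <= window_size / 2)
}

wins <- greedy_windows(location, window_size, overlap_size)
t(sapply(wins, range))  # first and last column index of each window
range(flank_window(location, target = 30, window_size = 30))
```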
Specify k and related arguments to use knn_imp(), or ncp and related
arguments to use pca_imp().
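For K-NN, one way to read the `dist_pow` penalty is as inverse-distance weighting. The sketch below assumes weights proportional to `1 / d^dist_pow`; this formula is an assumption for illustration and is not confirmed by this page. The distances and neighbor values are made up, and `knn_estimate()` is a hypothetical helper.

```r
# Hypothetical distances from the target to its k = 3 nearest neighbors,
# and the neighbors' observed values for the entry being imputed.
d <- c(1, 2, 4)
v <- c(0.2, 0.4, 0.8)

# Assumed weighting: w_i proportional to 1 / d_i^dist_pow.
knn_estimate <- function(v, d, dist_pow = 0) {
  w <- 1 / d^dist_pow
  sum(w * v) / sum(w)
}

knn_estimate(v, d, dist_pow = 0)  # equal weights: the plain mean of v
knn_estimate(v, d, dist_pow = 2)  # closer neighbors dominate the average
```

Under this reading, the default `dist_pow = 0` reduces to an unweighted average of the neighbors, and larger values pull the estimate toward the closest neighbor.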
A numeric matrix of the same dimensions as obj with missing values
imputed. When dry_run = TRUE, returns a data.frame of class slideimp_tbl
with columns start, end, window_n, plus subset_local (and target
when flank = TRUE).
# Generate sample data with missing values: 20 samples and 100 columns,
# with columns sorted by location (e.g., genomic position)
set.seed(1234)
beta_matrix <- sim_mat(20, 100)$input
location <- 1:100
# It's very useful to first perform a dry run to examine the calculated windows
windows_statistics <- slide_imp(
  beta_matrix,
  location = location,
  window_size = 50,
  overlap_size = 10,
  min_window_n = 10,
  dry_run = TRUE
)
windows_statistics
# Sliding Window K-NN imputation by specifying `k` (sliding windows)
imputed_knn <- slide_imp(
  beta_matrix,
  location = location,
  k = 5,
  window_size = 50,
  overlap_size = 10,
  min_window_n = 10,
  scale = FALSE # This argument belongs to PCA imputation and will be ignored
)
imputed_knn
# Sliding Window PCA imputation by specifying `ncp` (sliding windows)
imputed_pca <- slide_imp(
  beta_matrix,
  location = location,
  ncp = 2,
  window_size = 50,
  overlap_size = 10,
  min_window_n = 10
)
imputed_pca
# Sliding Window K-NN imputation with flanking windows (flank = TRUE)
# Only the columns listed in `subset` are imputed; each uses its own
# centered window of width `window_size`.
imputed_flank <- slide_imp(
  beta_matrix,
  location = location,
  k = 2,
  window_size = 30,
  flank = TRUE,
  subset = c(10, 30, 70),
  min_window_n = 5
)
imputed_flank