group_imp() enforces stricter data validation. The requested feature subset
must be a subset of the object's column names, which in turn must be a subset
of the features in the mapping data.frame. Set allow_unmapped = TRUE to bypass
the resulting errors when these intersections are incomplete.
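The nested-subset rule can be expressed with set operations. The Python sketch below uses a hypothetical check_subsets() helper, not slideimp code. In particular, returning the three-way intersection when allow_unmapped is set is an assumption about what "bypass" means, not documented behaviour.

```python
# Hypothetical sketch of the nested-subset check; not the slideimp implementation.
def check_subsets(requested, obj_cols, mapped_features, allow_unmapped=False):
    """Return the usable requested features, or raise ValueError.

    requested: features the user asked to impute
    obj_cols: column names of the data object
    mapped_features: features listed in the mapping data.frame
    """
    requested, obj_cols, mapped = set(requested), set(obj_cols), set(mapped_features)
    missing_from_obj = requested - obj_cols  # violates requested <= obj_cols
    unmapped = obj_cols - mapped             # violates obj_cols <= mapped
    if not allow_unmapped:
        if missing_from_obj:
            raise ValueError(f"requested features not in object: {sorted(missing_from_obj)}")
        if unmapped:
            raise ValueError(f"object columns not in mapping: {sorted(unmapped)}")
    # Assumption: with allow_unmapped=True, keep only features present at every level.
    return sorted(requested & obj_cols & mapped)
```

For example, check_subsets(["a", "x"], ["a", "b"], ["a"], allow_unmapped=True) returns ["a"] instead of raising.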
group_imp() and tune_imp() now error when supplied arguments do not apply
to the chosen imputation method, instead of silently ignoring them.
knn_imp() now uses a logical tree argument to switch between ball tree search
(TRUE) and brute-force search (FALSE). KD tree search is no longer supported.
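To illustrate what brute-force search means in a generic kNN imputation, the Python sketch below scores every candidate row before keeping the k closest. It is not slideimp's implementation: treating rows as neighbours, the root-mean-square distance over co-observed columns, and the plain average of neighbour values are all assumptions made for the example.

```python
import math

def knn_impute_brute(rows, k=2):
    """Fill None cells with the mean of that column in the k nearest rows.

    Brute force: every other row observed in the target column is scored.
    Distance (an assumption): root-mean-square difference over the columns
    observed in both rows.
    """
    filled = [list(r) for r in rows]  # copy so the input is left untouched
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for r, other in enumerate(rows):
                if r == i or other[j] is None:
                    continue
                # Column j drops out automatically because row[j] is None.
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda t: t[0])  # rank all scored candidates
            nearest = candidates[:k]
            if nearest:  # leave the cell as None if no usable neighbour exists
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled
```

An exact ball tree search aims to return the same nearest neighbours while skipping candidates that provably cannot be among them, which is where its speed advantage over scoring every row comes from.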
knn_imp() and pca_imp() now validate inputs earlier and return early in more cases.
pca_imp() gains the same colmax and post_imp arguments as knn_imp().
group_features() is renamed to prep_groups(). It now accepts a vector of
column names instead of a full matrix.
sample_na_loc() (formerly inject_na()) is now exported. The original
remains accessible via slideimp:::inject_na() for legacy code.
sim_mat() now returns a matrix with samples in rows and features in columns,
for immediate compatibility with other package functions. perc_NA is renamed to
perc_total_na, and dimensions are now specified via n (rows) and p (columns).
tune_imp() gains a unified method argument that applies to both
pca_imp() and knn_imp(), replacing pca_method and knn_method.
The rep argument is renamed to n_reps.
tune_imp() results from v0.5.4 are no longer reproducible because internal
NA generation now uses sample_na_loc().
The khanmiss1 dataset has been removed.
compute_metrics() now supports data frames with a result list column
containing truth and estimate columns, similar to {yardstick}.
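{yardstick}-style metrics compare a truth column with an estimate column. The Python sketch below applies that idea to each entry of a list-column-like structure. The nested-dict layout standing in for an R list column, and the choice of MAE and RMSE as metrics, are assumptions; this is not compute_metrics() itself.

```python
import math

def metrics_by_rep(results):
    """Return MAE and RMSE for each entry of a list-column-like structure.

    Each entry mirrors one row's nested table: parallel 'truth' and
    'estimate' lists for the cells that were masked and then imputed.
    """
    out = []
    for res in results:
        truth, estimate = res["truth"], res["estimate"]
        if len(truth) != len(estimate) or not truth:
            raise ValueError("truth and estimate must be non-empty and equal length")
        errors = [e - t for t, e in zip(truth, estimate)]
        n = len(errors)
        out.append({
            "mae": sum(abs(x) for x in errors) / n,
            "rmse": math.sqrt(sum(x * x for x in errors) / n),
        })
    return out
```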
group_imp() and prep_groups() automatically look up Illumina manifests
provided by {slideimp.extra}, which registers them with slideimp when it is
loaded (the register-on-load pattern).
knn_imp() gains max_cache to control the internal cache size
(defaults to 4GB).
sim_mat() gains a rho argument to support compound symmetry correlation
structures in simulated matrices.
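Under compound symmetry every pair of columns shares the same correlation rho. One standard construction, shown here as a Python sketch and not necessarily how sim_mat() works internally, adds a per-row latent draw shared by all columns to independent noise. With weights sqrt(rho) and sqrt(1 - rho), each column has unit variance and every pair of columns has covariance, and therefore correlation, rho. sim_cs() is a hypothetical name, and this construction only covers 0 <= rho <= 1.

```python
import math
import random

def sim_cs(n, p, rho, seed=1):
    """Simulate an n-by-p matrix (rows = samples) whose columns have a
    compound-symmetry correlation rho. Hypothetical helper, not sim_mat()."""
    if not 0 <= rho <= 1:
        raise ValueError("this construction needs 0 <= rho <= 1")
    rng = random.Random(seed)
    mat = []
    for _ in range(n):
        shared = rng.gauss(0, 1)  # one latent draw per row, common to all columns
        mat.append([math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0, 1)
                    for _ in range(p)])
    return mat
```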
sim_mat() and tune_imp() gain dedicated print methods that provide concise
summaries instead of dumping raw data to the console.
slide_imp() gains location, flank, and dry_run arguments. Together they add
fixed-window imputation, a "flank mode" for the features surrounding a subset,
and inspection of window statistics before any imputation is computed.
tune_imp() gains finer control over NA injection via n_cols, n_rows,
num_na, and na_col_subset. Precomputed NA locations can also be supplied via
na_loc, so that methods are compared on identical NA patterns.
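The purpose of na_loc is that every method is scored on exactly the same masked cells. The Python sketch below draws the cells once with a fixed seed and reuses them. sample_locations() and mask() are hypothetical helpers; their arguments only loosely mirror num_na and na_col_subset and do not reproduce the signature or sampling scheme of sample_na_loc().

```python
import random

def sample_locations(n_rows, n_cols, num_na, col_subset=None, seed=None):
    """Draw num_na distinct (row, col) cells to mask, optionally only
    within col_subset. Hypothetical helper, not slideimp's sample_na_loc()."""
    cols = list(range(n_cols)) if col_subset is None else list(col_subset)
    cells = [(i, j) for i in range(n_rows) for j in cols]
    if num_na > len(cells):
        raise ValueError("num_na exceeds the number of candidate cells")
    rng = random.Random(seed)
    return sorted(rng.sample(cells, num_na))  # sampling without replacement

def mask(mat, locations):
    """Return a copy of mat with the given cells set to None."""
    out = [list(r) for r in mat]
    for i, j in locations:
        out[i][j] = None
    return out
```

Passing the same location list to mask() before running each method, and scoring each method only on those cells, keeps the comparison fair.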
col_vars() and mean_imp_col() are overhauled to use a faster
{RcppArmadillo} backend and now support parallel computation with OpenMP.
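Both operations work column by column, and each column is independent of the others, which is what makes per-column parallelism a natural fit. The serial Python sketch below shows the computations. The function names are hypothetical, and the n - 1 denominator (as in R's var()) and the skipping of missing values are assumptions about what col_vars() and mean_imp_col() do.

```python
def column_means(mat):
    """Mean of each column, skipping None; None if the column is all missing."""
    means = []
    for col in zip(*mat):
        obs = [v for v in col if v is not None]
        means.append(sum(obs) / len(obs) if obs else None)
    return means

def column_variances(mat):
    """Sample variance (n - 1 denominator) per column, skipping None."""
    out = []
    for col, m in zip(zip(*mat), column_means(mat)):
        obs = [v for v in col if v is not None]
        out.append(sum((v - m) ** 2 for v in obs) / (len(obs) - 1)
                   if len(obs) > 1 else None)
    return out

def mean_impute_columns(mat):
    """Replace each None with its column's observed mean."""
    means = column_means(mat)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in mat]
```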
Dependencies are streamlined. {tibble} and {purrr} are removed as hard
dependencies, {cli} is added for more informative messaging, and {carrier}
is added as an explicit dependency.
Documentation is thoroughly overhauled, with numerous consistency improvements and corrections.
{RhpcBLASctl} is added as a suggested package so that the number of BLAS
threads can be pinned, avoiding CPU oversubscription (thrashing) during parallel runs.
group_imp() and tune_imp() prioritize process-level parallelization via
{mirai}. knn_imp() supports OpenMP-controlled parallelization via the cores
argument when {mirai} daemons are not active.
knn_imp() and pca_imp() use optimized internal Rcpp functions for better
performance.
CRAN resubmission.
group_features() is added to help with creating the group tibble needed for
group_imp().
pca_imp() now allows row.w = "n_miss" to scale row weights by the number
of missing values per row.