Author: Maciej Nasinski
Check the miceFast website for more details
miceFast provides fast methods for imputing missing data, leveraging an object-oriented programming paradigm and optimized linear algebra routines. The package includes convenient helper functions compatible with data.table, dplyr, and other popular R packages.
Major speed improvements occur when:
- Using a grouping variable, where the data is automatically sorted by group, significantly reducing computation time.
- Performing multiple imputations, by evaluating the underlying quantitative model only once for multiple draws.
- Running Predictive Mean Matching (PMM), thanks to presorting and binary search.

For performance details, see performance_validity.R in the extdata folder.
It is recommended to read the Advanced Usage Vignette.
You can install miceFast from CRAN:
install.packages("miceFast")
Or install the development version from GitHub:
# install.packages("devtools")
devtools::install_github("polkas/miceFast")
Below is a short demonstration. See the vignette for advanced usage and best practices.
library(miceFast)
set.seed(1234)
data(air_miss)
# Visualize the NA structure
upset_NA(air_miss, 6)
# Simple and naive fill
imputed_data <- naive_fill_NA(air_miss)
# Compare with other packages:
# Hmisc
library(Hmisc)
data.frame(Map(function(x) Hmisc::impute(x, "random"), air_miss))
# mice
library(mice)
mice::complete(mice::mice(air_miss, printFlag = FALSE))
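Beyond the naive fill, a model-based imputation of a single column can be sketched with `fill_NA()` and `fill_NA_N()`. This is a minimal sketch rather than the package's canonical workflow: it assumes `air_miss` contains the columns `Solar.R`, `Wind`, `Temp`, and `Intercept`, and the new column names `Solar_R_pred` and `Solar_R_bayes` are made up for illustration.

```r
library(miceFast)

set.seed(1234)
data(air_miss)

# Assumed column names: Solar.R is the target; Wind, Temp, Intercept are predictors.
predictors <- c("Wind", "Temp", "Intercept")

# Single imputation: fill missing Solar.R values with linear-model predictions.
air_miss$Solar_R_pred <- fill_NA(
  x = air_miss,
  model = "lm_pred",
  posit_y = "Solar.R",
  posit_x = predictors
)

# Multiple imputations: k = 10 draws from a Bayesian linear model,
# returned as a single vector.
air_miss$Solar_R_bayes <- fill_NA_N(
  x = air_miss,
  model = "lm_bayes",
  posit_y = "Solar.R",
  posit_x = predictors,
  k = 10
)

# Count the remaining missing values before and after imputation.
colSums(is.na(air_miss[, c("Solar.R", "Solar_R_pred", "Solar_R_bayes")]))
```

Writing the results to new columns keeps the original `Solar.R` intact, so the imputed and observed values can still be compared afterwards.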
- `miceFast` objects (Rcpp modules).
- `fill_NA()`: Single imputation (`lda`, `lm_pred`, `lm_bayes`, `lm_noise`).
- `fill_NA_N()`: Multiple imputations (`pmm`, `lm_bayes`, `lm_noise`).
- `VIF()`: Variance Inflation Factor calculations.
- `naive_fill_NA()`: Automatic naive imputations.
- `compare_imp()`: Compare original vs. imputed values.
- `upset_NA()`: Visualize NA structure using UpSetR.

Quick Reference Table:
| Function | Description |
|-----------------|-----------------------------------------------------------------------------|
| `new(miceFast)` | Creates an OOP instance with numerous imputation methods (see the vignette). |
| `fill_NA()` | Single imputation: `lda`, `lm_pred`, `lm_bayes`, `lm_noise`. |
| `fill_NA_N()` | Multiple imputations (N repeats): `pmm`, `lm_bayes`, `lm_noise`. |
| `VIF()` | Computes Variance Inflation Factors. |
| `naive_fill_NA()` | Performs automatic, naive imputations. |
| `compare_imp()` | Compares imputations vs. original data. |
| `upset_NA()` | Visualizes NA structure using an UpSet plot. |
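The `new(miceFast)` entry in the table refers to the object-oriented interface, which works on a numeric matrix and addresses columns by position. The following is a minimal sketch under stated assumptions: it uses the base R `airquality` data instead of `air_miss`, the column positions follow from how the matrix is built here, and the variable names `ozone_pred` and `ozone_bayes` are illustrative only. The vignette remains the authoritative description of the interface.

```r
library(miceFast)

set.seed(1234)

# Numeric matrix: Ozone, Solar.R, Wind, Temp, Day, plus an intercept column.
data <- cbind(as.matrix(airquality[, -5]), intercept = 1)

# Create the model object and attach the data.
model <- new(miceFast)
model$set_data(data)

# Column positions: 1 = Ozone (target), 3 = Wind, 4 = Temp, 6 = intercept.
ozone_pred <- model$impute("lm_pred", 1, c(3, 4, 6))$imputations

# Multiple imputations with k = 10 draws for the same target.
ozone_bayes <- model$impute_N("lm_bayes", 1, c(3, 4, 6), 10)$imputations

# Write one of the imputed vectors back into the model's data.
model$update_var(1, ozone_pred)
head(model$get_data())
```

Only `Wind`, `Temp`, and the intercept are used as predictors here because they have no missing values in `airquality`, so every missing `Ozone` value has a complete predictor row.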
Benchmark testing (on R 4.2, macOS M1) shows miceFast can significantly reduce computation time, especially in these scenarios:
- Multiple imputations: x * (number of multiple imputations) faster, since the model is computed only once.

For performance details, see performance_validity.R in the extdata folder.