README.md

miceFast

Author: Maciej Nasinski

Check the miceFast website for more details

R build status CRAN codecov Dependencies

Overview

miceFast provides fast methods for imputing missing data, leveraging an object-oriented programming paradigm and optimized linear algebra routines. The package includes convenient helper functions compatible with data.table, dplyr, and other popular R packages.

Major speed improvements occur when: - Using a grouping variable, where the data is automatically sorted by group, significantly reducing computation time. - Performing multiple imputations, by evaluating the underlying quantitative model only once for multiple draws. - Running Predictive Mean Matching (PMM), thanks to presorting and binary search.

For performance details, see performance_validity.R in the extdata folder.

It is recommended to read the Advanced Usage Vignette.

Installation

You can install miceFast from CRAN:

install.packages("miceFast")

Or install the development version from GitHub:

# install.packages("devtools")
devtools::install_github("polkas/miceFast")

Quick Example

Below is a short demonstration. See the vignette for advanced usage and best practices.

library(miceFast)

set.seed(1234)
data(air_miss)

# Visualize the NA structure
upset_NA(air_miss, 6)

# Simple and naive fill
imputed_data <- naive_fill_NA(air_miss)

# Compare with other packages:
# Hmisc
library(Hmisc)
data.frame(Map(function(x) Hmisc::impute(x, "random"), air_miss))

# mice
library(mice)
mice::complete(mice::mice(air_miss, printFlag = FALSE))

Key Features

Quick Reference Table:

| Function | Description | |-----------------|-----------------------------------------------------------------------------| | new(miceFast) | Creates an OOP instance with numerous imputation methods (see the vignette). | | fill_NA() | Single imputation: lda, lm_pred, lm_bayes, lm_noise. | | fill_NA_N() | Multiple imputations (N repeats): pmm, lm_bayes, lm_noise. | | VIF() | Computes Variance Inflation Factors. | | naive_fill_NA() | Performs automatic, naive imputations. | | compare_imp() | Compares imputations vs. original data. | | upset_NA() | Visualizes NA structure using an UpSet plot. |

Performance Highlights

Benchmark testing (on R 4.2, macOS M1) shows miceFast can significantly reduce computation time, especially in these scenarios:

For performance details, see performance_validity.R in the extdata folder.



Try the miceFast package in your browser

Any scripts or data that you put into this service are public.

miceFast documentation built on April 4, 2025, 2:35 a.m.