hdImpute: Complete hdImpute process: correlation matrix, flatten, rank,...

View source: R/hdImpute.R

hdImpute    R Documentation

Complete hdImpute process: correlation matrix, flatten, rank, create batches, impute, join

Description

Complete hdImpute process: correlation matrix, flatten, rank, create batches, impute, join

Usage

hdImpute(data, batch, pmm_k, n_trees, seed, save)

Arguments

data

Original data frame or tibble (with missing values)

batch

Numeric. Batch size.

pmm_k

Integer. Number of neighbors considered in imputation. Default set at 5.

n_trees

Integer. Number of trees used in imputation. Default set at 15.

seed

Integer. Seed to be set for reproducibility.

save

Logical. Should the list of individual imputed batches be saved as an .rds file in the working directory? Default set to FALSE.

Details

Step 1. Group the data by dividing row_number() by the batch size (batch, set by the user) using integer division.

Step 2. Pass the grouped data through group_split() to return a list of batches.

Step 3. Impute each batch individually and time the imputation.

Step 4. Generate the completed (unlisted/joined) imputed data frame.
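The batching described in Steps 1 and 2 can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's internal code: split() stands in for group_split(), the row index is shifted to start at zero so that every full batch holds exactly batch rows, and the features data frame and its column name are hypothetical.

```r
# Hypothetical ranked feature list (names are illustrative only)
features <- data.frame(feature = paste0("x", 1:7))
batch <- 3  # batch size

# Integer division of the zero-based row index assigns a batch id to each row
batch_id <- (seq_len(nrow(features)) - 1) %/% batch

# split() returns a list with one data frame per batch
# (a base R stand-in for dplyr's group_split())
batches <- split(features, batch_id)

# Number of rows in each batch
rows_per_batch <- vapply(batches, nrow, integer(1))
print(rows_per_batch)
```

With 7 rows and a batch size of 3, the last batch is smaller than the others because integer division leaves a remainder.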

Value

A completed, imputed data set

References

Waggoner, P. D. (2023). A batch process for high dimensional imputation. Computational Statistics, 1-22. doi: <10.1007/s00180-023-01325-9>

Stekhoven, D. J., & Bühlmann, P. (2012). MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1), 112-118. doi: <10.1093/bioinformatics/btr597>

Examples

## Not run: 
# Assumes `data` is a data frame or tibble containing missing values
hdImpute(data = data,
         batch = 2, pmm_k = 5, n_trees = 15,
         seed = 123, save = FALSE)

## End(Not run)
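The example above assumes that an object named data already exists. A self-contained variant might look like the sketch below. The toy data frame is hypothetical, the call is guarded so that it only runs when the hdImpute package is installed, and the argument values simply repeat those in the Usage and Examples sections. Whether a data set this small is a sensible input is not confirmed by this page.

```r
# Toy data with missing values (hypothetical; any data frame with NAs would do)
set.seed(123)
data <- data.frame(
  a = rnorm(20),
  b = rnorm(20),
  c = rnorm(20),
  d = rnorm(20)
)
data$a[c(2, 7)] <- NA
data$c[c(5, 11, 16)] <- NA

# Count missing cells before imputation
n_missing_before <- sum(is.na(data))

# Run only if the package is installed; arguments follow the Usage section
if (requireNamespace("hdImpute", quietly = TRUE)) {
  imputed <- hdImpute::hdImpute(data = data, batch = 2,
                                pmm_k = 5, n_trees = 15,
                                seed = 123, save = FALSE)
  # Count missing cells after imputation
  print(sum(is.na(imputed)))
}
```

Comparing the missing-cell counts before and after the call is a quick way to check that the returned data set is complete, as the Value section states.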

pdwaggoner/hdImpute documentation built on Sept. 2, 2024, 6:41 a.m.