impute_batches: Impute batches and return completed data frame

impute_batches    R Documentation

Description

Impute batches and return completed data frame

Usage

impute_batches(data, features, batch, pmm_k, n_trees, seed, save)

Arguments

data

Original data frame or tibble (with missing values)

features

Correlation-based vector of ranked features output from running flatten_mat()

batch

Numeric. Batch size.

pmm_k

Integer. Number of neighbors considered in imputation. Defaults to 5.

n_trees

Integer. Number of trees used in imputation. Defaults to 15.

seed

Integer. Seed to be set for reproducibility.

save

Logical. Should the list of individual imputed batches be saved as an .rds file in the working directory? Defaults to FALSE.
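
As a rough illustration of what saving a list of batches as an .rds file involves, the sketch below round-trips a small list through base R's saveRDS() and readRDS(). The object name, the file name, and the use of tempdir() are assumptions made for this example; the file name and location the package actually uses are not stated here.

```r
# Hypothetical list of imputed batches (toy values, not package output).
imputed_batches <- list(
  data.frame(a = c(1, 2)),
  data.frame(b = c(3, 4))
)

# Hypothetical file name; written to a temporary directory rather than
# the working directory so the sketch leaves no files behind.
path <- file.path(tempdir(), "imputed_batches.rds")

# Serialize the whole list to a single .rds file, then read it back.
saveRDS(imputed_batches, path)
restored <- readRDS(path)
```

An .rds file stores a single R object, so the list of batches comes back with the same structure it had when it was saved.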

Details

Step 1. Group the data by dividing row_number() by the batch size (batch, which the user sets and which determines the number of batches) using integer division.

Step 2. Pass the grouped data through group_split() to return a list of batches.

Step 3. Impute each batch individually and time the imputation.

Step 4. Join the imputed batches into a single completed (unlisted) data frame.
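
The four steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: it assumes the batches are groups of ranked feature names (columns), it uses split() as a stand-in for row_number() with group_split(), and it replaces the tree-based imputation with a simple column-mean imputer. The names toy, ranked, and impute_mean are hypothetical.

```r
# Toy data with missing values (hypothetical; not from the package).
toy <- data.frame(
  a = c(1, NA, 3, 4),
  b = c(NA, 2, 3, 4),
  c = c(1, 2, NA, 4),
  d = c(4, 3, 2, NA),
  e = c(NA, 1, 1, 2)
)

# Hypothetical ranked feature names, standing in for flatten_mat() output.
ranked <- c("c", "a", "e", "b", "d")
batch <- 2  # batch size

# Step 1: integer division of the zero-based position by the batch size.
group_id <- (seq_along(ranked) - 1) %/% batch

# Step 2: split into a list of batches (base R stand-in for group_split()).
batches <- unname(split(ranked, group_id))

# Stand-in imputer: fills NAs with column means. This is an assumption for
# illustration only, NOT the imputation method the package uses.
impute_mean <- function(df) {
  df[] <- lapply(df, function(col) {
    col[is.na(col)] <- mean(col, na.rm = TRUE)
    col
  })
  df
}

# Step 3: impute each batch individually and time it.
imputed <- lapply(batches, function(cols) {
  elapsed <- system.time(
    out <- impute_mean(toy[, cols, drop = FALSE])
  )[["elapsed"]]
  message("Batch (", paste(cols, collapse = ", "), ") imputed in ", elapsed, "s")
  out
})

# Step 4: join the imputed batches and restore the original column order.
completed <- do.call(cbind, imputed)[, names(toy)]
```

With five features and a batch size of 2, the integer division yields group ids 0, 0, 1, 1, 2, so the last batch holds a single feature.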

Value

A completed, imputed data set

References

Waggoner, P. D. (2023). A batch process for high dimensional imputation. Computational Statistics, 1-22. doi: <10.1007/s00180-023-01325-9>

Stekhoven, D. J., & Bühlmann, P. (2012). MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1), 112-118. doi: <10.1093/bioinformatics/btr597>

Examples

## Not run: 
impute_batches(data = data, features = flat_mat,
               batch = 2, pmm_k = 5, n_trees = 15, seed = 123,
               save = FALSE)

## End(Not run)

pdwaggoner/hdImpute documentation built on Sept. 2, 2024, 6:41 a.m.