hdImpute | R Documentation |
Complete hdImpute process: correlation matrix, flatten, rank, create batches, impute, join
hdImpute(data, batch, pmm_k, n_trees, seed, save)
data |
Original data frame or tibble (with missing values) |
batch |
Numeric. Batch size. |
pmm_k |
Integer. Number of neighbors considered in imputation. Default set at 5. |
n_trees |
Integer. Number of trees used in imputation. Default set at 15. |
seed |
Integer. Seed to be set for reproducibility. |
save |
Logical. Should the list of individual imputed batches be saved as an .rds file in the working directory? Default set to FALSE. |
Step 1. Group the data by dividing row_number() by the batch size (batch, number of batches set by user) using integer division.
Step 2. Pass the result through group_split() to return a list.
Step 3. Impute each batch individually and time the imputation.
Step 4. Generate the completed (unlisted/joined) imputed data frame.
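The four steps above can be illustrated with a minimal sketch. This is not the package's internal code: it assumes dplyr is installed, treats batch as the number of rows per group, subtracts 1 from row_number() so that the first group is full, and uses a hypothetical impute_stub() (a simple mean fill) as a stand-in for the real imputation step. The toy data and all object names are illustrative only.

```r
library(dplyr)

# Toy data with missing values (illustrative only).
toy <- tibble(id = 1:5, x = c(1.2, NA, 3.4, NA, 5.6))
batch <- 2  # assumption: treated here as rows per group

# Steps 1 and 2: integer-divide the row number by the batch size,
# then split on the resulting group index to get a list of batches.
batches <- toy %>%
  mutate(group = (row_number() - 1) %/% batch) %>%
  group_split(group, .keep = FALSE)

# Hypothetical stand-in for the imputation step: a plain mean fill,
# NOT the method hdImpute itself uses.
impute_stub <- function(df) {
  df$x[is.na(df$x)] <- mean(df$x, na.rm = TRUE)
  df
}

# Step 3: impute each batch individually and time the whole pass.
elapsed <- system.time(
  imputed <- lapply(batches, impute_stub)
)

# Step 4: join the imputed batches back into one data frame.
completed <- bind_rows(imputed)
```

With five rows and a divisor of 2, the split yields three batches (two full, one with the remaining row), and completed has the same number of rows as toy with no missing values in x.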
A completed, imputed data set
Waggoner, P. D. (2023). A batch process for high dimensional imputation. Computational Statistics, 1-22. doi: <10.1007/s00180-023-01325-9>
Stekhoven, D. J., & Bühlmann, P. (2012). MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1), 112-118. doi: <10.1093/bioinformatics/btr597>
## Not run:
hdImpute(data = data,
         batch = 2, pmm_k = 5, n_trees = 15,
         seed = 123, save = FALSE)
## End(Not run)