imputate_na: Impute Missing Values
In dlookr: Tools for Data Diagnosis, Exploration, Transformation

imputate_na

R Documentation

Impute Missing Values

Description

Missing values are imputed using representative values (mean, median, or mode) or statistical methods (knn, rpart, or mice).

Usage

imputate_na(.data, xvar, yvar, method, seed, print_flag, no_attrs)

Arguments

`.data`	a data.frame or a `tbl_df`.
`xvar`	name of the variable whose missing values are to be replaced.
`yvar`	target variable.
`method`	method used to impute missing values: "mean", "median", "mode", "knn", "rpart", or "mice". For a categorical variable, only "mode", "rpart", and "mice" apply.
`seed`	integer. The random seed used by mice. Used only when method is "mice".
`print_flag`	logical. If TRUE, mice prints its running log to the console. Use print_flag=FALSE for silent computation. Used only when method is "mice".
`no_attrs`	logical. If TRUE, returns a numeric vector or a factor. If FALSE, returns an object of class imputation.

Details

imputate_na() creates an object of class 'imputation'. The 'imputation' class records the positions of the missing values, the imputed values, the imputation method, and related information. It lets you compare the imputed values with the original values, which helps you decide whether to use the imputed values in the analysis.

See vignette("transformation") for an introduction to these concepts.
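
To make the idea concrete, here is a minimal base R sketch of median imputation that records the positions of the missing values and the method as attributes. The helper name impute_median_sketch is hypothetical and is not part of dlookr; it only illustrates the concept and makes no claim about how imputate_na() is implemented.

```r
# Hypothetical helper for illustration only; not part of dlookr.
impute_median_sketch <- function(x) {
  na_pos <- which(is.na(x))                  # positions of the missing values
  filled <- x
  filled[na_pos] <- median(x, na.rm = TRUE)  # replace NAs with the median
  attr(filled, "method") <- "median"         # remember how the values were filled
  attr(filled, "na_pos") <- na_pos           # remember where the values were filled
  filled
}

x <- c(2, NA, 4, 10, NA)
imp <- impute_median_sketch(x)

# Imputed values, looked up via the recorded positions
imp[attr(imp, "na_pos")]

# Compare the mean before and after imputation
mean(x, na.rm = TRUE)
mean(imp)
```

Keeping the positions alongside the filled vector is what makes a before-and-after comparison possible without going back to the original data.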

Value

An object of class imputation, or a numeric vector or factor. If no_attrs is FALSE, an object of class imputation is returned; if no_attrs is TRUE, a numeric vector or factor is returned. The attributes of the imputation class are as follows.

var_type : the data type of the predictor whose missing values are replaced.
method : the method used to impute the missing values.
- when the predictor is a numerical variable:
  - "mean" : arithmetic mean.
  - "median" : median.
  - "mode" : mode.
  - "knn" : K-nearest neighbors.
  - "rpart" : Recursive Partitioning and Regression Trees.
  - "mice" : Multivariate Imputation by Chained Equations.
- when the predictor is a categorical variable:
  - "mode" : mode.
  - "rpart" : Recursive Partitioning and Regression Trees.
  - "mice" : Multivariate Imputation by Chained Equations.
na_pos : positions of the missing values in the predictor.
seed : the random seed used by mice. Used only when method is "mice".
type : the type of imputation; here, "missing values".
message : a message indicating whether the imputation was successful.
success : whether the imputation was successful.
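
As an illustration of the attributes listed above, the following sketch inspects a returned object with base R's attr() and contrasts it with the no_attrs = TRUE result. It assumes that dlookr is installed and that the attributes are stored under the names shown in the list. The object name hf and the seed value 123 are arbitrary choices and are not part of the original examples.

```r
library(dlookr)

set.seed(123)  # arbitrary seed so that the NA positions are reproducible
hf <- heartfailure
hf[sample(seq(NROW(hf)), 20), "platelets"] <- NA

# no_attrs = FALSE: an object of class imputation with attributes
imp <- imputate_na(hf, platelets, method = "median", no_attrs = FALSE)
class(imp)
attr(imp, "method")
head(attr(imp, "na_pos"))

# no_attrs = TRUE: a plain numeric vector, convenient inside mutate()
vec <- imputate_na(hf, platelets, method = "median", no_attrs = TRUE)
is.numeric(vec)
anyNA(vec)
```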

Examples

# Generate data for the example
library(dlookr)
heartfailure2 <- heartfailure
heartfailure2[sample(seq(NROW(heartfailure2)), 20), "platelets"] <- NA
heartfailure2[sample(seq(NROW(heartfailure2)), 5), "smoking"] <- NA

# Replace the missing values of the platelets variable with the median
imputate_na(heartfailure2, platelets, method = "median")

# Replace the missing values of the platelets variable using rpart
# The target variable is death_event.
# Requires the rpart package
imputate_na(heartfailure2, platelets, death_event, method = "rpart")

# Replace the missing values of the smoking variable with the mode
imputate_na(heartfailure2, smoking, method = "mode")

## using dplyr -------------------------------------
library(dplyr)

# The mean before and after the imputation of the platelets variable
heartfailure2 %>%
  mutate(platelets_imp = imputate_na(heartfailure2, platelets, death_event, 
                                     method = "knn", no_attrs = TRUE)) %>%
  group_by(death_event) %>%
  summarise(orig = mean(platelets, na.rm = TRUE),
            imputation = mean(platelets_imp))

# If the variable of interest is a numerical variable
# Requires the rpart package
platelets <- imputate_na(heartfailure2, platelets, death_event, method = "rpart")
platelets

dlookr documentation built on May 29, 2024, 2 a.m.