imputate_na: Impute Missing Values

View source: R/imputation.R

imputate_na R Documentation

Impute Missing Values

Description

Imputes missing values either with a representative value (mean, median, or mode) or with a statistical method (knn, rpart, or mice).

Usage

imputate_na(.data, xvar, yvar, method, seed, print_flag, no_attrs)

Arguments

.data

a data.frame or a tbl_df.

xvar

name of the variable whose missing values are to be imputed.

yvar

target variable.

method

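method used to impute missing values. For a numerical variable, one of "mean", "median", "mode", "knn", "rpart", or "mice"; for a categorical variable, one of "mode", "rpart", or "mice".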

seed

integer. The random seed used in mice. Used only when method is "mice".

print_flag

logical. If TRUE, mice prints its running log to the console. Use print_flag=FALSE for silent computation. Used only when method is "mice".

no_attrs

logical. If TRUE, return a numerical vector or a factor. If FALSE, return an object of imputation class.

Details

imputate_na() creates an object of class 'imputation'. The object records the positions of the missing values, the imputed values, the imputation method, and related information. Because it lets you compare the imputed values with the original values, it helps you decide whether to use the imputed values in the analysis.

See vignette("transformation") for an introduction to these concepts.
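The idea of carrying imputation bookkeeping alongside the imputed values can be illustrated with a small base R sketch. It is not dlookr's implementation: impute_median_sketch() is a hypothetical helper written for this illustration, and it stores only two of the attributes described on this page (na_pos and method).

```r
# Hypothetical helper, not part of dlookr: median imputation that keeps
# the missing-value positions and the method name as attributes.
impute_median_sketch <- function(x) {
  na_pos <- which(is.na(x))                  # positions of the missing values
  filled <- x
  filled[na_pos] <- median(x, na.rm = TRUE)  # fill with the observed median
  attr(filled, "na_pos") <- na_pos
  attr(filled, "method") <- "median"
  filled
}

x <- c(10, NA, 30, 40, NA, 100)
imp <- impute_median_sketch(x)

attr(imp, "na_pos")                          # which elements were imputed
as.numeric(imp)[attr(imp, "na_pos")]         # the values that were filled in

# Compare a summary statistic before and after imputation
c(before = mean(x, na.rm = TRUE), after = mean(as.numeric(imp)))
```

Because the positions are kept, the imputed elements can always be separated from the observed ones, which is what makes a before-and-after comparison possible.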

Value

An object of imputation class, or a numerical vector or a factor. If no_attrs is FALSE, an object of imputation class is returned; if no_attrs is TRUE, a numerical vector or a factor is returned. The attributes of the imputation class are as follows.

  • var_type : the data type of the predictor whose missing values are imputed.

  • method : method of missing value imputation.

    • when the predictor is a numerical variable:

      • "mean" : arithmetic mean.

      • "median" : median.

      • "mode" : mode.

      • "knn" : K-nearest neighbors.

      • "rpart" : Recursive Partitioning and Regression Trees.

      • "mice" : Multivariate Imputation by Chained Equations.

    • when the predictor is a categorical variable:

      • "mode" : mode.

      • "rpart" : Recursive Partitioning and Regression Trees.

      • "mice" : Multivariate Imputation by Chained Equations.

  • na_pos : positions of the missing values in the predictor.

  • seed : the random seed used in mice. Set only when method is "mice".

  • type : the type of imputation, "missing values".

  • message : a message that tells you whether the imputation was successful.

  • success : whether the imputation was successful.
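Base R has no built-in function for the statistical mode, so the "mode" method may be less obvious than "mean" or "median". The sketch below shows one way to compute it and to fill missing values in a factor. It is an illustration only: mode_value() is a hypothetical helper, and how dlookr breaks ties is not stated here.

```r
# Hypothetical helper, not part of dlookr: the most frequent non-missing value.
# Returns the value as a character string; ties go to the first entry in
# table order (level order for factors, sorted order for numbers).
mode_value <- function(x) {
  counts <- table(x, useNA = "no")
  names(counts)[which.max(counts)]
}

# Categorical variable: replace NA with the most frequent level
smoking <- factor(c("No", "Yes", NA, "No", "No", NA, "Yes"))
filled <- smoking
filled[is.na(filled)] <- mode_value(smoking)
table(filled, useNA = "ifany")

# Numerical variable: convert the result back to a number
as.numeric(mode_value(c(1, 2, 2, NA, 3)))
```

The same helper serves both variable types because table() counts distinct values regardless of type; only the conversion back from character differs.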

See Also

imputate_outlier.

Examples


# Load dlookr, which provides imputate_na() and the heartfailure dataset
library(dlookr)

# Generate data for the example
heartfailure2 <- heartfailure
heartfailure2[sample(seq(NROW(heartfailure2)), 20), "platelets"] <- NA
heartfailure2[sample(seq(NROW(heartfailure2)), 5), "smoking"] <- NA

# Replace the missing value of the platelets variable with median
imputate_na(heartfailure2, platelets, method = "median")

# Replace the missing value of the platelets variable with rpart
# The target variable is death_event.
# imputate_na(heartfailure2, platelets, death_event, method = "rpart")

# Replace the missing value of the smoking variable with mode
# imputate_na(heartfailure2, smoking, method = "mode")

# Replace the missing value of the smoking variable with mice
# The target variable is death_event.
# imputate_na(heartfailure2, smoking, death_event, method = "mice")

## using dplyr -------------------------------------
library(dplyr)

# The mean before and after the imputation of the platelets variable
heartfailure2 %>%
  mutate(platelets_imp = imputate_na(heartfailure2, platelets, death_event, 
                                     method = "knn", no_attrs = TRUE)) %>%
  group_by(death_event) %>%
  summarise(orig = mean(platelets, na.rm = TRUE),
            imputation = mean(platelets_imp))

# If the variable of interest is a numerical variable
platelets <- imputate_na(heartfailure2, platelets, death_event, method = "rpart")
platelets
summary(platelets)

# plot(platelets)

# If the variable of interest is a categorical variable
# smoking <- imputate_na(heartfailure2, smoking, death_event, method = "mice")
# smoking
# summary(smoking)

# plot(smoking)



dlookr documentation built on July 9, 2023, 6:31 p.m.