imputate_outlier: Impute Outliers

View source: R/imputation.R

imputate_outlier    R Documentation

Impute Outliers

Description

Replaces the outliers of a numerical variable with a representative value (mean, median, or mode) or with capped percentile values.

Usage

imputate_outlier(.data, xvar, method, no_attrs, cap_ntiles)

Arguments

.data

a data.frame or a tbl_df.

xvar

name of the variable whose outliers are to be replaced.

method

method of outlier imputation. One of "mean", "median", "mode", or "capping".

no_attrs

logical. If TRUE, returns the imputed numerical variable without the imputation attributes. If FALSE, returns an object of class imputation.

cap_ntiles

numeric. Only used when method is "capping". Specifies the two percentiles whose values replace the lower outliers and the upper outliers, respectively. The default is c(0.05, 0.95).

Details

imputate_outlier() creates an object of class imputation. The imputation class includes the outlier positions, the imputed values, the method of outlier imputation, and so on. It lets you compare the imputed values with the original values to help decide whether the imputed values should be used in the analysis.

See vignette("transformation") for an introduction to these concepts.
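As a rough illustration of the idea above, the following base R sketch does not call dlookr. It mimics the kind of metadata this page describes (method, outlier_pos, outliers, type) on a plain numeric vector and then compares original and imputed values. The helper name impute_outlier_median is hypothetical, and flagging outliers with boxplot.stats() and taking the median over all non-missing values are assumptions, since this page does not state how outliers are detected.

```r
# Hypothetical helper, not part of dlookr: replaces outliers with the median
# and stores the metadata named on this page as plain attributes.
impute_outlier_median <- function(x) {
  # Assumption: outliers are the points flagged by boxplot.stats()
  # (values beyond 1.5 * IQR from the hinges).
  out_values <- boxplot.stats(x)$out
  outlier_pos <- which(x %in% out_values)

  imputed <- x
  # Assumption: the median is taken over all non-missing values.
  imputed[outlier_pos] <- median(x, na.rm = TRUE)

  attr(imputed, "method") <- "median"
  attr(imputed, "outlier_pos") <- outlier_pos
  attr(imputed, "outliers") <- x[outlier_pos]
  attr(imputed, "type") <- "outliers"
  imputed
}

x <- c(-40, 10:30, 90)
imp <- impute_outlier_median(x)

# Compare the original and imputed values at the flagged positions.
pos <- attr(imp, "outlier_pos")
comparison <- data.frame(
  position = pos,
  original = attr(imp, "outliers"),
  imputed  = as.numeric(imp)[pos]
)
comparison
```

With a real imputation object returned by imputate_outlier(), the same attr() calls should be the natural way to read the attributes listed under Value.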

Value

An object of class imputation, or a numerical vector. If no_attrs is FALSE, an imputation object is returned; if no_attrs is TRUE, a plain numerical vector is returned. The attributes of the imputation class are as follows.

  • method : method of outlier imputation.

    • when the predictor is a numerical variable

      • "mean" : arithmetic mean

      • "median" : median

      • "mode" : mode

      • "capping" : impute the upper outliers with the 95th percentile and the lower outliers with the 5th percentile.

        • You can change this criterion with the cap_ntiles argument.

  • outlier_pos : positions of the outliers in the predictor.

  • outliers : the outlier values found at outlier_pos.

  • type : type of imputation. Always "outliers" here.
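The "capping" method and the cap_ntiles argument can be sketched in base R as follows. This is a minimal illustration and not the package's source: the helper name cap_outliers is hypothetical, and the 1.5 * IQR rule used to flag outliers is an assumption, since this page does not state the detection criterion.

```r
# Hypothetical helper, not part of dlookr: caps outliers at given percentiles.
cap_outliers <- function(x, cap_ntiles = c(0.05, 0.95)) {
  # Assumption: a value is an outlier if it lies more than 1.5 * IQR
  # below the first quartile or above the third quartile.
  quartiles <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- 1.5 * diff(quartiles)

  # Percentile values used as replacements (5th and 95th by default).
  caps <- quantile(x, probs = cap_ntiles, na.rm = TRUE, names = FALSE)

  lower <- !is.na(x) & x < quartiles[1] - fence
  upper <- !is.na(x) & x > quartiles[2] + fence

  x[lower] <- caps[1]  # lower outliers take the lower percentile value
  x[upper] <- caps[2]  # upper outliers take the upper percentile value
  x
}

x <- c(-40, 10:30, 90)

# Default caps at the 5th and 95th percentiles.
capped <- cap_outliers(x)

# Tighter caps at the 10th and 90th percentiles.
capped_tight <- cap_outliers(x, cap_ntiles = c(0.1, 0.9))

range(x)
range(capped)
range(capped_tight)
```

Note that in this sketch the percentiles are computed on the data including the outliers, so extreme values still influence the caps slightly.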

See Also

imputate_na.

Examples


# Replace the outliers of the sodium variable with the median.
imputate_outlier(heartfailure, sodium, method = "median")

# Replace the outliers of the sodium variable with capping.
imputate_outlier(heartfailure, sodium, method = "capping")
imputate_outlier(heartfailure, sodium, method = "capping", 
                 cap_ntiles = c(0.1, 0.9))

## using dplyr -------------------------------------
library(dplyr)

# The mean before and after the imputation of the sodium variable
heartfailure %>%
  mutate(sodium_imp = imputate_outlier(heartfailure, sodium, 
                                      method = "capping", no_attrs = TRUE)) %>%
  group_by(death_event) %>%
  summarise(orig = mean(sodium, na.rm = TRUE),
            imputation = mean(sodium_imp, na.rm = TRUE))
            
# If the variable of interest is a numerical variables
sodium <- imputate_outlier(heartfailure, sodium)
sodium
summary(sodium)

plot(sodium)



choonghyunryu/dlookr documentation built on June 11, 2024, 9:12 a.m.