imputate_outlier | R Documentation |
Outliers are imputed with a representative value (mean, median, or mode) or capped at chosen percentiles.
imputate_outlier(.data, xvar, method, no_attrs, cap_ntiles)
.data |
a data.frame or a |
xvar |
name of the variable whose outliers are to be replaced. |
method |
method of outlier imputation. |
no_attrs |
logical. If TRUE, return a plain numerical or categorical variable. If FALSE, return an object of the imputation class. |
cap_ntiles |
numeric. Only used when method is "capping". Specifies the percentiles whose values replace the lower outliers and the upper outliers. The default is c(0.05, 0.95). |
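To make the role of cap_ntiles concrete, here is a minimal base R sketch of percentile capping. The helper cap_outliers() is hypothetical and not part of the package, and it assumes that outliers are the points flagged by the boxplot (1.5 * IQR) rule; the package's own detection rule may differ.

```r
# Hypothetical helper (not part of the package): cap outliers at chosen percentiles.
cap_outliers <- function(x, cap_ntiles = c(0.05, 0.95)) {
  # Assumption: outliers are the points flagged by the boxplot (1.5 * IQR) rule.
  out <- boxplot.stats(x)$out
  # Percentile values used as replacements for lower and upper outliers.
  caps <- quantile(x, probs = cap_ntiles, na.rm = TRUE, names = FALSE)
  centre <- median(x, na.rm = TRUE)
  is_out <- x %in% out
  x[is_out & x < centre] <- caps[1]  # lower outliers -> lower percentile value
  x[is_out & x > centre] <- caps[2]  # upper outliers -> upper percentile value
  x
}

x <- c(-50, 1:20, 100)
capped <- cap_outliers(x)
range(capped)
```

Only the flagged points are changed; values inside the whiskers are left untouched, so narrowing cap_ntiles (for example c(0.1, 0.9)) pulls the replaced values further toward the centre without affecting ordinary observations.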
imputate_outlier() creates an imputation class. The 'imputation' class includes the outlier positions, the imputed values, the method of outlier imputation, etc. The 'imputation' class lets you compare the imputed values with the original values to help determine whether the imputed values should be used in the analysis.
See vignette("transformation") for an introduction to these concepts.
An object of the imputation class, or a numerical vector. If no_attrs is FALSE, an object of the imputation class is returned; if no_attrs is TRUE, a numerical vector is returned. The attributes of the imputation class are as follows.
method : method of outlier imputation.
when the predictor is a numerical variable
"mean" : arithmetic mean
"median" : median
"mode" : mode
"capping" : impute the upper outliers with the 95th percentile and the lower outliers with the 5th percentile.
You can change these percentiles with the cap_ntiles argument.
outlier_pos : positions of the outliers in the predictor.
outliers : the outlier values corresponding to outlier_pos.
type : type of imputation; always "outliers" here.
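The attributes listed above can be pictured with a small base R stand-in. This is only an illustrative sketch: the object is built by hand with structure(), the outliers are assumed to be those flagged by the boxplot (1.5 * IQR) rule, and the median is taken over all observed values; the package's internals may differ on each of these points.

```r
# Hand-built stand-in for an imputation result; not the package's own code.
x <- c(2, 3, 4, 5, 6, 90)

# Assumption: outliers are the points flagged by the boxplot (1.5 * IQR) rule.
outliers <- boxplot.stats(x)$out
outlier_pos <- which(x %in% outliers)

# Median imputation: replace each outlier with the median of the observed values.
imputed <- x
imputed[outlier_pos] <- median(x, na.rm = TRUE)

# Attach the same attribute names that the documentation lists.
result <- structure(
  imputed,
  method = "median",
  outlier_pos = outlier_pos,
  outliers = x[outlier_pos],
  type = "outliers"
)

attr(result, "outlier_pos")  # where the outliers were
attr(result, "outliers")     # the original values at those positions
```

Keeping the original outlier values alongside their positions is what makes a before-and-after comparison possible without holding on to a second copy of the data.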
imputate_na.
# Replace the outliers of the sodium variable with median.
imputate_outlier(heartfailure, sodium, method = "median")
# Replace the outliers of the sodium variable with capping.
imputate_outlier(heartfailure, sodium, method = "capping")
imputate_outlier(heartfailure, sodium, method = "capping",
                 cap_ntiles = c(0.1, 0.9))
## using dplyr -------------------------------------
library(dplyr)
# The mean before and after the imputation of the sodium variable
heartfailure %>%
  mutate(sodium_imp = imputate_outlier(heartfailure, sodium,
                                       method = "capping", no_attrs = TRUE)) %>%
  group_by(death_event) %>%
  summarise(orig = mean(sodium, na.rm = TRUE),
            imputation = mean(sodium_imp, na.rm = TRUE))
# If the variable of interest is a numerical variable
sodium <- imputate_outlier(heartfailure, sodium)
sodium
summary(sodium)
plot(sodium)