View source: R/performance_nan_imputation.R
performance_nan_imputation | R Documentation |
This function evaluates the performance of missing-value imputation methods on a quantitative dataframe. It is designed to examine and compare five imputation methods using standard performance measures.
performance_nan_imputation(data, to_impute, regressors, method = 1)
data |
A dataframe containing the observations (rows) and quantitative variables (columns) to be analyzed. This dataframe includes variables with missing values to be imputed |
to_impute |
A string specifying the name of the variable in the dataframe that contains the missing values to be imputed |
regressors |
A vector of strings indicating the names of the variables to be used as regressors for imputation in the case of methods 1 (lm_imputation) and 4 (hot deck imputation) |
method |
An integer between 1 and 5 that specifies the imputation method to be used. The supported methods are: 1 = lm_imputation (imputation by linear model); 2 = median imputation; 3 = mean imputation; 4 = hot deck imputation; 5 = EM imputation (imputation via Expectation-Maximization) |
This function is useful for comparing how effectively different methods impute missing values, so that the most appropriate method can be chosen based on measured performance.
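To make the listed methods more concrete, the following base R sketch shows roughly what mean, median, and linear-model imputation do to the `Ozone` column of `airquality`. It is an illustration only: it does not call this package, the variable names (`ozone_mean`, `ozone_median`, `ozone_lm`) are hypothetical, and the package's internal implementation may differ.

```r
# Illustrative sketch in base R; not the package's implementation.
data("airquality")

ozone   <- airquality$Ozone
missing <- is.na(ozone)          # positions that need imputing

# Method 3 (mean) and method 2 (median): fill every gap with one constant
ozone_mean   <- ifelse(missing, mean(ozone, na.rm = TRUE), ozone)
ozone_median <- ifelse(missing, median(ozone, na.rm = TRUE), ozone)

# Method 1 (linear model): fit on complete rows, predict the missing ones.
# Wind and Temp are columns 3 and 4, the regressors used in the examples.
fit <- lm(Ozone ~ Wind + Temp, data = airquality)   # rows with NA are dropped
ozone_lm <- ozone
ozone_lm[missing] <- predict(fit, newdata = airquality[missing, ])
```

Mean and median imputation ignore the other variables, whereas the linear model uses the regressors to produce a different value for each missing row.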
The function returns a dataframe that contains a row for each imputation method and columns with performance measures. The performance measures included are:
R^2: Coefficient of Determination, which measures how well the imputed values fit the observed values
RMSE: Root Mean Squared Error, the square root of the mean squared deviation between imputed and observed values
MAE: Mean Absolute Error, which represents the mean absolute deviation between the imputed and observed values
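The three measures above can be sketched in a few lines of base R. The helper `imputation_metrics()` is hypothetical and not part of the package, and it assumes R^2 is defined as one minus the ratio of residual to total sum of squares; the package may use a different definition (for example, a squared correlation).

```r
# Hypothetical helper for illustration; not part of the package.
imputation_metrics <- function(observed, imputed) {
  stopifnot(length(observed) == length(imputed))
  err    <- observed - imputed
  ss_res <- sum(err^2)                          # residual sum of squares
  ss_tot <- sum((observed - mean(observed))^2)  # total sum of squares
  data.frame(
    R2   = 1 - ss_res / ss_tot,   # assumed definition of R^2
    RMSE = sqrt(mean(err^2)),
    MAE  = mean(abs(err))
  )
}

# Toy values, made up purely to exercise the helper
observed <- c(10, 20, 30, 40)
imputed  <- c(12, 18, 33, 37)
m <- imputation_metrics(observed, imputed)
print(m)
```

Lower RMSE and MAE, and a higher R^2, indicate imputed values that sit closer to the observed ones.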
OECD/European Union/EC-JRC (2008), Handbook on Constructing Composite Indicators: Methodology and User Guide, OECD Publishing, Paris, <https://doi.org/10.1787/9789264043466-en>
data("airquality")
# Use columns 3 and 4 (Wind and Temp) as regressors
regressors <- colnames(airquality[, c(3, 4)])
# Method 1: imputation by linear model
performance_nan_imputation(data = airquality, to_impute = "Ozone", regressors = regressors, method = 1)
# Method 2: imputation by median
suppressWarnings(performance_nan_imputation(data = airquality, to_impute = "Ozone", method = 2))
# Method 3: imputation by mean
suppressWarnings(performance_nan_imputation(data = airquality, to_impute = "Ozone", method = 3))
# Method 4: hot deck imputation
performance_nan_imputation(data = airquality, to_impute = "Ozone", regressors = regressors, method = 4)
# Method 5: Expectation-Maximization imputation
performance_nan_imputation(data = airquality, to_impute = "Ozone", regressors = regressors, method = 5)