View source: R/evaluate_imputed_values.R
evaluate_imputed_values | R Documentation |
Compare imputed to true values
evaluate_imputed_values(
  ds_imp,
  ds_orig,
  criterion = "RMSE",
  M = NULL,
  cols_which = seq_len(ncol(ds_imp)),
  tolerance = sqrt(.Machine$double.eps),
  imp_ds,
  orig_ds,
  which_cols
)
ds_imp |
A data frame or matrix with imputed values. |
ds_orig |
A data frame or matrix with original (true) values. |
criterion |
A string specifying the criterion used to compare the imputed and original values. |
M |
NULL (the default) or a missing data indicator matrix. The missing data indicator matrix is normally created via is.na applied to the dataset with missing values. |
cols_which |
Indices or names of columns used for evaluation. |
tolerance |
Numeric, only used for |
imp_ds |
Deprecated, renamed to ds_imp. |
orig_ds |
Deprecated, renamed to ds_orig. |
which_cols |
Deprecated, renamed to cols_which. |
The following criteria are implemented to compare the imputed values to the true values:
"RMSE" (the default): The Root Mean Squared Error between the imputed and true values
"bias": The mean difference between the imputed and the true values
"cor": The correlation between the imputed and true values
"MAE": The Mean Absolute Error between the imputed and true values
"MSE": The Mean Squared Error between the imputed and true values
"NRMSE_col_mean": For every column the RMSE divided by the mean of the true values is calculated. Then these columnwise values are squared and averaged. Finally, the square root of this average is returned.
"NRMSE_col_mean_sq": For every column the RMSE divided by the square root of the mean of the squared true values is calculated. Then these columnwise values are squared and averaged. Finally, the square root of this average is returned.
"NRMSE_col_sd": For every column the RMSE divided by the standard deviation of all true values is calculated. Then these columnwise values are squared and averaged. Finally, the square root of this average is returned.
"NRMSE_tot_mean": RMSE divided by the mean of all true values
"NRMSE_tot_mean_sq": RMSE divided by the square root of the mean of all squared true values
"NRMSE_tot_sd": RMSE divided by the standard deviation of all true values
"nr_equal": number of imputed values that are equal to the true values
"nr_NA": number of values in ds_imp that are NA (not imputed)
"precision": proportion of imputed values that are equal to the true values
In addition, relative versions of bias and MAE are implemented. In the relative versions, the differences are divided by the absolute values of the true values. They can be selected via "bias_rel" and "MAE_rel". The "NRMSE_tot_" and "NRMSE_col_" criteria are equal if the columnwise normalization values are equal to the total normalization value (see examples).
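The definitions above can be reproduced by hand in base R. The following is a minimal sketch that follows the verbal descriptions, not the package's own implementation; the matrices `true` and `imp` and all variable names are made up for illustration.

```r
# Hypothetical illustration data; both columns have the same mean (5.5)
true <- cbind(X = 1:10, Y = 10:1)
imp <- true
imp[c(2, 5), "X"] <- 5.5  # pretend these four cells were mean-imputed
imp[c(3, 7), "Y"] <- 5.5

err <- imp - true  # differences between imputed and true values

# Basic criteria, following the descriptions of "RMSE", "bias" and "MAE"
rmse <- sqrt(mean(err^2))
bias <- mean(err)
mae <- mean(abs(err))

# Relative versions: differences divided by the absolute true values
bias_rel <- mean(err / abs(true))
mae_rel <- mean(abs(err) / abs(true))

# "NRMSE_tot_mean": total RMSE divided by the mean of all true values
nrmse_tot_mean <- rmse / mean(true)

# "NRMSE_col_mean": columnwise RMSE divided by the column mean,
# then squared, averaged, and finally the square root is taken
rmse_col <- sqrt(colMeans(err^2))
nrmse_col_mean <- sqrt(mean((rmse_col / colMeans(true))^2))

# Equal here because both column means equal the total mean
all.equal(nrmse_tot_mean, nrmse_col_mean)
```

With columns whose means differ (for example X = 1:10 and Y = 101:110), the two NRMSE variants would generally no longer agree.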
The argument cols_which allows the selection of columns for comparison (see examples).
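Conceptually, selecting columns amounts to subsetting both datasets before the criterion is computed. The snippet below is a base R sketch under that assumption; the data and the hand-set "imputed" values are hypothetical, and this is not the package's code.

```r
# Hypothetical data: true values and a copy with a few "imputed" cells
ds_orig <- data.frame(X = 1:10, Y = 101:110)
ds_imp <- ds_orig
ds_imp$X[c(2, 9)] <- 5.5  # hypothetical imputed values in X
ds_imp$Y[4] <- 105        # hypothetical imputed value in Y

cols_which <- "X"  # a column name; an index such as 1 works the same way

# Restrict both datasets to the selected columns, then compare
err <- as.matrix(ds_imp[, cols_which, drop = FALSE]) -
  as.matrix(ds_orig[, cols_which, drop = FALSE])
rmse_X <- sqrt(mean(err^2))
```

Here the deviation in column Y does not enter `rmse_X`, because only column X is evaluated.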
If M = NULL (the default), then all values of ds_imp and ds_orig will be used for the calculation of the evaluation criterion. If a missing data indicator matrix is given via M, only the truly imputed values (values that are marked as missing via M) will be used for the calculation. If you want to provide M, M must be a logical matrix of the same dimensions as ds_orig and missing values must be coded as TRUE. This is the standard behavior if you use is.na on a dataset with missing values to generate M (see examples). It is possible to combine M and cols_which.
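The effect of M can be illustrated in base R. This is a sketch under stated assumptions: values are removed by hand, a simple columnwise mean imputation stands in for any imputation method, and the RMSE is computed manually rather than through the package.

```r
# Hypothetical data with three values removed by hand
ds_orig <- data.frame(X = 1:10, Y = 101:110)
ds_mis <- ds_orig
ds_mis$X[c(2, 9)] <- NA
ds_mis$Y[4] <- NA

# Missing data indicator matrix: logical, same dimensions, TRUE = missing
M <- is.na(ds_mis)

# Simple columnwise mean imputation (stand-in for any imputation method)
ds_imp <- ds_mis
for (j in seq_along(ds_imp)) {
  ds_imp[[j]][M[, j]] <- mean(ds_mis[[j]], na.rm = TRUE)
}

err <- as.matrix(ds_imp) - as.matrix(ds_orig)

rmse_all <- sqrt(mean(err^2))                   # all values, as with M = NULL
rmse_imp <- sqrt(mean(err[M]^2))                # only cells marked TRUE in M
rmse_imp_X <- sqrt(mean(err[M[, "X"], "X"]^2))  # M combined with column "X"
```

Because the observed values are unchanged by the imputation, their errors are zero, so the RMSE over all values is smaller than the RMSE over the imputed cells alone.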
A numeric vector of length one.
Kim, H., Golub, G. H., & Park, H. (2005). Missing value estimation for DNA microarray gene expression data: local least squares imputation. Bioinformatics, 21(2), 187-198.
Other evaluation functions: evaluate_imputation_parameters(), evaluate_parameters()
ds_orig <- data.frame(X = 1:10, Y = 101:110)
ds_mis <- delete_MCAR(ds_orig, 0.3)
ds_imp <- impute_mean(ds_mis)

# compare all values from ds_orig and ds_imp
evaluate_imputed_values(ds_imp, ds_orig)

# compare only the imputed values
M <- is.na(ds_mis)
evaluate_imputed_values(ds_imp, ds_orig, M = M)

# compare only the imputed values in column X
evaluate_imputed_values(ds_imp, ds_orig, M = M, cols_which = "X")

# NRMSE_tot_mean and NRMSE_col_mean are equal, if columnwise means are equal
ds_orig <- data.frame(X = 1:10, Y = 10:1)
ds_mis <- delete_MCAR(ds_orig, 0.3)
ds_imp <- impute_mean(ds_mis)
evaluate_imputed_values(ds_imp, ds_orig, "NRMSE_tot_mean")
evaluate_imputed_values(ds_imp, ds_orig, "NRMSE_col_mean")