evaluate_imputed_values: Evaluate imputed values
In missMethods: Methods for Missing Data


evaluate_imputed_values

R Documentation

Evaluate imputed values

Description

Compares imputed values to the true values using a selectable evaluation criterion.

Usage

evaluate_imputed_values(
  ds_imp,
  ds_orig,
  criterion = "RMSE",
  M = NULL,
  cols_which = seq_len(ncol(ds_imp)),
  tolerance = sqrt(.Machine$double.eps),
  imp_ds,
  orig_ds,
  which_cols
)

Arguments

`ds_imp`	A data frame or matrix with imputed values.
`ds_orig`	A data frame or matrix with original (true) values.
`criterion`	A string specifying the criterion used to compare the imputed and original values.
`M`	NULL (the default) or a missing data indicator matrix. The missing data indicator matrix is normally created via `is.na(ds_mis)`, where `ds_mis` is the dataset after deleting values from `ds_orig`.
`cols_which`	Indices or names of columns used for evaluation.
`tolerance`	Numeric, only used for `criterion = "precision"`: numeric differences smaller than tolerance are treated as zero/equal.
`imp_ds`	Deprecated, renamed to `ds_imp`.
`orig_ds`	Deprecated, renamed to `ds_orig`.
`which_cols`	Deprecated, renamed to `cols_which`.

Details

The following criteria are implemented to compare the imputed values to the true values:

"RMSE" (the default): The Root Mean Squared Error between the imputed and true values
"bias": The mean difference between the imputed and the true values
"cor": The correlation between the imputed and true values
"MAE": The Mean Absolute Error between the imputed and true values
"MSE": The Mean Squared Error between the imputed and true values
"NRMSE_col_mean": For every column, the RMSE divided by the mean of the true values in that column is calculated. These columnwise values are then squared and averaged. Finally, the square root of this average is returned.
"NRMSE_col_mean_sq": For every column, the RMSE divided by the square root of the mean of the squared true values in that column is calculated. These columnwise values are then squared and averaged. Finally, the square root of this average is returned.
"NRMSE_col_sd": For every column, the RMSE divided by the standard deviation of the true values in that column is calculated. These columnwise values are then squared and averaged. Finally, the square root of this average is returned.
"NRMSE_tot_mean": RMSE divided by the mean of all true values
"NRMSE_tot_mean_sq": RMSE divided by the square root of the mean of all squared true values
"NRMSE_tot_sd": RMSE divided by the standard deviation of all true values
"nr_equal": number of imputed values that are equal to the true values
"nr_NA": number of values in ds_imp that are NA (not imputed)
"precision": proportion of imputed values that are equal to the true values
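
As a rough illustration, several of the criteria above can be reproduced by hand in base R. This is only a sketch, not the package's implementation: the vectors `truth` and `imputed` are made-up example values, and the equality check for the precision assumes that absolute differences are compared against the tolerance.

```r
# Hypothetical example values (not taken from the package)
truth   <- c(1, 2, 3, 4, 5)
imputed <- c(1, 2, 4, 4, 7)

err <- imputed - truth               # differences: 0 0 1 0 2

rmse    <- sqrt(mean(err^2))         # Root Mean Squared Error
bias    <- mean(err)                 # mean difference
mae     <- mean(abs(err))            # Mean Absolute Error
mse     <- mean(err^2)               # Mean Squared Error
cor_val <- cor(imputed, truth)       # correlation

# Assumed equality check: differences below `tol` count as equal
tol <- sqrt(.Machine$double.eps)
precision <- mean(abs(err) < tol)    # proportion of (nearly) exact values
```

If missMethods is installed, the corresponding calls such as `evaluate_imputed_values(imputed_df, truth_df, "MAE")` on one-column data frames should be comparable to these hand-computed values.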

Relative versions of bias and MAE are also implemented. In the relative versions, the differences are divided by the absolute values of the true values. They can be selected via "bias_rel" and "MAE_rel". The "NRMSE_tot_*" and "NRMSE_col_*" criteria give the same result if the columnwise normalization values are equal to the total normalization value (see examples).
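
The relative criteria and the relationship between the "tot" and "col" normalizations can be sketched in base R as follows. This is a hand computation under the definitions given above, not the package's code; all data values and the helper `rmse()` are hypothetical.

```r
# Relative bias and relative MAE (hypothetical values)
truth   <- c(1, 2, 4, 5)
imputed <- c(2, 2, 3, 5)
rel_err  <- (imputed - truth) / abs(truth)   # 1, 0, -0.25, 0
bias_rel <- mean(rel_err)
mae_rel  <- mean(abs(rel_err))

# NRMSE normalized by the mean: total versus columnwise
# Both columns of `orig` have the same mean (2.5) and the same length.
orig <- data.frame(X = 1:4, Y = 4:1)
imp  <- data.frame(X = c(1, 2, 2.5, 4), Y = c(4, 2.5, 2.5, 1))

rmse <- function(a, b) sqrt(mean((a - b)^2))  # illustrative helper

# total version: one RMSE over all values, one normalization value
nrmse_tot <- rmse(unlist(imp), unlist(orig)) / mean(unlist(orig))

# columnwise version: normalize per column, square, average, take the root
col_vals  <- mapply(function(a, b) rmse(a, b) / mean(b), imp, orig)
nrmse_col <- sqrt(mean(col_vals^2))

isTRUE(all.equal(nrmse_tot, nrmse_col))       # equal here: column means match
```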

The argument cols_which allows the selection of columns for comparison (see examples).

If M = NULL (the default), all values of ds_imp and ds_orig are used to calculate the evaluation criterion. If a missing data indicator matrix is given via M, only the truly imputed values (values marked as missing in M) are used for the calculation. If provided, M must be a logical matrix of the same dimensions as ds_orig, with missing values coded as TRUE. This is what is.na returns when applied to a dataset with missing values (see examples). M and cols_which can be combined.
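
The effect of M can be sketched without the package by indexing with the indicator matrix directly. The following base R example is an assumption-laden illustration, not the function's internals: the data values are invented and the mean imputation is done by hand.

```r
# Hypothetical data with two deleted values
ds_orig <- data.frame(X = c(1, 2, 3, 4), Y = c(10, 20, 30, 40))
ds_mis <- ds_orig
ds_mis$X[2] <- NA
ds_mis$Y[4] <- NA

# Logical matrix with the dimensions of ds_orig; TRUE marks a missing value
M <- is.na(ds_mis)

# Hand-made mean imputation of the two missing values
ds_imp <- ds_mis
ds_imp$X[2] <- mean(ds_mis$X, na.rm = TRUE)
ds_imp$Y[4] <- mean(ds_mis$Y, na.rm = TRUE)

imp_mat  <- as.matrix(ds_imp)
orig_mat <- as.matrix(ds_orig)

# RMSE over all values (analogous to M = NULL)
rmse_all <- sqrt(mean((imp_mat - orig_mat)^2))

# RMSE over the imputed values only (analogous to passing M)
rmse_imputed <- sqrt(mean((imp_mat[M] - orig_mat[M])^2))

# Restricting to column X as well (analogous to combining M and cols_which)
in_X <- M[, "X"]
rmse_imputed_X <- sqrt(mean((ds_imp$X[in_X] - ds_orig$X[in_X])^2))
```

Because the observed values contribute zero error, the RMSE over all values is smaller than the RMSE restricted to the imputed values.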

Value

A numeric vector of length one.

References

Kim, H., Golub, G. H., & Park, H. (2005). Missing value estimation for DNA microarray gene expression data: local least squares imputation. Bioinformatics, 21(2), 187-198.

Examples

library(missMethods) # provides delete_MCAR(), impute_mean(), evaluate_imputed_values()

ds_orig <- data.frame(X = 1:10, Y = 101:110)
ds_mis <- delete_MCAR(ds_orig, 0.3)
ds_imp <- impute_mean(ds_mis)
# compare all values from ds_orig and ds_imp
evaluate_imputed_values(ds_imp, ds_orig)
# compare only the imputed values
M <- is.na(ds_mis)
evaluate_imputed_values(ds_imp, ds_orig, M = M)
# compare only the imputed values in column X
evaluate_imputed_values(ds_imp, ds_orig, M = M, cols_which = "X")

# NRMSE_tot_mean and NRMSE_col_mean are equal if the columnwise means are equal
ds_orig <- data.frame(X = 1:10, Y = 10:1)
ds_mis <- delete_MCAR(ds_orig, 0.3)
ds_imp <- impute_mean(ds_mis)
evaluate_imputed_values(ds_imp, ds_orig, "NRMSE_tot_mean")
evaluate_imputed_values(ds_imp, ds_orig, "NRMSE_col_mean")

missMethods documentation built on Sept. 16, 2022, 5:08 p.m.