scrimp_vars: Score imputations for specific variables


View source: R/scrimp.R

Description

To evaluate how accurately an imputation procedure fills in missing values, use scrimp_vars. It generally applies only to artificial situations where you ampute your data (i.e., deliberately set known values to missing) and then impute them, so that the true values remain available for comparison. For a more general imputation validation procedure, see scrimp_mdl.
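The ampute-then-impute workflow can be sketched in base R alone. This is only an illustration of the idea, not how scrimp_vars works internally: the data, the mean imputation, and the RMSE score are all assumptions chosen for brevity.

```r
# Complete data with known true values (made-up numbers).
complete <- data.frame(x = c(2.1, 3.4, 1.8, 4.0, 2.9, 3.3))

# Ampute: blank out two known cells; the truth stays in `complete`.
amputed <- complete
miss_rows <- c(2, 5)
amputed$x[miss_rows] <- NA

# Impute: fill the gaps with the mean of the observed values.
imputed <- amputed
imputed$x[miss_rows] <- mean(amputed$x, na.rm = TRUE)

# Score only the amputed cells, since all other cells are unchanged.
truth    <- complete$x[miss_rows]
estimate <- imputed$x[miss_rows]
rmse <- sqrt(mean((truth - estimate)^2))
rmse
```

Because the missing cells were created on purpose, the true values are known and any error measure can be applied to the imputed cells.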

Usage

scrimp_vars(
  data_imputed,
  data_missing,
  data_complete,
  miss_indx = NULL,
  fun_ctns_error = yardstick::rsq_trad_vec,
  fun_intg_error = yardstick::rsq_trad_vec,
  fun_bnry_error = yardstick::kap_vec,
  fun_catg_error = yardstick::kap_vec
)

Arguments

data_imputed

an imputed data frame.

data_missing

the unimputed data frame.

data_complete

a data frame containing the 'true' values, including those that are missing in data_missing.

miss_indx

an object returned from the mindx function applied to data_missing.

fun_ctns_error

a function that will evaluate errors for continuous variables. Continuous variables have type double. Default is traditional R-squared (yardstick::rsq_trad_vec; see yardstick::rsq_trad()).

fun_intg_error

a function that will evaluate errors for integer valued variables. Default is traditional R-squared (yardstick::rsq_trad_vec; see yardstick::rsq_trad()).

fun_bnry_error

a function that will evaluate errors for binary variables (i.e., factors with 2 levels). Default is to use kappa agreement (see yardstick::kap()).

fun_catg_error

a function that will evaluate errors for categorical variables (i.e., factors with >2 levels). Default is to use kappa agreement (see yardstick::kap()).
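As a rough illustration of the default score for continuous and integer variables, traditional R-squared can be computed by hand in base R. The helper name rsq_trad_manual and the example vectors are hypothetical; the function is meant to mirror the sum-of-squares definition, not to reproduce yardstick's implementation.

```r
# Traditional R-squared: 1 - SS_residual / SS_total.
# `rsq_trad_manual` is a hypothetical helper, not part of ipa or yardstick.
rsq_trad_manual <- function(truth, estimate) {
  ss_res <- sum((truth - estimate)^2)      # squared imputation errors
  ss_tot <- sum((truth - mean(truth))^2)   # spread of the true values
  1 - ss_res / ss_tot
}

truth    <- c(1, 2, 3, 4)
estimate <- c(1.5, 2, 2.5, 4)

rsq_value <- rsq_trad_manual(truth, estimate)
rsq_value
```

The yardstick defaults take the true values and the estimates as their first two arguments. If scrimp_vars calls its error functions the same way (an assumption this page does not confirm), a function of this shape could presumably be supplied as fun_ctns_error or fun_intg_error.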

Value

a tibble::tibble() with columns variable, type, and score. The score column comprises output from the error functions.

Note

Kappa agreement is similar to classification accuracy, but is normalized by the accuracy that would be expected by chance alone. This makes it very useful when one or more classes are much more frequent than the others.
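The difference between accuracy and kappa can be seen in a small hand computation. The sketch below uses a hypothetical helper, kappa_manual, that follows the usual Cohen's kappa formula in base R; the data are made up to show an imbalanced case.

```r
# Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement).
# `kappa_manual` is a hypothetical helper, not part of ipa or yardstick.
kappa_manual <- function(truth, estimate) {
  lvls <- union(levels(factor(truth)), levels(factor(estimate)))
  tab <- table(factor(truth, levels = lvls), factor(estimate, levels = lvls))
  n <- sum(tab)
  p_obs <- sum(diag(tab)) / n                       # observed agreement
  p_exp <- sum(rowSums(tab) * colSums(tab)) / n^2   # agreement expected by chance
  (p_obs - p_exp) / (1 - p_exp)
}

# Imbalanced classes: always imputing the majority class looks accurate...
truth    <- c("a", "a", "a", "a", "a", "a", "a", "a", "b", "b")
estimate <- rep("a", 10)

accuracy    <- mean(truth == estimate)
kappa_value <- kappa_manual(truth, estimate)   # ...but agrees no better than chance
```

Here accuracy is 0.8 while kappa is 0, because guessing the majority class every time does no better than chance.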

Examples

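library(ipa)  # assumes the ipa package is installed

# Complete data: four integer columns and one two-level grouping column
df_complete <- data.frame(a = 1:10, b = 1:10, c = 1:10, d = 1:10,
  fctr = letters[c(1,1,1,1,1,2,2,2,2,2)])

# Ampute: copy the complete data, then set selected cells to NA
df_miss <- df_complete
df_miss[1:3, 1] <- NA
df_miss[2:4, 2] <- NA
df_miss[3:5, 3] <- NA
df_miss[4:6, 5] <- NA  # column 5 is fctr

# Imputed values for each amputed column, in row order
imputes <- list(a = 1:3, b = 2:4, c = 3:5, fctr = factor(c('a','a','b')))

df_imputed <- fill_na(df_miss, vals = imputes)

# Score the imputations against the true values
scored <- scrimp_vars(df_imputed, df_miss, df_complete)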

bcjaeger/ipa documentation built on May 7, 2020, 9:45 a.m.