expl_cond_dist_tbl: expl_cond_dist_tbl

View source: R/expl_cond_dist_tbl.R

Description

This function returns a table of the conditional distribution for banded variables. Each cell is defined as (number of entries in both the band of main_var and the band of cond_var) / (number of entries in the band of cond_var).
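As a rough illustration of this definition, the sketch below computes a single cell by hand in base R. It does not call the package, and the vectors and band labels are made-up example data.

```r
# Made-up banded data (assumption: bands are stored as character labels)
main_var <- c("low", "low", "high", "high", "low", "high")
cond_var <- c("A",   "A",   "A",    "B",    "B",   "B")

# Number of entries in the conditioning band "A"
n_cond <- sum(cond_var == "A")

# Number of entries in both the main band "low" and the conditioning band "A"
n_both <- sum(main_var == "low" & cond_var == "A")

# Conditional proportion: P(main_var = "low" | cond_var = "A")
p_low_given_A <- n_both / n_cond
print(p_low_given_A)
```

Repeating this for every pair of bands fills in the whole table, one column per band of cond_var.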

This tells us: "Given that the entry is in the band of cond_var, what is the probability that it is in this band of main_var?" This can be useful for tasks such as assessing model refreshes. For example, it can answer: "Given that the score on the old model is in the 500-750 band, what percentage of these scores end up in each band for the new model?" This could be done with expl_cond_dist_tbl(main_var=df$new_score, cond_var=df$old_score), where both columns hold banded scores. You would then look at the column for your 500-750 band (normally labelled with just "750").
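The model-refresh question can be sketched in base R without the package. Everything here is an assumption made for illustration: the scores are invented, the banding uses cut(), and each band is labelled by its upper bound (so "750" stands for the 500-750 band).

```r
# Hypothetical old and new model scores (illustrative values only)
df <- data.frame(
  old_score = c(520, 610, 740, 300, 450, 800, 900, 700),
  new_score = c(480, 650, 760, 350, 520, 780, 950, 400)
)

# Band both scores; each band is labelled by its upper bound
breaks <- c(0, 250, 500, 750, 1000)
labels <- as.character(breaks[-1])
old_band <- as.character(cut(df$old_score, breaks = breaks, labels = labels))
new_band <- as.character(cut(df$new_score, breaks = breaks, labels = labels))

# Counts: rows are new-score bands, columns are old-score bands
counts <- table(main_var = new_band, cond_var = old_band)

# Percentage of each old-score band (column) landing in each new-score band (row)
pct <- 100 * prop.table(counts, margin = 2)

# Read off the column for the 500-750 old-score band
print(pct[, "750"])
```

Each column of pct sums to 100, so the "750" column shows how the scores from the old 500-750 band are spread across the new bands.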

Usage

expl_cond_dist_tbl(
  main_var,
  cond_var,
  output_var = "thin",
  NA_val = "_NA",
  warn_high_band = 50L,
  err_high_band = 100L,
  verbose = TRUE
)

Arguments

main_var

Array[Character]: This is a banded version of the variable which you would like to assess dependent on another variable. If doing a model refresh, this would be the new score.

cond_var

Array[Character]: This is a banded version of the variable which we are conditioning on. If doing a model refresh, this would be the old score.

output_var

Character (Default: "thin"): This is a choice of which form of output should be given. Options are: ("thin", "prop", "count"). See "Value".

NA_val

Character/Numeric/NA (Default: "_NA"): NA replacement value.

warn_high_band

Numeric (Default: 50): The number of bands above which a warning is generated. To disable the warning, set this above err_high_band.

err_high_band

Numeric (Default: 100): The number of bands above which an error is raised.

verbose

Logical (Default: TRUE): Whether to print a wide version of the table.
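To make NA_val, warn_high_band and err_high_band more concrete, here is a minimal sketch of the kind of pre-processing they describe. The helper check_bands is hypothetical and is not the package's implementation; in particular, the use of strict "greater than" comparisons and the message wording are assumptions.

```r
# Hypothetical helper, NOT the package's code: mimics the documented behaviour
# of NA_val, warn_high_band and err_high_band under the assumptions above.
check_bands <- function(x, NA_val = "_NA", warn_high_band = 50L, err_high_band = 100L) {
  x <- as.character(x)
  x[is.na(x)] <- NA_val              # missing values become their own band
  n_bands <- length(unique(x))
  if (n_bands > err_high_band) {
    stop("Too many bands: ", n_bands)
  } else if (n_bands > warn_high_band) {
    warning("High number of bands: ", n_bands)
  }
  x
}

banded <- check_bands(c("250", NA, "500", "250"))
print(banded)
```

Because the error check runs first in this sketch, setting warn_high_band above err_high_band means the warning branch can never be reached, which matches the advice given for warn_high_band.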

Value

DataFrame: The format of this dataframe depends on output_var. If output_var = "thin", the output is a table with one row for each unique combination of main_var and cond_var. If output_var = "count", the output is the wide table (rows are main_var bands, columns are cond_var bands) giving the number of entries in each row/column combination. If output_var = "prop", the output is the same wide table giving, for each row/column combination, the number of entries as a percentage of that column's total.
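The three shapes can be approximated in base R as follows. This is a sketch of the layouts only, not the function's actual output: the column names, and whether zero-count combinations appear in the "thin" form, are assumptions.

```r
main_var <- c("1", "1", "1", "2", "1")
cond_var <- c("1", "1", "1", "1", "2")

# "count"-style: wide table of counts, rows = main_var bands, columns = cond_var bands
count_tbl <- table(main_var = main_var, cond_var = cond_var)

# "prop"-style: each column rescaled to percentages that sum to 100
prop_tbl <- 100 * prop.table(count_tbl, margin = 2)

# "thin"-style: one row per (main_var, cond_var) combination
thin_tbl <- as.data.frame(count_tbl, responseName = "count", stringsAsFactors = FALSE)
thin_tbl$prop <- as.data.frame(prop_tbl, stringsAsFactors = FALSE)$Freq

print(count_tbl)
print(prop_tbl)
print(thin_tbl)
```

The wide forms are easier to read by eye, while the long "thin" form is more convenient for filtering or joining in later steps.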

Functions required

prep_char_num_sort

Examples

expl_cond_dist_tbl(main_var = c(1,1,1,2,1), cond_var = c(1,1,1,1,2), output_var = "prop")
# Output: table with col "1": (3/4 x 100 = 75, 1/4 x 100 = 25); col "2": (1/1 x 100 = 100, 0/1 x 100 = 0)
# i.e. there are 4 positions with cond_var = 1. 3/4 of these have 1 and 1/4 of these have 2 in main_var.
# There is only 1 position with cond_var = 2. In main_var this is a 1.
# -> 1 out of 1 for col "2" in row main_var = "1", 0 out of 1 for col "2" in row main_var = "2".

gloverd2/admr documentation built on Dec. 2, 2020, 11:16 p.m.