expl_cond_dist_tbl: expl_cond_dist_tbl
In gloverd2/admr: Contains standard functions for Admiral Advance Analytics


This function returns a table of the cumulative conditional distribution for banded variables. For each pair of bands, the value is defined as (number of entries in both the band of main_var and the band of cond_var) / (number of entries in the band of cond_var).

This answers the question: "Given that an entry is in a band of cond_var, what is the probability that it is in this band of main_var?" This is useful for tasks such as assessing model refreshes, where it can answer questions like: "Given that the score on the old model is in the 500-750 band, what percentage of these scores end up in each band of the new model?" This could be done with expl_band_cond_shift(main_var=df$new_score, cond_var=df$old_score). You would then look at the column for your 500-750 band (normally labelled with just "750").
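The definition above can be sketched in base R with table() and prop.table(). This is only an illustration of the arithmetic, not the package's implementation; the vectors old_band and new_band are made-up data, and the band labels follow the "upper bound only" notation mentioned above.

```r
# Hypothetical banded scores, invented purely for illustration
old_band <- c("500", "750", "750", "750", "1000", "1000")
new_band <- c("500", "750", "750", "1000", "1000", "750")

# Rows = bands of the main variable, columns = bands of the conditioning variable
counts <- table(main_var = new_band, cond_var = old_band)

# Divide each column by its total: share of each main band given the cond band,
# expressed as a percentage
cond_pct <- prop.table(counts, margin = 2) * 100

print(round(cond_pct, 1))
```

Reading down the "750" column gives the distribution of new-score bands among entries whose old score was in the 500-750 band. Every column sums to 100.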

expl_cond_dist_tbl(
  main_var,
  cond_var,
  output_var = "thin",
  NA_val = "_NA",
  warn_high_band = 50L,
  err_high_band = 100L,
  verbose = TRUE
)

`main_var`	Array[Character]: This is a banded version of the variable which you would like to assess dependent on another variable. If doing a model refresh, this would be the new score.
`cond_var`	Array[Character]: This is a banded version of the variable which we are conditioning on. If doing a model refresh, this would be the old score.
`output_var`	Character (Default: "thin"): This is a choice of which form of output should be given. Options are: ("thin", "prop", "count"). See "Value".
`NA_val`	Character/Numeric/NA (Default: "_NA"): NA replacement value.
`warn_high_band`	Numeric (Default: 50): The number of bands needed to generate a warning. If you do not want a warning, set this above err_high_band.
`err_high_band`	Numeric (Default: 100): The number of bands which, if exceeded, causes an error.
`verbose`	Logical (Default: TRUE): Whether to print a wide version of the table.
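As a rough sketch of how NA_val and the two band thresholds might interact, the snippet below replaces missing values with a placeholder band and then counts the distinct bands. The helper check_band_count is hypothetical and not part of admr; whether the package counts bands per variable and whether it compares with ">" or ">=" are assumptions here.

```r
# Hypothetical helper, NOT part of admr: warns or errors on too many bands.
# Assumption: thresholds are compared with ">" against the distinct band count.
check_band_count <- function(x, warn_high_band = 50L, err_high_band = 100L) {
  n_bands <- length(unique(x))
  if (n_bands > err_high_band) {
    stop("Too many bands: ", n_bands)
  } else if (n_bands > warn_high_band) {
    warning("High number of bands: ", n_bands)
  }
  invisible(n_bands)
}

x <- c("low", NA, "high", "low")
x[is.na(x)] <- "_NA"  # missing values become their own band

n <- check_band_count(x)
print(n)
```

Setting warn_high_band above err_high_band in this sketch means the warning branch is never reached, which mirrors the advice in the argument description.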

DataFrame: The format of this dataframe depends on output_var. If output_var = "thin", the output is a table in which each row is a unique combination of main_var and cond_var. If output_var = "count", the output is the wide table containing the number of entries in each row/column combination. If output_var = "prop", the output is the wide table containing, for each row/column combination, the number of entries as a percentage of the column total.
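To make the difference between the wide and thin shapes concrete, here is a base R sketch. The column names count and prop, and the exact layout of the real "thin" output, are assumptions; only the general shapes follow the description above.

```r
# Made-up banded data for illustration
main_var <- c("a", "a", "b", "b", "b")
cond_var <- c("x", "y", "x", "x", "y")

# Wide "count" shape: one row per main band, one column per cond band
wide_count <- table(main_var = main_var, cond_var = cond_var)

# Wide "prop" shape: each column rescaled to percentages of its total
wide_prop <- prop.table(wide_count, margin = 2) * 100

# Thin shape: one row per (main_var, cond_var) combination
thin <- as.data.frame(wide_count, responseName = "count")
thin$prop <- as.data.frame(wide_prop)$Freq

print(thin)
```

The thin form is easier to filter or join, while the wide forms are easier to read column by column.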

prep_char_num_sort

expl_cond_dist_tbl(main_var = c(1,1,1,2,1), cond_var = c(1,1,1,1,2), output_var="prop")
Output: Table with Col 1: (3/4 x 100 = 75, 1/4 x 100 = 25); Col 2: (1/1 x 100 = 100, 0/1 x 100 = 0)
i.e. there are 4 positions with cond_var = 1. Of these, 3/4 have main_var = 1 and 1/4 have main_var = 2.
There is only 1 position with cond_var = 2, and its main_var value is 1.
-> 1 out of 1 for col "2" in row main_var = "1", 0 out of 1 for col "2" in row main_var = "2".
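The arithmetic in this example can be checked independently with base R. This sketch does not call expl_cond_dist_tbl; it only reproduces the expected percentages from the definition.

```r
main_var <- c(1, 1, 1, 2, 1)
cond_var <- c(1, 1, 1, 1, 2)

# Cross-tabulate, then convert each column to percentages of its total
counts <- table(main_var = main_var, cond_var = cond_var)
pct <- prop.table(counts, margin = 2) * 100

print(pct)
```

Column "1" holds 75 and 25, and column "2" holds 100 and 0, matching the values described above.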