di_iterate_on_long: Iteratively calculate disproportionate impact using multiple...

di_iterate_on_long    R Documentation

Iteratively calculate disproportionate impact using multiple methods for a long and summarized data set

Description

Calculate disproportionate impact via the percentage point gap (PPG), proportionality index, and 80% index methods for a "long" and summarized data set with many success variables and disaggregation variables, where the success counts and disaggregation groups are stored in a single column or variable for each.
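To make the "long" layout concrete, here is a minimal sketch in base R. The data frame long_df, its column names, and its counts are made up for illustration and are not part of DisImpact.

```r
# Hypothetical long, summarized data: one row per (scenario, group),
# with success counts and group sizes each stored in a single column.
long_df <- data.frame(
  disagg  = c('Ethnicity', 'Ethnicity', 'Foster Youth', 'Foster Youth'),
  group   = c('Asian', 'White', 'Yes', 'No'),
  success = c(40, 90, 5, 125),
  n       = c(50, 150, 10, 190),
  stringsAsFactors = FALSE
)

# Each disaggregation scenario partitions the same 200 students.
scenario_totals <- tapply(long_df$n, long_df$disagg, sum)
print(scenario_totals)
```

With data shaped like this, the corresponding arguments would presumably be num_var='success', denom_var='n', disagg_var_col='disagg', and group_var_col='group'.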

Usage

di_iterate_on_long(
  data,
  num_var,
  denom_var,
  disagg_var_col,
  group_var_col,
  disagg_var_col_2 = NULL,
  group_var_col_2 = NULL,
  cohort_var_col = NULL,
  summarize_by_vars = NULL,
  custom_reference_group_flag_var = NULL,
  ...
)

Arguments

data

A data frame for which to iterate DI calculations for a set of variables.

num_var

A variable name (character value) from data where the variable stores success counts (the numerator in success rates). Success rates are calculated by aggregating num_var and denom_var for each unique combination of values in disagg_var_col, group_var_col, disagg_var_col_2, group_var_col_2, cohort_var_col, and summarize_by_vars. If each such combination already corresponds to a single row, no rows are collapsed.
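The aggregation described above can be sketched in base R as follows. This only illustrates the idea of pooling counts before dividing; the data frame raw and its columns are hypothetical, and the package's internal implementation may differ.

```r
# Hypothetical data with two rows per group (for example, two sections).
raw <- data.frame(
  disagg  = 'Ethnicity',
  group   = c('Asian', 'Asian', 'White', 'White'),
  success = c(20, 20, 50, 40),
  n       = c(25, 25, 80, 70),
  stringsAsFactors = FALSE
)

# Sum numerators and denominators within each combination, then divide.
agg <- aggregate(cbind(success, n) ~ disagg + group, data = raw, FUN = sum)
agg$rate <- agg$success / agg$n
print(agg)
```

The rate is computed from the pooled counts, not by averaging row-level rates, so larger rows carry more weight.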

denom_var

A variable name (character value) from data where the variable stores the group size (the denominator in success rates).

disagg_var_col

A variable name (character value) from data where the variable stores the different disaggregation scenarios. The disaggregation variable could include such values as 'Ethnicity', 'Age Group', and 'Foster Youth', corresponding to three disaggregation scenarios.

group_var_col

A variable name (character value) from data where the variable stores the group name for each group within a level of disaggregation specified in disagg_var_col. For example, the group names could include 'Asian', 'White', 'Black', 'Latinx', 'Native American', and 'Other' for a disaggregation on ethnicity; 'Under 18', '18-21', '22-25', and '25+' for an age group disaggregation; and 'Yes' and 'No' for a foster youth status disaggregation.

disagg_var_col_2

(Optional) A variable name (character value) from data where the variable stores a second disaggregation variable, which allows for the intersectionality of variables listed in disagg_var_col and disagg_var_col_2. The second disaggregation variable could describe something not in disagg_var_col, such as 'Gender', which would require all groups described in group_var_col to be broken out by gender.

group_var_col_2

(Optional) A variable name (character value) from data where the variable stores the group name for each group within a second level of disaggregation specified in disagg_var_col_2. For example, the group names could include 'Male', 'Female', 'Non-binary', and 'Unknown' if 'Gender' is a value in the variable disagg_var_col_2.

cohort_var_col

(Optional) A variable name (character value) from data where the variable stores the cohort label for the data described in each row.

summarize_by_vars

(Optional) A character vector of variable names in data by which num_var and denom_var are aggregated to calculate success rates for the disproportionate impact (DI) analysis set up by disagg_var_col, group_var_col, disagg_var_col_2, and group_var_col_2. For example, summarize_by_vars=c('Outcome') could specify a single variable/column that describes the outcome or metric in num_var, where the outcome values might include 'Completion of Transfer-Level Math', 'Completion of Transfer-Level English', 'Transfer', and 'Associate Degree'.

custom_reference_group_flag_var

(Optional) A variable name (character value) from data where the variable flags the row or group that should be used as the reference group (1 if row is a reference group, 0 otherwise) for comparison in the percentage point gap method and the 80% index method. When this argument is used, then the ppg_reference_groups and di_80_index_reference_groups arguments should not be specified.
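One way such a flag column might be built is sketched below in base R. The column name ref_flag and the choice of 'White' and 'No' as reference groups are assumptions for illustration only, as is the idea that one reference group per disaggregation scenario is intended.

```r
# Hypothetical groups across two disaggregation scenarios.
df <- data.frame(
  disagg = c('Ethnicity', 'Ethnicity', 'Ethnicity', 'Foster Youth', 'Foster Youth'),
  group  = c('Asian', 'White', 'Latinx', 'Yes', 'No'),
  stringsAsFactors = FALSE
)

# Flag the chosen reference group in each scenario: 1 if reference, 0 otherwise.
df$ref_flag <- as.integer(
  (df$disagg == 'Ethnicity' & df$group == 'White') |
  (df$disagg == 'Foster Youth' & df$group == 'No')
)

# Sanity check: count flagged rows per scenario.
flags_per_scenario <- tapply(df$ref_flag, df$disagg, sum)
print(flags_per_scenario)
```

The column name would then be passed as custom_reference_group_flag_var='ref_flag'.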

...

(Optional) Other arguments such as ppg_reference_groups, min_moe, use_prop_in_moe, prop_sub_0, prop_sub_1, di_prop_index_cutoff, di_80_index_cutoff, di_80_index_reference_groups, and check_valid_reference from di_iterate.

Details

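Iteratively calculate disproportionate impact via the percentage point gap (PPG), proportionality index, and 80% index methods for all combinations of success_vars, group_vars, and cohort_vars, for each combination of subgroups specified by scenario_repeat_by_vars. The names success_vars, group_vars, cohort_vars, and scenario_repeat_by_vars are arguments of di_iterate rather than of this function.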
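As a rough orientation, the three measures can be sketched in base R for a single disaggregation scenario. This is a simplified illustration, not the package's implementation: the counts are made up, the references (overall rate for PPG, highest rate for the 80% index) and the 0.8 cutoffs are assumptions, and the margin-of-error adjustment implied by arguments such as min_moe is omitted.

```r
# Hypothetical counts for one disaggregation scenario.
d <- data.frame(
  group   = c('A', 'B', 'C'),
  success = c(80, 45, 25),
  n       = c(100, 75, 50),
  stringsAsFactors = FALSE
)
d$rate <- d$success / d$n

# Percentage point gap: group rate minus a reference rate
# (here the overall rate, as one possible reference).
overall_rate <- sum(d$success) / sum(d$n)
d$ppg <- d$rate - overall_rate

# Proportionality index: share of successes divided by share of the cohort.
d$prop_index <- (d$success / sum(d$success)) / (d$n / sum(d$n))

# 80% index: group rate divided by a reference rate
# (here the highest group rate, as one possible reference).
d$index_80 <- d$rate / max(d$rate)

# Flag groups falling below an assumed cutoff of 0.8.
d$flag_prop_index <- as.integer(d$prop_index < 0.8)
d$flag_80_index   <- as.integer(d$index_80 < 0.8)
print(d)
```

In this toy data the two indices disagree on group B, which shows why the function reports a separate indicator for each method.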

Value

A summarized data set (data frame) consisting of:

  • variables specified by summarize_by_vars, disagg_var_col, group_var_col, disagg_var_col_2, and group_var_col_2,

  • di_indicator_ppg (1 if there is disproportionate impact per the percentage point gap method, 0 otherwise),

  • di_indicator_prop_index (1 if there is disproportionate impact per the proportionality index, 0 otherwise),

  • di_indicator_80_index (1 if there is disproportionate impact per the 80% index, 0 otherwise), and

  • other relevant fields returned from di_ppg, di_prop_index, and di_80_index.

Examples

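library(DisImpact) # provides di_iterate_on_long and the ssm_cohort data set
library(dplyr) # provides %>% and filter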
data(ssm_cohort)
di_iterate_on_long(data=ssm_cohort %>% filter(missingFlag==0) # remove missing data
  , num_var='value', denom_var='denom'
  , disagg_var_col='disagg1', group_var_col='subgroup1'
  , cohort_var_col='academicYear', summarize_by_vars=c('categoryLabel')
  , ppg_reference_groups='all but current' # PPG-1
  , di_80_index_reference_groups='all but current')

DisImpact documentation built on Oct. 11, 2022, 1:06 a.m.