Diversity and Inclusion Report
In div: Report on Diversity and Inclusion in a Corporate Setting

Introduction

This report is for the manager who honestly tries to forge strong and diverse teams, and fosters an inclusive atmosphere. Nobody is free from bias and we are influenced by who we are as well as by by our environment. Our brain is evolved to do pattern recognition, and just as machine learning that will pick up patterns that might be true (or true in our distorted perception of the world) on average, but forego the right of everyone to be treated as an individual. Even with the best intentions, each one of us will have certain biases: both active and passive. Active bias is where one holds an explicit or implicit bias and hence will automatically value people more based on that bias. To get you started, we refer to two possible places where you can test for your own biases:

Secondly, there is the passive bias. Passive bias occurs where (independently of your bias), the typical behaviour of one group is different than the other. To get you started on this subject, we refer to the MBTI profiles or the -- more recent and more scientific -- the ``theory of the big five personality traits''. The theory identifies five factors:

openness to experience (inventive/curious vs. consistent/cautious)
conscientiousness (efficient/organized vs. extravagant/careless)
extraversion (outgoing/energetic vs. solitary/reserved)
agreeableness (friendly/compassionate vs. challenging/callous)
neuroticism (sensitive/nervous vs. resilient/confident)

In both theories men and women are typically[^1] different. For example in the Big Five one finds that women score higher on extroversion, neuroticism, and agreeableness. The combination of those two last dimension implies that men will be (on average) more confident and less likely to accept that there is no salary rise for them. Therefore men will be more likely to ask promotion and salary increase and will therefore also be more likely to obtain it.

[^1]: Typically means here "on average". For example in the MBTI profile, we find that roughly 60% of males are "Thinking", where 60% of females are "Sensing". More information about the MBTI profiles is for example on Wikipedia or myersbriggs.org

This report aims to point out if there are possible areas where bias would have influenced the salaries in your team. We have no claims on where the bias comes from but present the analyses on such way that it becomes actionable.

The Particular Example Behind this Demo Report

library(div)
library(gridExtra)
knitr::opts_chunk$set(echo = FALSE)
knitr::opts_chunk$set(warning = FALSE)
knitr::opts_chunk$set(message = FALSE)
teamSize       <- 200
pctFemales     <- 0.45
maleSalaryBias <- 1.05
d <- div_fake_team(seed = 1890122, 
                   N = teamSize, 
                   genders = c("F", "M"),
                   gender_prob = c(pctFemales, 1 - pctFemales),
                   gender_salaryBias = c(1, maleSalaryBias)
                   )

The data used for this report is randomly generated with the following characteristcs:

team size: r teamSize
percentage of females: r pctFemales
average male's salary / average female's salary: r maleSalaryBias
we do not use jobID and hence assume that all people in the same Grade do the same work (and hence can be compared)

So, while this data has a bias built in for the Gender Paygap (on average males will earn r paste0(round(maleSalaryBias*100 - 100, 2), "%")) more than females; there is no bias built in for any of the other dimensions that can be studies (e.g. Citizenship)

Diversity

First we present how diverse the team actually is. Diversity is expressed as a number between $0$ and $1$. The diversity is $1$ if the population in the team is exactly the same as the reference population (e.g. the gender diversity equals 1 if the team has 50\% males,^[The country has 50.2\% males in the work population, but that does not seem to be the case in Krakow -- so we use an equal percentage for both genders.] because that would be the same number as the reference population of the country).

Gender diversity per grade

In Figure \ref{fig:gender-gauge} you can see the diversity index for the gender distribution per grade.

```r"} t <- table(d$gender, d$grade) t1 <- apply(t, 2, diversity, prior = c(50.2, 49.8)) t2 <- tibble(value = t1, label = paste("Grade", names(t1))) div_gauge_plot(t2, ncol = 3)

## Age diversity

The diversity in function of age is represented in Figure \ref{fig:ageism}. Here we do not show the distribution per grade, because one can expect a natural correlation between age, experience and grade.

```r"}
# ages in Poland
ageRef <- c(15.94, 27.11, 23.99, 20.68, 12.28) / 100
ad <- table(cut(d$age, breaks = c(20, 30, 40, 50, 60, 65)))
ad <- ad / sum(ad)  # not needed, diversity() does this, bu useful for the table
t <- tibble(value = diversity(ad, prior = ageRef), label = "Age")
div_gauge_plot(t)
t <- rbind(ad, ageRef)
rownames(t) <- c("team", "reference")
knitr::kable(round(t,2), 
             caption = "The percentages of different age groups in this team compared to the population of Poland. An equal distribution would yield a diversity of 1.\\label{tab:div:age}"
             )

To calculate the diversity, we assume that the reference population is distributed as in Table \ref{tab:div:age}. These are the numbers corresponding to Poland.

Diversity in nationalities

While ethnicity is equally valuable in diversity, nationality is measurable, available in most HR databases, and readily quantified. Few people can trace their heritage uniquely to a particular ethinic ascendance and DNA studies reveal that the concept of race is a quite blurred reality -- it is rather a clustering of people that actually display a continuum instead binary racial characteristics.^[A good introduction is here on the website of Harvard University]

#t <- table(d$citizenship, d$grade)
t <- d                     %>% 
     group_by(citizenship) %>% 
     summarise(n = n())    %>%
     arrange(desc(n))      %>%
     ggplot( aes(x = citizenship, y = n))    +
        geom_col(aes(fill = n), width = 0.7) +
        theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5),
              axis.title.x=element_blank())

t

t <- d                            %>% 
     group_by(citizenship, grade) %>% 
     summarise(n = n())           %>%
     ggplot( aes(x = grade, y = n))                +
        geom_col(aes(fill = n), width = 0.7)       +
        facet_wrap(~ citizenship, scales="free_y") +
        theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5),
                axis.title.x=element_blank())

t

\afterpage{\clearpage}

Inclusion

In previous section, we have shed some light on how diverse the team is. In this section we want to examine how inclusive the team is. We will do this via the salary and unconscious bias. Everyone has certain biases and we can investigate if certain genders or nationalities earn less in similar roles.

We can however not prove that this is the result of bias. We show only the most important control variable: experience (or rather its proxy "age"). There might be many other factors such as starting date in the firm or team, performance, productive behaviour etc. These variables are not part of this exercise, but they can be used by the manager to understand if an observed paygap is equitable or not.

Method

We realise that publishing a ``paygap'' as difference between the average salary of the male and female is not very helpful as this is most likely not measuring a paygap but rather a difference in occupations.

Therefore we only present results of groups that are comparable.

Legend

paygap = the ratio of median salaries of one group divided by the median of the salaries of the other group
NA = numbers are too small, please look at individuals;
nothing = no bias detectable;
. = maybe there is some bias, but the numbers are low, check individuals;
* = you should check for bias;
** = bias is probably there;
*** = most certainly there is bias

So, there will be more stars if the probability of a bias is higher: this can be due to a higher bias and/or due to a larger sample size.

To improve readability, the column headers are abbreviated. We summarise here the columns and their content:

grade = the salary grade as used in the company
jobID = a unique identifier of the job category (can be abbreviated)
sal_F = the median salary of the females (F), this will be different for other paygap estimations
sal_oth = the median salary of the other groups (non F). The tool is open to use more than one gender. A paygap is expressed as the median of one group compared to others.
age_F = the median age of the females (or age_Pol could be the median age of the team members with Polish citizenship)
age_oth = the median age of the other groups take together (e.g. the median age of non females)
paygap = the ratio of median salary earned by the selected group (e.g. females) divided by the median of the other people. So, if this number is lower than one, then median female earns less than the median non-female.
conf. = the confidence level that this paygap is significant and that bias is part of the explanation (for details see in the beginning of this section).

The Pay-Ratios (``Pay-Gap'')

The "paygap" as popularly used is actually a ratio and not a gap. If the paygap for example is 0.8 in the the gender analysis and the reference group is females, then it can be understood as follows: "a woman earns 0.8 for every dollar that a man (not woman) earns."

The Gender PayGap

Table \ref{tab:pg:gender} shows the results for the Gender paygap.

pg_gender <- d                 %>%
             div_paygap       
tmp <- div_parse_paygap(pg_gender, label = "tab:pg:gender")
knitr::kable(tmp$d,
             caption = tmp$caption,
             format.args = tmp$format.args,
             label = tmp$label)

The Citizenship PayGap

Table \ref{tab:pg:ctzn} shows the results for the paygap for the main nationality compared to all other nationalities together.

Note that the data did not include a bias in function of citizenship. So any bias that appears below is due to random sampling that used to created the data. That gives you an idea about the balance between being sensitive enough to pick up a bias and not to brand random variations as biased.

pg_citizenship <- d                %>%
             div_paygap(colname = "citizenship" , selectedValue = "Polish")        %>%
             div_round_paygap

tmp <- div_parse_paygap(pg_citizenship, colname = "citizenship", label = 'tab:pg:ctzn')
knitr::kable(tmp$d,
             caption = tmp$caption,
             format.args = tmp$format.args)

Conclusions -- How to Use This Report

This report is a statistical tool to that merely points in a direction where a manager can start looking for potential bias. It cannot prove that bias exits, in all cases it is necessary to understand what is happening and why.

The reason why we cannot prove the existence of bias is that: even where we have rather large numbers and can attribute three stars, the individual performance might be so that the apparent bias is justified.

The best way to us this report is hence to use it a lead that needs investigation. The goal must not be to make sure all stars disappear, but rather that we understand and accept why certain stars are there.

Appendix 1: The Diversity Index

We express diversity as a number between zero and one. Our calculation is based on @debrouwer2020 and more in particular section 36.3.1 ``The Business Case: a Diversity Dashboard''. Details can be found in the book^[In essence, the diversity index $di$ is defined as $di = -\sum{p_i\log{p_i}}$, where $p_i$ are the proportions of the population in class $i4.], but in summary the method has the following key aspects:

The diversity is 0 if only one of the groups is present, and is 1 if both groups are equitably present^[What a fair and impartial distribution is must of course be carefully considered and is up the user of this report. As initial position, we used the work-population of the country Poland.]
This calculation is similar to the established concept of entropy in physics.
More than two categories can be used (e.g. one is not limited to two genders)
It is possible to calibrate the probabilities so that they show maximum entropy (or diversity) for the percentages that naturally occur. Imagine for example that we are after a war and the country has 60\% females. That means that we want to see a maximum diversity with a F/M ratio of 60/40. This is done in this report.

```r"} x <- seq(length.out = 500, from = 0, to = 1) t <- cbind(x, 1-x) tt <- tibble(diversity = apply(t, 1, FUN = diversity), x = x) p1 <- ggplot(data=tt, aes(x=x, y=diversity, color = diversity)) + geom_line(lwd = 3) + ggtitle("Diversity index for binary variables and prior probabilities of 50/50") + xlab("Proportion of one of the two classes")

tt <- tibble(diversity = apply(t, 1, FUN = diversity, prior = c(.7,.3)), x = x) p2 <- ggplot(data=tt, aes(x=x, y=diversity, color = diversity)) + geom_line(lwd = 3) + ggtitle("Diversity index for binary variables and prior probabilities of 70/30") + xlab("Proportion of one of the two classes") grid.arrange(p1, p2, ncol = 1)

In Figure \ref{fig:diversityIllustrated} we show how these properties of the diversity index evolve in function of the proportions of different classes. This example is for binary classes, because that is easier to visualise. When more classes are present, then the visualisations are to be done in a space with more dimensions.

# Appendix 2: Further Analysis

What really drives your salaries? Below is a decision tree that (is over-fitted) shows the most determining variables to come to the salary (average numbers in the boxes at the bottom).

```r
range01 <- function(x){(x-min(x))/(max(x)-min(x))}
df <- data.frame(
            isFemale     = ifelse(d$gender == "F", 1, 0),
            grade        = d$grade,
            salary       = range01(d$salary / d$fte),
            jobID        = factor(d$jobID),
            age          = range01(d$age),
            degree       = factor(d$degree),
#            tenure_firm  = range01(d$tenure_firm),
#            tenure_grade = range01(d$tenure_grade),
            tenure_job   = range01(d$tenure_job),
            team         = factor(paste0(d$team, "_", d$subteam)),
            performance  = factor(d$performance)
            )
library(rpart)
library(rpart.plot)
t0 <- rpart(salary ~ ., df, method = "anova",
            control = rpart.control(
              cp = 0.001
            )
)
prp(t0, type = 5, extra = 0, box.palette = "auto", yesno = 1)

Bibliography

Any scripts or data that you put into this service are public.

div documentation built on May 6, 2021, 9:06 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

div
Report on Diversity and Inclusion in a Corporate Setting

Diversity and Inclusion Report
In div: Report on Diversity and Inclusion in a Corporate Setting

Introduction

The Particular Example Behind this Demo Report

Diversity

Gender diversity per grade

Diversity in nationalities

Inclusion

Method

Legend

The Pay-Ratios (``Pay-Gap'')

The Gender PayGap

The Citizenship PayGap

Conclusions -- How to Use This Report

Appendix 1: The Diversity Index

Bibliography

Try the div package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

div Report on Diversity and Inclusion in a Corporate Setting

Diversity and Inclusion Report In div: Report on Diversity and Inclusion in a Corporate Setting

Introduction

The Particular Example Behind this Demo Report

Diversity

Gender diversity per grade

Diversity in nationalities

Inclusion

Method

Legend

The Pay-Ratios (``Pay-Gap'')

The Gender PayGap

The Citizenship PayGap

Conclusions -- How to Use This Report

Appendix 1: The Diversity Index

Bibliography

Try the div package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

div
Report on Diversity and Inclusion in a Corporate Setting

Diversity and Inclusion Report
In div: Report on Diversity and Inclusion in a Corporate Setting