knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
The IE Office at Utah Tech has several tools streamlining the process for producing headcount disaggregated by groups. The tools from {utDataStoR}
combined with tools from {utHelpR
} make it easy to produce and publish headcount tables and graphs in minutes. You can use {utDataStoR}
to produce the sql query for the populations you would like to analyse, and then use {utHelpR}
to shape the data into table in just a few lines of code. You can then use markdown to add narrative to your tables. Once you are finished, your work can be published with a single button.
{utHelpR}
has several functions to minimize the amount of code needed to produce reports. In this vignette, we describe how to quickly produce reports involving headcount. We make use of three functions to quickly produce pivot tables with counts
utHelpR::uth_make_outcome_count
to compute headcount.heHelpR::uth_snake_to_title
to convert column title from data frames into pretty column names for tables and graphs.purrr::map
to quickly compute aggregate headcounts disaggregated over lists of attributes.We use the following libraries for our recipe.
library(utHelpR) library(dplyr, warn.conflicts = FALSE) library(purrr) library(gt) library(stringr)
uth_snake_to_title <- function(text) { stringr::str_replace_all(text, "_", " ") %>% stringr::str_to_title() }
We start by creating a data frame with student information from the data frame fake_enrollment
. fake_enrollment
is a fake training data set included in the utHelpR
library. This data frame has NA
values in several columns. This will give us an opportunity to practice cleaning data and selecting the student attributes we want to aggregate over.
test_df <- fake_enrollment %>% select(student_id, gender_code, student_type_code, is_international) %>% filter(!is.na(gender_code)) %>% filter(!is.na(student_type_code)) %>% filter(!is.na(is_international))
Suppose we just want to get a headcount of the whole student body aggregated by gender code.
headcount_by_gender <- test_df %>% uth_make_outcome_count(gender_code)
This produces a single data frame.
gt(headcount_by_gender)
This table looks good, but the column titles are written in snake case. We would prefer they were written in title case. To accomplish this, we use the function uth_snake_to_title
. This function takes text written in snake_case
and converts it to text written in Title Case
.
headcount_by_gender %>% rename_with(uth_snake_to_title, everything()) %>% gt()
This table looks better, but we would like the column titled count to be called headcount, and we would like the gender codes to be centered. We are planning to produce tables like this for several groups of aggregates, so we would like to use a function to do this styling. We will specify the column to center to be the first column.
headcount_by_gender %>% rename(headcount = count) %>% rename_with(uth_snake_to_title, everything()) %>% gt() %>% cols_align( align = "center", columns = 1 )
The function we can use to style our tables will be
style_table <- function(table) { table %>% rename(headcount = count) %>% rename_with(uth_snake_to_title, everything()) %>% gt() %>% cols_align( align = "center", columns = 1 ) }
So, our recipe for building a table becomes
headcount_by_gender %>% style_table()
What if we want to produce the headcount of all female students aggregated by student type code?
female_headcount_by_student_type_code <- test_df %>% filter(gender_code == 'F') %>% select(student_id, student_type_code) %>% uth_make_outcome_count(student_type_code) female_headcount_by_student_type_code %>% style_table()
Next suppose we want to produce the headcount for all students, but now we want to aggregate by gender and student type code.
total_headcount_by_student_type_and_gender <- test_df %>% select(student_id, gender_code, student_type_code) %>% uth_make_outcome_count(gender_code, student_type_code) total_headcount_by_student_type_and_gender %>% style_table()
Now suppose that we want to compute the headcount for the total student population, but we want to aggregate this over several different combinations of our three columns. Here we will use the {purrr}
library. Specifically we are going to use the functional purrr::map()
. You can find a good introduction to functionals in Chapter 9 of the text Advanced R by Hadley Wickham.
We will pass our function uth_make_outcome_count()
to purrr::map()
in two different, but equivalent ways. The first method is longer and requires us to define a function to pass to purrr::map()
. The second method uses
aggregate_cols <- list( 'gender_code', 'student_type_code', 'is_international', c('gender_code', 'student_type_code'), c('gender_code', 'is_international'), c('student_type_code', 'is_international'), c('gender_code', 'student_type_code', 'is_international') ) special_purrr_function <- function(...) { output_df <- uth_make_outcome_count(test_df, ...) return(output_df) } holder_list <- purrr::map(aggregate_cols, special_purrr_function)
We can skip the function definition by using the ~
operator.
holder_list <- purrr::map(aggregate_cols, ~uth_make_outcome_count(test_df, .))
holder_list
is a list of data frames containing all the data frames you wanted to produce. You can use the following pattern to pull the data frames out of holder_list
.
holder_list[[6]] %>% style_table()
If you want to produce a spreadsheet for each of the data frames in holder_list
use the following pattern.
uth_make_workbook <-function(raw_data, file_name, folder = "sensitive/") { wb <- createWorkbook() for (i in seq_along(raw_data)) { addWorksheet(wb, sheetName = paste(names(raw_data)[[i]])) writeData(wb, sheet = i, x = raw_data[[i]]) } list file_place <- paste0(folder, file_name, ".xlsx") saveWorkbook(wb, file = here::here(file_place), overwrite = TRUE) }
uth_make_workbook(holder_list, 'headcount_metrics')
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.