#| include: false knitr::opts_chunk$set(fig.path = "../man/figures/art-110-") library(ggplot2)
Stickiness is a more-inclusive alternative to graduation rate as a measure of a program's success in attracting, keeping, and graduating their undergraduates. All students excluded by a conventional graduation rate metric--including migrators---are included in the stickiness metric [@Ohland+Orr+others:2012].
This vignette in the MIDFIELD workflow.
[ S = \frac{N_g}{N_e} ]
Stickiness, in comparison to graduation rate, has these characteristics:
Includes migrators, where graduation rate does not.
Is based on the bloc of ever enrolled rather than starters, so there is no need for FYE proxies.
Counts all graduates (timely completers) in a program, eliminating the need to filter graduates based on their starting program.
Like the MIDFIELD definition of graduation rate (in contrast to the IPEDS definition), includes students who attend college part-time, who transfer between institutions, and who start in any term.
As they pertain to the stickiness metric, relationships among starters, migrators, and graduates (timely completers) of a given program P are illustrated in Figure 1.
The interior rectangle represents the stickiness numerator $(N_g)$, the set of graduates (timely completers) of program P.
The overall rectangle represents the stickiness denominator $(N_e)$, the set of students ever enrolled in program P.
#| echo: false #| label: fig01 #| fig-width: 12 #| fig-asp: 0.7 #| fig-cap: "Figure 1. Stickiness metric. Starters, migrators, and timely completers." df_tile <- data.frame( x = rep(c(2, 4), 2), # centerline of rectangle y = rep(c(1), each = 2), # centerline z = factor(rep(1:2)) ) delta <- 0.02 # x-position, center of circled numbers c1 <- 2.57 c2 <- 1.75 # 1.55 c3 <- 4 # 3.43 c4 <- 4.25 df_box1 <- data.frame( x = c(1, 5, 5) + delta * c(-1, 1, 1), y = c(0.5, 0.5, 1.5) + delta / 2 * c(-1, -1, 1) ) df_box2 <- data.frame( x = c(1, 1, 5) + delta * c(-1, -1, 1), y = c(0.5, 1.5, 1.5) + delta / 2 * c(-1, 1, 1) ) df_circ <- data.frame(x = c(c1, c2, c3, c4), y = c(1, 1, 1, 1.35)) df_hash <- data.frame( x = c(3, 3), xend = c(5, 5), y = c(0.5, 1.5), yend = c(1.5, 0.5) ) df_circ2 <- data.frame(x = 4, y = 0.99) ggplot(df_tile, aes(x, y)) + geom_tile(aes(fill = z)) + scale_x_continuous(breaks = seq(0, 16, 2)) + scale_fill_manual( values = c("#80cdc1", "#80cdc1"), # "#dfc27d" "#80cdc1" aesthetics = c("colour", "fill") ) + geom_vline(aes(xintercept = 3), color = "white") + geom_point( data = data.frame(x = 3, y = 0.95), aes(x = x, y = y), shape = 22, size = 190, color = "white", fill = "white", alpha = 0.4 ) + theme_void() + theme(legend.position = "none") + geom_line(data = df_box1, aes(x = x, y = y), linewidth = 1, linetype = 2) + geom_line(data = df_box2, aes(x = x, y = y), linewidth = 1, linetype = 2) + scale_y_continuous(limits = c(0.4, 1.6)) + annotate("text", x = c(c1, c2, c3, c4, 3, 3), y = c(0.925, 1.45, 0.925, 1.45, 1.27, 1.55), # 1.37 label = c( "", # starter-completers "starters in program P", "", # migrator-completers "migrators into program P", "timely completers of program P", "ever enrolled in program P" ), hjust = 0.5, vjust = 0.5, size = 6 )
Demonstrating the following elements of a MIDFIELD workflow.
Planning. The metric is stickiness. Required blocs are ever-enrolled and graduates. Grouping variables are program, race/ethnicity, and sex. Programs are the four Engineering programs used throughout.
Initial processing. Filter the student-level records for data sufficiency and degree-seeking.
Blocs. Gather ever enrolled, filter by program. Gather graduates, filter by program.
Groupings. Add grouping variables.
Metrics Summarize by grouping variables and compute stickiness.
Displays Create multiway chart and results table.
Start. If you are writing your own script to follow along, we use these packages in this article:
library(midfieldr) library(midfielddata) library(data.table) library(ggplot2)
Load. Practice datasets. View data dictionaries via ?student
, ?term
, ?degree
.
# Load practice data data(student, term, degree)
Loads with midfieldr. Prepared data. View data dictionary via ?study_programs
, ?baseline_mcid
.
Select (optional). Reduce the number of columns. Code reproduced from Getting started.
# Optional. Copy of source files with all variables source_student <- copy(student) source_term <- copy(term) source_degree <- copy(degree) # Optional. Select variables required by midfieldr functions student <- select_required(source_student) term <- select_required(source_term) degree <- select_required(source_degree)
# Working data frame DT <- copy(baseline_mcid)
Ever enrolled. The summary code chunk from Blocs.
# Ever-enrolled bloc DT <- term[DT, .(mcid, cip6), on = c("mcid")] DT <- unique(DT) # Filter by program DT <- study_programs[DT, on = c("cip6"), nomatch = NULL] DT[, cip6 := NULL] DT <- unique(DT) DT
Copy. To prepare for joining with graduates.
# Prepare for joining setcolorder(DT, c("mcid")) ever_enrolled <- copy(DT) ever_enrolled
Initialize. The data frame of baseline IDs is the intake for this section.
# Working data frame DT <- copy(baseline_mcid)
Graduates The summary code chunk from Graduates
# Gather graduates and their degree CIPs DT <- add_timely_term(DT, term) DT <- add_completion_status(DT, degree) DT <- DT[completion_status == "timely"] DT <- degree[DT, .(mcid, term_degree, cip6), on = c("mcid")] # Filter by program and first-degree terms only DT <- study_programs[DT, on = c("cip6"), nomatch = NULL] DT <- DT[, .SD[which.min(term_degree)], by = "mcid"] DT[, c("cip6", "term_degree") := NULL] DT <- unique(DT) DT
Copy. To prepare for joining with ever enrolled
# Prepare for joining setcolorder(DT, c("mcid")) graduates <- copy(DT) graduates
One of our grouping variables (program
) is already included in the data frames. The next grouping variable is bloc
to distinguish starters from graduates when the two data frames are combined.
Add a variable. Label ever enrolled and graduates.
# For grouping by bloc ever_enrolled[, bloc := "ever_enrolled"] graduates[, bloc := "graduates"]
Join. Combine the two blocs to prepare for summarizing. A graduate has two observations in these data: one as ever enrolled and one as a graduate.
# Prepare for summarizing DT <- rbindlist(list(ever_enrolled, graduates)) DT
Add variables. Demographics from Groupings
# Join race/ethnicity and sex cols_we_want <- student[, .(mcid, race, sex)] DT <- cols_we_want[DT, on = c("mcid")] DT
Verify prepared data. study_observations
, included with midfieldr, contains the case study information developed above. Here we verify that the two data frames have the same content.
# Demonstrate equivalence check_equiv_frames(DT, study_observations)
#| eval: false #| echo: false # Run manually # Writing external files setkey(DT, mcid) setkey(DT, NULL) study_observations <- copy(DT) usethis::use_data(study_observations, overwrite = TRUE)
Summarize. Count the numbers of observations for each combination of the grouping variables.
# Count observations by group grouping_variables <- c("bloc", "program", "sex", "race") DT <- DT[, .N, by = grouping_variables] setorderv(DT, grouping_variables) DT
Reshape. Transform to row-record form to set up the stickiness metric calculation. Transform the N column into two columns, one for ever-enrolled and one for graduates.
# Prepare to compute metric DT <- dcast(DT, program + sex + race ~ bloc, value.var = "N", fill = 0) DT
Create a variable. Compute the metric.
# Compute metric DT[, stickiness := round(100 * graduates / ever_enrolled, 1)] DT
Verify prepared data. study_results
, included with midfieldr, contains the case study information developed above. Here we verify that the two data frames have the same content.
# Demonstrate equivalence check_equiv_frames(DT, study_results)
#| eval: false #| echo: false # Run manually # Writing external files setkey(DT, program, sex, race) setkey(DT, NULL) study_results <- copy(DT) usethis::use_data(study_results, overwrite = TRUE)
Filter. To preserve the anonymity of the people involved, we remove observations with fewer than N_threshold
graduates. With the research data, we typically set this threshold to 10; with the practice data, we demonstrate the procedure using a threshold of 5.
# Preserve anonymity N_threshold <- 5 # 10 for research data DT <- DT[graduates >= N_threshold] DT
Recode. Readers can more readily interpret our charts and tables if the programs are unabbreviated.
# Recode values for chart and table readability DT[, program := fcase( program %like% "CE", "Civil", program %like% "EE", "Electrical", program %like% "ME", "Mechanical", program %like% "ISE", "Industrial/Systems" )] DT
Add a variable. We combine race/ethnicity and sex to create a combined grouping variable.
# Create a combined category DT[, people := paste(race, sex)] DT[, `:=`(race = NULL, sex = NULL)] setcolorder(DT, c("program", "people")) DT
Order factors. Order the levels of the categories. Code adapted from Multiway data and charts.
# Order the categories DT <- order_multiway(DT, quantity = "stickiness", categories = c("program", "people"), method = "percent", ratio_of = c("graduates", "ever_enrolled") ) DT
Multiway chart. Code adapted from Multiway data and charts.
The vertical reference line is the aggregate stickiness of the program, independent of race/ethnicity and sex. A missing data marker or missing group indicates the number of graduates was below the threshold set to preserve anonymity---largely an artifact of applying these groupings to practice data.
#| label: fig02 #| fig-asp: 1.1 #| fig-cap: "Figure 2: Stickiness of four Engineering majors." ggplot(DT, aes(x = stickiness, y = people)) + facet_wrap(vars(program), ncol = 1, as.table = FALSE) + geom_vline(aes(xintercept = program_stickiness), linetype = 2, color = "gray60") + geom_point() + labs(x = "Stickiness (%)", y = "") + scale_x_continuous(limits = c(20, 90), breaks = seq(0, 100, 10))
Results table. Code adapted from Multiway data and charts.
# Select variables and remove factors display_table <- copy(DT) display_table <- display_table[, .(program, people, stickiness)] display_table[, people := as.character(people)] display_table[, program := as.character(program)] # Construct table display_table <- dcast(display_table, people ~ program, value.var = "stickiness") setnames(display_table, old = c("people"), new = c("People"), skip_absent = TRUE ) display_table
(Optional) Format the table nearer to publication quality. Here I use the 'gt' package.
library(gt) display_table |> gt() |> tab_caption("Table 1: Stickiness (%) of four Engineering majors") |> tab_options(table.font.size = "small") |> opt_stylize(style = 1, color = "gray") |> tab_style( style = list(cell_fill(color = "#c7eae5")), locations = cells_column_labels(columns = everything()) )
A value of NA indicates a group removed because the number of graduates was below the threshold set to preserve anonymity. As noted earlier, these are largely an artifact of applying these groupings to practice data.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.