#| include: false
knitr::opts_chunk$set(fig.path = "../man/figures/art-110-")
library(ggplot2)

Stickiness is a more inclusive alternative to graduation rate as a measure of a program's success in attracting, keeping, and graduating its undergraduates. All students excluded by a conventional graduation rate metric, including migrators, are included in the stickiness metric [@Ohland+Orr+others:2012].

Where this vignette fits in the MIDFIELD workflow:

  1. Planning
  2. Initial processing
  3. Blocs
  4. Groupings
  5. [Metrics]{.accent}
    • Graduation rate
    • [Stickiness]{.accent}
  6. Displays

Definitions


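\[ S = \frac{N_g}{N_e} \]

where \(S\) is the stickiness of a program, \(N_g\) is the number of its timely graduates, and \(N_e\) is the number of students ever enrolled in it.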
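
As a quick numeric illustration of the ratio, the sketch below uses made-up counts (not MIDFIELD data). The variable names are hypothetical and the rounding mirrors the metric code used later in this article.

```r
# Hypothetical counts for one program (illustrative only)
n_ever_enrolled <- 200 # N_e: students ever enrolled in the program
n_graduates <- 90 # N_g: timely graduates of the program

# Stickiness expressed as a percentage, rounded to one decimal place
stickiness <- round(100 * n_graduates / n_ever_enrolled, 1)
stickiness
```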






A more inclusive metric

Stickiness, in comparison to graduation rate, has these characteristics:

As they pertain to the stickiness metric, relationships among starters, migrators, and graduates (timely completers) of a given program P are illustrated in Figure 1.

#| echo: false
#| label: fig01
#| fig-width: 12
#| fig-asp: 0.7
#| fig-cap: "Figure 1. Stickiness metric. Starters, migrators, and timely completers."

df_tile <- data.frame(
  x = rep(c(2, 4), 2), # centerline of rectangle
  y = rep(c(1), each = 2), # centerline
  z = factor(rep(1:2))
)

delta <- 0.02

# x-position, center of circled numbers
c1 <- 2.57
c2 <- 1.75 # 1.55
c3 <- 4 # 3.43
c4 <- 4.25

df_box1 <- data.frame(
  x = c(1, 5, 5) + delta * c(-1, 1, 1),
  y = c(0.5, 0.5, 1.5) + delta / 2 * c(-1, -1, 1)
)
df_box2 <- data.frame(
  x = c(1, 1, 5) + delta * c(-1, -1, 1),
  y = c(0.5, 1.5, 1.5) + delta / 2 * c(-1, 1, 1)
)
df_circ <- data.frame(x = c(c1, c2, c3, c4), y = c(1, 1, 1, 1.35))
df_hash <- data.frame(
  x = c(3, 3), xend = c(5, 5),
  y = c(0.5, 1.5), yend = c(1.5, 0.5)
)
df_circ2 <- data.frame(x = 4, y = 0.99)

ggplot(df_tile, aes(x, y)) +
  geom_tile(aes(fill = z)) +
  scale_x_continuous(breaks = seq(0, 16, 2)) +
  scale_fill_manual(
    values = c("#80cdc1", "#80cdc1"), # "#dfc27d"  "#80cdc1"
    aesthetics = c("colour", "fill")
  ) +
  geom_vline(aes(xintercept = 3), color = "white") +
  geom_point(
    data = data.frame(x = 3, y = 0.95),
    aes(x = x, y = y),
    shape = 22,
    size = 190,
    color = "white",
    fill = "white",
    alpha = 0.4
  ) +
  theme_void() +
  theme(legend.position = "none") +
  geom_line(data = df_box1, aes(x = x, y = y), linewidth = 1, linetype = 2) +
  geom_line(data = df_box2, aes(x = x, y = y), linewidth = 1, linetype = 2) +
  scale_y_continuous(limits = c(0.4, 1.6)) +
  annotate("text",
    x = c(c1, c2, c3, c4, 3, 3),
    y = c(0.925, 1.45, 0.925, 1.45, 1.27, 1.55), # 1.37
    label = c(
      "", # starter-completers
      "starters in program P",
      "", #  migrator-completers
      "migrators into program P",
      "timely completers of program P",
      "ever enrolled in program P"
    ),
    hjust = 0.5,
    vjust = 0.5,
    size = 6
  )
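
The relationships in Figure 1 can also be sketched with base R set operations. The student IDs below are invented for illustration, and the sketch assumes, as the figure suggests, that the ever-enrolled group is the union of starters and migrators.

```r
# Hypothetical student IDs (illustrative only, not MIDFIELD data)
starters <- c("s01", "s02", "s03", "s04") # started in program P
migrators <- c("s05", "s06") # migrated into program P
graduates <- c("s01", "s03", "s05") # timely completers of program P

# Ever enrolled in P: starters and migrators together
ever_enrolled <- union(starters, migrators)

# Every graduate of P was enrolled in P at some point
stopifnot(all(graduates %in% ever_enrolled))

# Completers split by how they entered the program
starter_completers <- intersect(graduates, starters)
migrator_completers <- intersect(graduates, migrators)

# Stickiness counts both kinds of completer in the numerator
stickiness <- length(graduates) / length(ever_enrolled)
stickiness
```

In this toy case a conventional graduation rate would ignore the migrator-completer, while stickiness counts that student in both numerator and denominator.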

Method

This vignette demonstrates the following elements of a MIDFIELD workflow.

  1. Planning.   The metric is stickiness. Required blocs are ever-enrolled and graduates. Grouping variables are program, race/ethnicity, and sex. Programs are the four Engineering programs used throughout.

  2. Initial processing.   Filter the student-level records for data sufficiency and degree-seeking.

  3. Blocs.   Gather ever enrolled, filter by program. Gather graduates, filter by program.

  4. Groupings.   Add grouping variables.

  5. Metrics.   Summarize by grouping variables and compute stickiness.

  6. Displays.   Create multiway chart and results table.


Load data

Start.   If you are writing your own script to follow along, we use these packages in this article:

library(midfieldr)
library(midfielddata)
library(data.table)
library(ggplot2)

Load.   Practice datasets. View data dictionaries via ?student, ?term, ?degree.

# Load practice data
data(student, term, degree)

Loads with midfieldr.   Prepared data. View data dictionaries via ?study_programs, ?baseline_mcid.

Initial processing

Select (optional).   Reduce the number of columns. Code reproduced from Getting started.

# Optional. Copy of source files with all variables
source_student <- copy(student)
source_term <- copy(term)
source_degree <- copy(degree)

# Optional. Select variables required by midfieldr functions
student <- select_required(source_student)
term <- select_required(source_term)
degree <- select_required(source_degree)

# Working data frame
DT <- copy(baseline_mcid)

Ever enrolled

Ever enrolled.   The summary code chunk from Blocs.

# Ever-enrolled bloc
DT <- term[DT, .(mcid, cip6), on = c("mcid")]
DT <- unique(DT)

# Filter by program
DT <- study_programs[DT, on = c("cip6"), nomatch = NULL]
DT[, cip6 := NULL]
DT <- unique(DT)
DT
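
To make the program filter concrete, here is a minimal sketch of the same join pattern on toy data. The tables `toy_programs` and `toy_enrolled` are hypothetical stand-ins, not the real `study_programs` or `term` data; the point is that `nomatch = NULL` turns the join into an inner join, so records with codes outside the study are dropped.

```r
library(data.table)

# Hypothetical lookup of program codes to labels (not the real study_programs)
toy_programs <- data.table(
  cip6 = c("140801", "141001"),
  program = c("CE", "EE")
)

# Hypothetical student-by-code records, including one code outside the study
toy_enrolled <- data.table(
  mcid = c("A", "A", "B", "C"),
  cip6 = c("140801", "140801", "141001", "520201")
)

# Inner join: rows whose cip6 has no match in toy_programs are dropped
toy <- toy_programs[toy_enrolled, on = c("cip6"), nomatch = NULL]

# Drop the code and keep one row per student-program combination
toy[, cip6 := NULL]
toy <- unique(toy)
toy
```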

Copy.   To prepare for joining with graduates.

# Prepare for joining
setcolorder(DT, c("mcid"))
ever_enrolled <- copy(DT)
ever_enrolled

Graduates

Initialize.   The data frame of baseline IDs is the intake for this section.

# Working data frame
DT <- copy(baseline_mcid)

Graduates.   The summary code chunk from Graduates.

# Gather graduates and their degree CIPs
DT <- add_timely_term(DT, term)
DT <- add_completion_status(DT, degree)
DT <- DT[completion_status == "timely"]
DT <- degree[DT, .(mcid, term_degree, cip6), on = c("mcid")]

# Filter by program and first-degree terms only
DT <- study_programs[DT, on = c("cip6"), nomatch = NULL]
DT <- DT[, .SD[which.min(term_degree)], by = "mcid"]
DT[, c("cip6", "term_degree") := NULL]
DT <- unique(DT)
DT
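
The first-degree step relies on selecting, for each student, the row with the earliest degree term. A small sketch of that idiom on invented data follows; `toy_degree` is hypothetical and the term codes are character strings, which `which.min()` coerces to numbers internally.

```r
library(data.table)

# Hypothetical degree records: student A earned two degrees in different terms
toy_degree <- data.table(
  mcid = c("A", "A", "B"),
  term_degree = c("20011", "20023", "19993"),
  program = c("CE", "EE", "ME")
)

# Keep only the row with the earliest degree term for each student
toy_first <- toy_degree[, .SD[which.min(term_degree)], by = "mcid"]
toy_first
```

Student A is retained once, with the program of the earlier degree.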

Copy.   To prepare for joining with ever enrolled.

# Prepare for joining
setcolorder(DT, c("mcid"))
graduates <- copy(DT)
graduates

Groupings

One of our grouping variables (program) is already included in the data frames. The next grouping variable is bloc, which distinguishes ever enrolled from graduates when the two data frames are combined.

Add a variable.   Label ever enrolled and graduates.

# For grouping by bloc
ever_enrolled[, bloc := "ever_enrolled"]
graduates[, bloc := "graduates"]

Join.   Combine the two blocs to prepare for summarizing. A graduate has two observations in these data: one as ever enrolled and one as a graduate.

# Prepare for summarizing
DT <- rbindlist(list(ever_enrolled, graduates))
DT
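
A toy version of this row-binding step shows why a graduate contributes two observations. The data frames below are hypothetical, with three students ever enrolled and one of them graduating.

```r
library(data.table)

# Hypothetical blocs for a single program
toy_ever <- data.table(
  mcid = c("A", "B", "C"),
  program = "CE",
  bloc = "ever_enrolled"
)
toy_grad <- data.table(
  mcid = c("A"),
  program = "CE",
  bloc = "graduates"
)

# Stack the two blocs by rows
toy <- rbindlist(list(toy_ever, toy_grad))

# Number of observations per student: the graduate appears twice
counts <- toy[, .N, by = mcid]
counts
```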

Add variables.   Demographics from Groupings.

# Join race/ethnicity and sex
cols_we_want <- student[, .(mcid, race, sex)]
DT <- cols_we_want[DT, on = c("mcid")]
DT

Verify prepared data.   study_observations, included with midfieldr, contains the case study information developed above. Here we verify that the two data frames have the same content.

# Demonstrate equivalence
check_equiv_frames(DT, study_observations)

Stickiness

Summarize.   Count the number of observations for each combination of the grouping variables.

# Count observations by group
grouping_variables <- c("bloc", "program", "sex", "race")
DT <- DT[, .N, by = grouping_variables]
setorderv(DT, grouping_variables)
DT

Reshape.   Transform to row-record form to set up the stickiness metric calculation. Transform the N column into two columns, one for ever-enrolled and one for graduates.

# Prepare to compute metric
DT <- dcast(DT, program + sex + race ~ bloc, value.var = "N", fill = 0)
DT
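
The reshape can be illustrated on a tiny hypothetical count table. The sketch shows the role of `fill = 0`: a group that has ever-enrolled students but no graduates receives a zero rather than a missing value.

```r
library(data.table)

# Hypothetical counts: program EE has no graduates row
toy_counts <- data.table(
  bloc = c("ever_enrolled", "ever_enrolled", "graduates"),
  program = c("CE", "EE", "CE"),
  N = c(40L, 25L, 18L)
)

# One row per program, one column per bloc
toy_wide <- dcast(toy_counts, program ~ bloc, value.var = "N", fill = 0)
toy_wide
```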

Create a variable.   Compute the metric.

# Compute metric
DT[, stickiness := round(100 * graduates / ever_enrolled, 1)]
DT

Verify prepared data.   study_results, included with midfieldr, contains the case study information developed above. Here we verify that the two data frames have the same content.

# Demonstrate equivalence
check_equiv_frames(DT, study_results)

Prepare for dissemination

Filter.   To preserve the anonymity of the people involved, we remove observations with fewer than N_threshold graduates. With the research data, we typically set this threshold to 10; with the practice data, we demonstrate the procedure using a threshold of 5.

# Preserve anonymity
N_threshold <- 5 # 10 for research data
DT <- DT[graduates >= N_threshold]
DT
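
As a small check of the filter's boundary behavior, the sketch below applies the same condition to invented counts. The names `toy` and `n_min` are hypothetical; a group with exactly the threshold number of graduates is kept, and groups below it are removed.

```r
library(data.table)

# Hypothetical group counts
toy <- data.table(
  group = c("g1", "g2", "g3"),
  ever_enrolled = c(30, 12, 8),
  graduates = c(12, 5, 3)
)

# Keep groups with at least n_min graduates
n_min <- 5
toy_kept <- toy[graduates >= n_min]
toy_kept
```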

Recode.   Readers can more readily interpret our charts and tables if the programs are unabbreviated.

# Recode values for chart and table readability
DT[, program := fcase(
  program %like% "CE", "Civil",
  program %like% "EE", "Electrical",
  program %like% "ME", "Mechanical",
  program %like% "ISE", "Industrial/Systems"
)]
DT
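
For readers new to this recoding idiom, here is a sketch on a hypothetical vector of abbreviations. `fcase()` returns the value paired with the first condition that is TRUE and `%like%` does pattern matching, so an abbreviation that matches none of the patterns (the invented "ChE" below) becomes NA.

```r
library(data.table)

# Hypothetical abbreviations, including one not covered by the recoding
toy <- data.table(program = c("CE", "EE", "ME", "ISE", "ChE"))

toy[, label := fcase(
  program %like% "CE", "Civil",
  program %like% "EE", "Electrical",
  program %like% "ME", "Mechanical",
  program %like% "ISE", "Industrial/Systems"
)]
toy
```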

Add a variable.   We combine race/ethnicity and sex into a single grouping variable.

# Create a combined category
DT[, people := paste(race, sex)]
DT[, `:=`(race = NULL, sex = NULL)]
setcolorder(DT, c("program", "people"))
DT

Chart

Order factors.   Order the levels of the categories. Code adapted from Multiway data and charts.

# Order the categories
DT <- order_multiway(DT,
  quantity   = "stickiness",
  categories = c("program", "people"),
  method     = "percent",
  ratio_of   = c("graduates", "ever_enrolled")
)
DT

Multiway chart.   Code adapted from Multiway data and charts.

The vertical reference line is the aggregate stickiness of the program, independent of race/ethnicity and sex. A missing data marker or missing group indicates that the number of graduates was below the threshold set to preserve anonymity, which is largely an artifact of applying these groupings to practice data.
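
To see what an aggregate value of this kind looks like, the sketch below computes one on invented counts. It assumes the aggregate is the ratio of summed graduates to summed ever-enrolled within each program; whether `order_multiway()` computes `program_stickiness` in exactly this way should be confirmed in its documentation.

```r
library(data.table)

# Hypothetical counts for two programs and two groups of people
toy <- data.table(
  program = c("Civil", "Civil", "Electrical", "Electrical"),
  people = c("White Female", "White Male", "White Female", "White Male"),
  ever_enrolled = c(50, 150, 40, 160),
  graduates = c(30, 60, 20, 60)
)

# Group-level stickiness
toy[, stickiness := round(100 * graduates / ever_enrolled, 1)]

# Program-level aggregate (assumed definition): ratio of the summed counts
toy_program <- toy[,
  .(program_stickiness = round(100 * sum(graduates) / sum(ever_enrolled), 1)),
  by = program
]
toy_program
```

Under this assumption the Civil aggregate is 45%, which differs from the simple mean of the two group values (60% and 40%) because the groups differ in size.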

#| label: fig02
#| fig-asp: 1.1
#| fig-cap: "Figure 2: Stickiness of four Engineering majors."

ggplot(DT, aes(x = stickiness, y = people)) +
  facet_wrap(vars(program), ncol = 1, as.table = FALSE) +
  geom_vline(aes(xintercept = program_stickiness), linetype = 2, color = "gray60") +
  geom_point() +
  labs(x = "Stickiness (%)", y = "") +
  scale_x_continuous(limits = c(20, 90), breaks = seq(0, 100, 10))

Table

Results table.   Code adapted from Multiway data and charts.

# Select variables and remove factors
display_table <- copy(DT)
display_table <- display_table[, .(program, people, stickiness)]
display_table[, people := as.character(people)]
display_table[, program := as.character(program)]

# Construct table
display_table <- dcast(display_table, people ~ program, value.var = "stickiness")
setnames(display_table,
  old = c("people"),
  new = c("People"),
  skip_absent = TRUE
)
display_table

(Optional) Format the table nearer to publication quality. Here we use the 'gt' package.

library(gt)
display_table |>
  gt() |>
  tab_caption("Table 1: Stickiness (%) of four Engineering majors") |>
  tab_options(table.font.size = "small") |>
  opt_stylize(style = 1, color = "gray") |>
  tab_style(
    style = list(cell_fill(color = "#c7eae5")),
    locations = cells_column_labels(columns = everything())
  )

A value of NA indicates a group removed because the number of graduates was below the threshold set to preserve anonymity. As noted earlier, these are largely an artifact of applying these groupings to practice data.
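
The origin of these NA entries can be reproduced on a toy table. In the hypothetical data below, one program-people combination is absent (as if it had been removed by the anonymity filter), and reshaping to wide form without a `fill` argument leaves that cell as NA.

```r
library(data.table)

# Hypothetical results: no row for "Asian Female" in Electrical
toy <- data.table(
  program = c("Civil", "Civil", "Electrical"),
  people = c("Asian Female", "White Male", "White Male"),
  stickiness = c(52.1, 44.3, 39.8)
)

# One row per group of people, one column per program
toy_table <- dcast(toy, people ~ program, value.var = "stickiness")
toy_table
```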

References




MIDFIELDR/midfieldr documentation built on Jan. 28, 2025, 10:24 a.m.