In MIDFIELDR/midfieldr: Tools and Methods for Working with MIDFIELD Data in 'R'

#| include: false
knitr::opts_chunk$set(fig.path = "../man/figures/art-110-")
library(ggplot2)

Stickiness is a more inclusive alternative to graduation rate as a measure of a program's success in attracting, keeping, and graduating its undergraduates. All students excluded by a conventional graduation rate metric, including migrators, are included in the stickiness metric [@Ohland+Orr+others:2012].

This vignette covers the stickiness metric step of the MIDFIELD workflow:

1. Planning
2. Initial processing
3. Blocs
4. Groupings
5. [Metrics]{.accent}
    - Graduation rate
    - [Stickiness]{.accent}
6. Displays

Definitions

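Stickiness ($S$) is the ratio of the number of graduates (timely completers) of a program ($N_g$) to the number of students ever enrolled in that program ($N_e$):

$$
S = \frac{N_g}{N_e}
$$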
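
As a minimal arithmetic sketch of this definition, the counts below are hypothetical (they are not taken from MIDFIELD data). The result is expressed as a percentage rounded to one decimal place, which is how the metric is reported later in this vignette.

```r
# Hypothetical counts for one program (illustrative only)
N_e <- 120 # students ever enrolled in the program
N_g <- 45  # graduates (timely completers) of the program

# Stickiness as a ratio and as a percentage
S <- N_g / N_e
S_pct <- round(100 * S, 1)
S_pct
```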

A more inclusive metric

Stickiness, in comparison to graduation rate, has these characteristics:

- Includes migrators, where graduation rate does not.
- Is based on the bloc of ever enrolled rather than starters, so there is no need for FYE proxies.
- Counts all graduates (timely completers) in a program, eliminating the need to filter graduates based on their starting program.
- Like the MIDFIELD definition of graduation rate (in contrast to the IPEDS definition), includes students who attend college part-time, who transfer between institutions, and who start in any term.
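
The difference between the two metrics can be sketched with a small, hypothetical set of student records for a single program P. The data frame and its column names are invented for illustration, and the graduation rate is computed in a simplified form (starters in P who complete P, divided by starters in P), which is an assumption rather than the full MIDFIELD procedure.

```r
# Hypothetical student records for one program P (illustrative only)
students <- data.frame(
  id        = 1:8,
  started   = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE), # started in P
  ever_in_P = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),    # ever enrolled in P
  grad_P    = c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE) # timely completer of P
)

# Simplified graduation rate (assumption): starters who complete P / starters in P
grad_rate <- with(students, sum(started & grad_P) / sum(started))

# Stickiness: all timely completers of P / all students ever enrolled in P
stickiness <- with(students, sum(grad_P) / sum(ever_in_P))

round(100 * c(grad_rate = grad_rate, stickiness = stickiness), 1)
```

In this toy data, students 6 through 8 are migrators into P. They are ignored by the simplified graduation rate but count toward both the numerator and the denominator of stickiness.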

As they pertain to the stickiness metric, relationships among starters, migrators, and graduates (timely completers) of a given program P are illustrated in Figure 1.

- The interior rectangle represents the stickiness numerator $(N_g)$, the set of graduates (timely completers) of program P.
- The overall rectangle represents the stickiness denominator $(N_e)$, the set of students ever enrolled in program P.

#| echo: false
#| label: fig01
#| fig-width: 12
#| fig-asp: 0.7
#| fig-cap: "Figure 1. Stickiness metric. Starters, migrators, and timely completers."

df_tile <- data.frame(
  x = rep(c(2, 4), 2), # centerline of rectangle
  y = rep(c(1), each = 2), # centerline
  z = factor(rep(1:2))
)

delta <- 0.02

# x-position, center of circled numbers
c1 <- 2.57
c2 <- 1.75 # 1.55
c3 <- 4 # 3.43
c4 <- 4.25

df_box1 <- data.frame(
  x = c(1, 5, 5) + delta * c(-1, 1, 1),
  y = c(0.5, 0.5, 1.5) + delta / 2 * c(-1, -1, 1)
)
df_box2 <- data.frame(
  x = c(1, 1, 5) + delta * c(-1, -1, 1),
  y = c(0.5, 1.5, 1.5) + delta / 2 * c(-1, 1, 1)
)
df_circ <- data.frame(x = c(c1, c2, c3, c4), y = c(1, 1, 1, 1.35))
df_hash <- data.frame(
  x = c(3, 3), xend = c(5, 5),
  y = c(0.5, 1.5), yend = c(1.5, 0.5)
)
df_circ2 <- data.frame(x = 4, y = 0.99)

ggplot(df_tile, aes(x, y)) +
  geom_tile(aes(fill = z)) +
  scale_x_continuous(breaks = seq(0, 16, 2)) +
  scale_fill_manual(
    values = c("#80cdc1", "#80cdc1"), # "#dfc27d"  "#80cdc1"
    aesthetics = c("colour", "fill")
  ) +
  geom_vline(aes(xintercept = 3), color = "white") +
  geom_point(
    data = data.frame(x = 3, y = 0.95),
    aes(x = x, y = y),
    shape = 22,
    size = 190,
    color = "white",
    fill = "white",
    alpha = 0.4
  ) +
  theme_void() +
  theme(legend.position = "none") +
  geom_line(data = df_box1, aes(x = x, y = y), linewidth = 1, linetype = 2) +
  geom_line(data = df_box2, aes(x = x, y = y), linewidth = 1, linetype = 2) +
  scale_y_continuous(limits = c(0.4, 1.6)) +
  annotate("text",
    x = c(c1, c2, c3, c4, 3, 3),
    y = c(0.925, 1.45, 0.925, 1.45, 1.27, 1.55), # 1.37
    label = c(
      "", # starter-completers
      "starters in program P",
      "", #  migrator-completers
      "migrators into program P",
      "timely completers of program P",
      "ever enrolled in program P"
    ),
    hjust = 0.5,
    vjust = 0.5,
    size = 6
  )

Method

This vignette demonstrates the following elements of a MIDFIELD workflow.

1. Planning. The metric is stickiness. Required blocs are ever-enrolled and graduates. Grouping variables are program, race/ethnicity, and sex. Programs are the four Engineering programs used throughout.
2. Initial processing. Filter the student-level records for data sufficiency and degree-seeking.
3. Blocs. Gather ever enrolled, filter by program. Gather graduates, filter by program.
4. Groupings. Add grouping variables.
5. Metrics. Summarize by grouping variables and compute stickiness.
6. Displays. Create multiway chart and results table.

Load data

Start. If you are writing your own script to follow along, we use these packages in this article:

library(midfieldr)
library(midfielddata)
library(data.table)
library(ggplot2)

Load. Practice datasets. View data dictionaries via ?student, ?term, ?degree.

# Load practice data
data(student, term, degree)

Loads with midfieldr. Prepared data. View data dictionaries via ?study_programs, ?baseline_mcid.

- study_programs (derived in Programs).
- baseline_mcid (derived in Blocs).

Initial processing

Select (optional). Reduce the number of columns. Code reproduced from Getting started.

# Optional. Copy of source files with all variables
source_student <- copy(student)
source_term <- copy(term)
source_degree <- copy(degree)

# Optional. Select variables required by midfieldr functions
student <- select_required(source_student)
term <- select_required(source_term)
degree <- select_required(source_degree)

# Working data frame
DT <- copy(baseline_mcid)

Ever enrolled

Ever enrolled. The summary code chunk from Blocs.

# Ever-enrolled bloc
DT <- term[DT, .(mcid, cip6), on = c("mcid")]
DT <- unique(DT)

# Filter by program
DT <- study_programs[DT, on = c("cip6"), nomatch = NULL]
DT[, cip6 := NULL]
DT <- unique(DT)
DT
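
To see why `unique()` is applied twice, here is a minimal sketch of the same steps on toy data. The tables `toy_term` and `toy_programs` and their contents are hypothetical stand-ins for `term` and `study_programs`; the baseline-ID join is omitted to keep the example short.

```r
library(data.table)

# Hypothetical term records: one row per student per term (illustrative only)
toy_term <- data.table(
  mcid = c("A", "A", "A", "B", "B", "C"),
  cip6 = c("140801", "140801", "141001", "141001", "141001", "260101")
)

# Hypothetical program lookup, standing in for study_programs
toy_programs <- data.table(
  cip6    = c("140801", "141001"),
  program = c("CE", "EE")
)

# One row per student-CIP combination, regardless of number of terms
toy <- unique(toy_term[, .(mcid, cip6)])

# Inner join keeps only students in the study programs
toy <- toy_programs[toy, on = "cip6", nomatch = NULL]

# Drop the CIP code and remove any duplicates that result
toy[, cip6 := NULL]
toy <- unique(toy)
toy
```

Student A is ever enrolled in two programs and so appears twice, once per program; student C is never enrolled in a study program and is dropped by the inner join.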

Copy. To prepare for joining with graduates.

# Prepare for joining
setcolorder(DT, c("mcid"))
ever_enrolled <- copy(DT)
ever_enrolled

Graduates

Initialize. The data frame of baseline IDs is the intake for this section.

# Working data frame
DT <- copy(baseline_mcid)

Graduates. The summary code chunk from Graduates.

# Gather graduates and their degree CIPs
DT <- add_timely_term(DT, term)
DT <- add_completion_status(DT, degree)
DT <- DT[completion_status == "timely"]
DT <- degree[DT, .(mcid, term_degree, cip6), on = c("mcid")]

# Filter by program and first-degree terms only
DT <- study_programs[DT, on = c("cip6"), nomatch = NULL]
DT <- DT[, .SD[which.min(term_degree)], by = "mcid"]
DT[, c("cip6", "term_degree") := NULL]
DT <- unique(DT)
DT
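
The first-degree filter relies on the `.SD[which.min(...)]` idiom, sketched here on hypothetical degree records (the table `toy_degree` and its values are invented for illustration).

```r
library(data.table)

# Hypothetical degree records; student "A" earns two degrees (illustrative only)
toy_degree <- data.table(
  mcid        = c("A", "A", "B"),
  term_degree = c("19961", "19983", "19971"),
  program     = c("CE", "EE", "ME")
)

# Within each student, keep only the row with the earliest degree term
first_degree <- toy_degree[, .SD[which.min(term_degree)], by = "mcid"]
first_degree
```

Because `which.min()` returns a single index, a student with more than one degree in their earliest degree term would retain only the first of those rows.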

Copy. To prepare for joining with ever enrolled.

# Prepare for joining
setcolorder(DT, c("mcid"))
graduates <- copy(DT)
graduates

Groupings

One of our grouping variables (program) is already included in the data frames. The next grouping variable is bloc, to distinguish ever enrolled from graduates when the two data frames are combined.

Add a variable. Label ever enrolled and graduates.

# For grouping by bloc
ever_enrolled[, bloc := "ever_enrolled"]
graduates[, bloc := "graduates"]

Join. Combine the two blocs to prepare for summarizing. A graduate has two observations in these data: one as ever enrolled and one as a graduate.

# Prepare for summarizing
DT <- rbindlist(list(ever_enrolled, graduates))
DT
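
A small sketch, using hypothetical blocs, shows how row-binding produces two observations for a graduate and one for every other student.

```r
library(data.table)

# Hypothetical blocs for one program (illustrative only)
toy_ever <- data.table(mcid = c("A", "B", "C"), program = "CE", bloc = "ever_enrolled")
toy_grad <- data.table(mcid = "A", program = "CE", bloc = "graduates")

# Stack the two blocs by rows
toy <- rbindlist(list(toy_ever, toy_grad))

# Count observations per student
obs_per_student <- toy[, .N, by = mcid]
obs_per_student
```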

Add variables. Demographics from Groupings.

# Join race/ethnicity and sex
cols_we_want <- student[, .(mcid, race, sex)]
DT <- cols_we_want[DT, on = c("mcid")]
DT

Verify prepared data. study_observations, included with midfieldr, contains the case study information developed above. Here we verify that the two data frames have the same content.

# Demonstrate equivalence
check_equiv_frames(DT, study_observations)

#| eval: false
#| echo: false

# Run manually
# Writing external files
setkey(DT, mcid)
setkey(DT, NULL)
study_observations <- copy(DT)
usethis::use_data(study_observations, overwrite = TRUE)

Stickiness

Summarize. Count the numbers of observations for each combination of the grouping variables.

# Count observations by group
grouping_variables <- c("bloc", "program", "sex", "race")
DT <- DT[, .N, by = grouping_variables]
setorderv(DT, grouping_variables)
DT

Reshape. Transform to row-record form to set up the stickiness metric calculation. Transform the N column into two columns, one for ever-enrolled and one for graduates.

# Prepare to compute metric
DT <- dcast(DT, program + sex + race ~ bloc, value.var = "N", fill = 0)
DT
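
The role of `fill = 0` is easiest to see on a toy table of counts in which one group has no graduates. The counts below are hypothetical, and the grouping is reduced to program only.

```r
library(data.table)

# Hypothetical counts; program "EE" has no graduates row (illustrative only)
toy_counts <- data.table(
  bloc    = c("ever_enrolled", "graduates", "ever_enrolled"),
  program = c("CE", "CE", "EE"),
  N       = c(10L, 4L, 3L)
)

# One row per program, one column per bloc; missing combinations become 0
toy_wide <- dcast(toy_counts, program ~ bloc, value.var = "N", fill = 0)

# Stickiness as a percentage
toy_wide[, stickiness := round(100 * graduates / ever_enrolled, 1)]
toy_wide
```

Without `fill = 0`, the missing graduates count would be `NA` and the stickiness for that group would also be `NA` rather than 0.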

Create a variable. Compute the metric.

# Compute metric
DT[, stickiness := round(100 * graduates / ever_enrolled, 1)]
DT

Verify prepared data. study_results, included with midfieldr, contains the case study information developed above. Here we verify that the two data frames have the same content.

# Demonstrate equivalence
check_equiv_frames(DT, study_results)

#| eval: false
#| echo: false

# Run manually
# Writing external files
setkey(DT, program, sex, race)
setkey(DT, NULL)
study_results <- copy(DT)
usethis::use_data(study_results, overwrite = TRUE)

Prepare for dissemination

Filter. To preserve the anonymity of the people involved, we remove observations with fewer than N_threshold graduates. With the research data, we typically set this threshold to 10; with the practice data, we demonstrate the procedure using a threshold of 5.

# Preserve anonymity
N_threshold <- 5 # 10 for research data
DT <- DT[graduates >= N_threshold]
DT

Recode. Readers can more readily interpret our charts and tables if the programs are unabbreviated.

# Recode values for chart and table readability
DT[, program := fcase(
  program %like% "CE", "Civil",
  program %like% "EE", "Electrical",
  program %like% "ME", "Mechanical",
  program %like% "ISE", "Industrial/Systems"
)]
DT
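
In the recoding above, any program value that matches none of the patterns becomes `NA`, because that is the default of `fcase()`. The sketch below, on a hypothetical table that includes an unmatched value ("XX"), shows how the `default` argument can make such cases visible instead.

```r
library(data.table)

# Hypothetical program abbreviations, including one unmatched value
toy <- data.table(program = c("CE", "EE", "ISE", "XX"))

# %like% does pattern matching; the first condition that is TRUE wins
toy[, label := fcase(
  program %like% "CE", "Civil",
  program %like% "EE", "Electrical",
  program %like% "ISE", "Industrial/Systems",
  default = "Other"
)]
toy
```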

Add a variable. We combine race/ethnicity and sex to create a single grouping variable.

# Create a combined category
DT[, people := paste(race, sex)]
DT[, `:=`(race = NULL, sex = NULL)]
setcolorder(DT, c("program", "people"))
DT

Chart

Order factors. Order the levels of the categories. Code adapted from Multiway data and charts.

# Order the categories
DT <- order_multiway(DT,
  quantity   = "stickiness",
  categories = c("program", "people"),
  method     = "percent",
  ratio_of   = c("graduates", "ever_enrolled")
)
DT

Multiway chart. Code adapted from Multiway data and charts.

The vertical reference line is the aggregate stickiness of the program (program_stickiness), independent of race/ethnicity and sex. A missing data marker or missing group indicates that the number of graduates was below the threshold set to preserve anonymity, which is largely an artifact of applying these groupings to practice data.

#| label: fig02
#| fig-asp: 1.1
#| fig-cap: "Figure 2: Stickiness of four Engineering majors."

ggplot(DT, aes(x = stickiness, y = people)) +
  facet_wrap(vars(program), ncol = 1, as.table = FALSE) +
  geom_vline(aes(xintercept = program_stickiness), linetype = 2, color = "gray60") +
  geom_point() +
  labs(x = "Stickiness (%)", y = "") +
  scale_x_continuous(limits = c(20, 90), breaks = seq(0, 100, 10))
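
The aggregate value behind the vertical reference line can be sketched as follows. This is one reading of "aggregate stickiness", in which counts are pooled within a program before the ratio is taken; it is not a reproduction of the internals of `order_multiway()`. The data and the column name `program_pct` are hypothetical.

```r
library(data.table)

# Hypothetical group-level counts (illustrative only)
toy <- data.table(
  program       = c("Civil", "Civil", "Electrical", "Electrical"),
  people        = c("Group 1", "Group 2", "Group 1", "Group 2"),
  graduates     = c(30L, 10L, 20L, 10L),
  ever_enrolled = c(50L, 30L, 40L, 10L)
)

# Group-level stickiness
toy[, stickiness := round(100 * graduates / ever_enrolled, 1)]

# Program-level stickiness from pooled counts (assumed aggregation)
toy[, program_pct := round(100 * sum(graduates) / sum(ever_enrolled), 1), by = program]
toy
```

Pooling the counts weights each group by its size, so the program-level value generally differs from the simple mean of the group-level percentages.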

Table

Results table. Code adapted from Multiway data and charts.

# Select variables and remove factors
display_table <- copy(DT)
display_table <- display_table[, .(program, people, stickiness)]
display_table[, people := as.character(people)]
display_table[, program := as.character(program)]

# Construct table
display_table <- dcast(display_table, people ~ program, value.var = "stickiness")
setnames(display_table,
  old = c("people"),
  new = c("People"),
  skip_absent = TRUE
)
display_table

(Optional) Format the table nearer to publication quality. Here we use the 'gt' package.

library(gt)
display_table |>
  gt() |>
  tab_caption("Table 1: Stickiness (%) of four Engineering majors") |>
  tab_options(table.font.size = "small") |>
  opt_stylize(style = 1, color = "gray") |>
  tab_style(
    style = list(cell_fill(color = "#c7eae5")),
    locations = cells_column_labels(columns = everything())
  )

A value of NA indicates a group removed because the number of graduates was below the threshold set to preserve anonymity. As noted earlier, these are largely an artifact of applying these groupings to practice data.

References

MIDFIELDR/midfieldr documentation built on Jan. 28, 2025, 10:24 a.m.
