In MIDFIELDR/midfieldr: Tools and Methods for Working with MIDFIELD Data in 'R'

#| include: false
knitr::opts_chunk$set(fig.path = "../man/figures/art-070-")

A degree-seeking student enrolled in their first degree-granting program is a starter in that program. Identifying starters is typically performed as part of a graduation rate calculation, though it can also be a useful measure in its own right.

This vignette in the MIDFIELD workflow.

Planning
Initial processing
[Blocs]{.accent}
- Ever-enrolled
- FYE proxies
- [Starters]{.accent}
- Graduates
Groupings
Metrics
Displays

Special cases

In two special cases, an entering student's CIP code does not correspond to a degree-granting program. Our procedure for identifying starters accommodate both special cases.

The first case includes records for which a CIP is unspecified or reported as "undecided". In MIDFIELD data, both conditions are encoded as CIP 999999. Students may enter with this CIP but we do not consider them starters until and if they enroll in a degree-granting program. (The midfielddata practice datasets contain no undecided CIP codes.)

The second case is more nuanced. At some US institutions, engineering students are required to complete a First-Year Engineering (FYE) program as a prerequisite for declaring an engineering major. These students are admitted as Engineering majors but we don't know to which degree-granting program they intended to transition. At the 2-digit CIP level, FYE students are starters in Engineering (CIP 14). If we do not restrict a study to 2-digit CIPs, however, we use FYE proxies---our estimates of the degree-granting engineering programs (6-digit CIP level) that FYE students would have declared had they not been required to enroll in FYE.

Definitions

entry term

: A student's first term in the database.

start term

: The first term in which a student can be considered a starter. Identical to the entry term unless the student enters as undecided/unspecified.

Method

We use student and term to identify starters.

Filter the source student-level records for data sufficiency and degree-seeking.
Filter for a student's first term not assigned an undecided/unknown CIP code.
Identify the program(s) of which a student can be considered a starter. Substitute an FYE proxy when a starting program is FYE.
Filter by program.

Load data

Start. If you are writing your own script to follow along, we use these packages in this article:

library(midfieldr)
library(midfielddata)
library(data.table)

Load. Practice datasets. View data dictionaries via ?student, ?term.

# Load practice data
data(student, term)

Loads with midfieldr. Prepared data. View data dictionaries via ?study_programs, ?baseline_mcid, ?fye_proxy.

study_programs (derived in Programs).
baseline_mcid (derived in Blocs).
fye_proxy (derived in FYE proxies).

Initial processing

Select (optional). Reduce the number of columns. Code reproduced from Getting started.

# Optional. Copy of source files with all variables
source_student <- copy(student)
source_term <- copy(term)

# Optional. Select variables required by midfieldr functions
student <- select_required(source_student)
term <- select_required(source_term)

# Working data frame
DT <- copy(baseline_mcid)

Isolate the start term

The start term is the first term in which a student can be considered a starter, that is, they are degree-seeking and not recorded as undecided/unspecified.

Add variables. Left join to add terms and CIPs for these students.

# Term into DT left join
DT <- term[DT, .(mcid, term, cip6), on = c("mcid")]
DT

Filter. We remove observations of undecided/unspecified (CIP 999999). Any rows remaining for the same IDs will have CIPs of degree-granting program (or FYE), allowing us to infer their preferred starting programs. (A required step for completeness, but unnecessary when using the practice data.)

# Remove undecided/unspecified
DT <- DT[!cip6 %like% "999999"]
DT

Filter. Order rows by ID and term, then filter to retain the start term observation. If your data contain students enrolled in more than one major in their first term, replace .SD[1] with the (slower) .SD[which.min(term)].

# Retain observations of the earliest remaining terms by ID
setorderv(DT, cols = c("mcid", "term"))
DT <- DT[, .SD[1], by = "mcid"]
DT

Filter. Remove unnecessary variables and filter for unique observations.

# Unique combinations of ID and CIP
DT <- DT[, .(mcid, cip6)]
DT <- unique(DT)
DT

Starters without FYE

If and only if our study excluded Engineering programs that require FYE, the data frame just derived would be the desired bloc of starters.

In such a case, we would rename cip6 to start to make explicit that the CIP codes in this column represent programs of which the students can be considered starters. The code would have the form

#| eval: false
# Not run
DT <- DT[, .(mcid, start = cip6)]

which retains the ID variable and changes the name of the CIP variable.

Starters with FYE

Add a variable. Merge fye_proxy with the working data frame. The left join introduces NA in the proxy column for students not assigned an FYE proxy.

# Join the proxies to the working data frame
DT <- fye_proxy[DT, on = c("mcid")]
DT

Create a variable. Estimated starting programs for FYE students are in the proxy column. Actual, recorded starting programs for non-FYE students are in the cip6 column. Create the start column to combine the two.

# Combine all starting CIPs
DT[, start := fcase(
  cip6 == "140102", proxy,
  cip6 != "140102", cip6
)]
DT

#| eval: false
#| echo: false
# Searching through the proxies for closer look cases
x <- DT[!is.na(proxy)]

for (jj in x$mcid) {
  y <- term[mcid == jj]
  y <- y[cip6 != "140102"]

  # y <- y[!cip6 %like% "^14"]
  #
  # if (nrow(y) != 0 ){
  #     print(y)
  #     profvis::pause(1)
  # }

  if (nrow(y) == 0) {
    print(term[mcid == jj])
    profvis::pause(1)
  }
}
DT[mcid == "MCID3111303095"]
term[mcid == "MCID3111303095"]

Select. Omit unnecessary columns.

# Omit unnecessary columns.
DT <- DT[, .(mcid, start)]
DT

Closer look

Examining the records of selected students in detail.

Example 1. In our results, this student is a starter in CIP 143501 (Industrial Engineering).

# Analysis result
DT[mcid == "MCID3111150194"]

An excerpt from their record in term shows them enrolled in CIP 140102 (FYE) for three terms followed by CIP 143501 They transitioned post-FYE to Industrial Engineering and we consider them a starter in that program.

# Sequence of term records
term[mcid == "MCID3111150194"]

Example 2. In our results, this student is a starter in CIP 141801 (Materials Engineering).

# Analysis result
DT[mcid == "MCID3111161837"]

An excerpt from their record in term shows them enrolled in CIP 140102 (FYE) for three terms followed by CIP 270101 (Mathematics)---they transitioned from FYE to a non-engineering major. Thus we consider them a starter in their proxy program, Materials Engineering.

op <- options()
options(datatable.print.nrows = 11)

# Sequence of term records
term[mcid == "MCID3111161837"]

options(op)

Example 3. In our results, this student is a starter in CIP 140701 (Chemical Engineering).

# Analysis result
DT[mcid == "MCID3111303095"]

An excerpt from their record in term shows them enrolled in CIP 140102 (FYE) for two terms and then leaving the database. Again, we consider them a starter in their proxy program, Chemical Engineering.

term[mcid == "MCID3111303095"]

Filter by program

Filter. Because "starter" usually means "starter in specific programs," this bloc concludes with a filter by program.

# Rename cip6 as start
join_labels <- copy(study_programs)
join_labels <- join_labels[, .(program, start = cip6)]

# Filter by program
DT <- join_labels[DT, on = c("start"), nomatch = NULL]
DT

Select. Omit unnecessary variables.

DT <- DT[, .(mcid, program)]
DT <- unique(DT)
DT

Reusable code

Preparation. The data frame of baseline IDs is the intake for this section.

DT <- copy(baseline_mcid)

Starters. Summary code chunks for ready reference.

# Isolate starting term
DT <- term[DT, .(mcid, term, cip6), on = c("mcid")]
DT <- DT[!cip6 %like% "999999"]
setorderv(DT, cols = c("mcid", "term"))
DT <- DT[, .SD[1], by = "mcid"]
# Alternatively
# DT <- DT[, .SD[which.min(term)], by = "mcid"]
DT <- DT[, .(mcid, cip6)]
DT <- unique(DT)

For starters without FYE, finish by renaming cip6

#| eval: false
# Not run
DT <- DT[, .(mcid, start = cip6)]

For starters with FYE, continue with FYE proxies.

DT <- fye_proxy[DT, .(mcid, cip6, proxy), on = c("mcid")]
DT[, start := fcase(
  cip6 == "140102", proxy,
  cip6 != "140102", cip6
)]
DT <- DT[, .(mcid, start)]

# Filter by program on start
join_labels <- copy(study_programs)
join_labels <- join_labels[, .(program, start = cip6)]
DT <- join_labels[DT, on = c("start"), nomatch = NULL]
DT <- DT[, .(mcid, program)]
DT <- unique(DT)

References

MIDFIELDR/midfieldr documentation built on Jan. 28, 2025, 10:24 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

MIDFIELDR/midfieldr
Tools and Methods for Working with MIDFIELD Data in 'R'

In MIDFIELDR/midfieldr: Tools and Methods for Working with MIDFIELD Data in 'R'

Special cases

Definitions

Method

Load data

Initial processing

Isolate the start term

Starters without FYE

Starters with FYE

Closer look

Filter by program

Reusable code

References

R Package Documentation

Browse R Packages

We want your feedback!

MIDFIELDR/midfieldr Tools and Methods for Working with MIDFIELD Data in 'R'

In MIDFIELDR/midfieldr: Tools and Methods for Working with MIDFIELD Data in 'R'

Special cases

Definitions

Method

Load data

Initial processing

Isolate the start term

Starters without FYE

Starters with FYE

Closer look

Filter by program

Reusable code

References

R Package Documentation

Browse R Packages

We want your feedback!

MIDFIELDR/midfieldr
Tools and Methods for Working with MIDFIELD Data in 'R'