#| include: false

# code chunks
knitr::opts_chunk$set(
  fig.width = 8,
  out.width = "100%",
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  cache = FALSE,
  error = FALSE,
  tidy = FALSE,
  echo = TRUE
)

# inline numbers
knitr::knit_hooks$set(inline = function(x) {
  if (!is.numeric(x)) {
    x
  } else if (x >= 10000) {
    prettyNum(round(x, 2), big.mark = ",")
  } else {
    prettyNum(round(x, 2))
  }
})

# accented text
accent <- function(text_string) {
  kableExtra::text_spec(text_string, color = "#b35806", bold = TRUE)
}

# Backup user options (load packages to capture default options)
suppressPackageStartupMessages(library(data.table))
backup_options <- options()

# Backup user random number seed
oldseed <- NULL
if (exists(".Random.seed")) oldseed <- .Random.seed

# data.table printout
options(
  datatable.print.nrows = 10,
  datatable.print.topn = 3,
  datatable.print.class = FALSE
)
`midfielddata` is an R data package that supplies anonymized student-level records for 98,000 undergraduates from the MIDFIELD database. It provides practice data for the tools and methods of `midfieldr`.
"Student-level" data are the records that undergraduate institutions collect on individual students.
`midfielddata` provides anonymized student-level records for 98,000 undergraduates at three US institutions from 1988 through 2018, collected in four data tables keyed by student ID.
#| echo: false
wrapr::build_frame(
  "Dataset", "Each row is", "Students", "Rows", "Columns", "Memory" |
    "course", "one student per course", "97,555", "3,289,532", 12L, "324.3 MB" |
    "term", "one student per term", "97,555", "639,915", 13L, "72.8 MB" |
    "student", "one student", "97,555", "97,555", 13L, "17.3 MB" |
    "degree", "one student per degree", "49,543", "49,665", 5L, "5.2 MB"
) |>
  kableExtra::kbl(align = "llrrrr", caption = "Table 1. Practice datasets in `midfielddata`.") |>
  kableExtra::kable_paper(lightable_options = "basic", full_width = FALSE) |>
  kableExtra::row_spec(0, background = "#c7eae5") |>
  kableExtra::column_spec(1, monospace = TRUE) |>
  kableExtra::column_spec(1:6, color = "black", background = "white")
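Because every table carries the student ID, records from different tables can be combined by joining on that key. The following is a minimal sketch in base R, so it runs without additional packages. The data frames `toy_student` and `toy_degree`, their IDs, and the columns `entry_type` and `degree_name` are hypothetical stand-ins; only the key name `mcid` is taken from this vignette.

```r
# Hypothetical stand-ins for two student-level tables (invented values,
# not records from midfielddata)
toy_student <- data.frame(
  mcid = c("ID001", "ID002", "ID003"),
  entry_type = c("first-time", "first-time", "transfer"),
  stringsAsFactors = FALSE
)
toy_degree <- data.frame(
  mcid = c("ID001", "ID003", "ID003"),
  degree_name = c("Degree A", "Degree B", "Degree C"),
  stringsAsFactors = FALSE
)

# Left join on the shared key: keep every student and attach degree rows
# where they exist; students without a degree get NA
student_degree <- merge(toy_student, toy_degree, by = "mcid", all.x = TRUE)
student_degree
```

A left join is used here because not every student has a degree record, which is consistent with the smaller student count for `degree` in Table 1.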
The data in `midfielddata` are a proportionate stratified sample of the MIDFIELD database, but they are not suitable for drawing inferences about program attributes or student experiences. The `midfielddata` tables are for practice, not research.
Notes on syntax. We use `data.table` for data manipulation. Some users may prefer base R or `dplyr`. Each system has its strengths, and users are welcome to translate our examples to their preferred syntax.
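As an illustration of such a translation, the sketch below shows a base R equivalent of the `data.table` row filter `DT[mcid == id]` used in the examples in this vignette. The data frame `toy_student`, its IDs, and the `institution` values are hypothetical stand-ins, not records from `midfielddata`.

```r
# Hypothetical stand-in for a student-level table (invented values)
toy_student <- data.frame(
  mcid = c("ID001", "ID002", "ID003"),
  institution = c("Institution A", "Institution B", "Institution A"),
  stringsAsFactors = FALSE
)

id_we_want <- "ID002"

# data.table form, for a data.table DT: DT[mcid == id_we_want]
# Base R equivalent: logical row index, keeping all columns
one_student <- toy_student[toy_student$mcid == id_we_want, ]

# The same filter written with subset()
one_student_alt <- subset(toy_student, mcid == id_we_want)

# Number of rows for the selected ID
nrow(one_student)
```

The trailing comma in `toy_student[rows, ]` matters in base R: without it, a data frame is indexed by column rather than by row.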
format(Sys.Date(), "%Y-%m-%d") # Today's date
packageVersion("midfielddata") # Student-level records practice data
packageVersion("data.table") # For data manipulation
Start. If you are writing your own script to follow along, load the packages used in this vignette:
library(midfielddata)
library(data.table)
Load data tables. Data tables can be loaded individually or collectively as needed.
# Load one table as needed
data(student)

# Or load multiple tables
data(course, term, degree)
We display the records for one specific student, using their ID to subset each dataset.
# One student ID
id_we_want <- "MCID3112192438"
Student. As expected, `student` yields one row per student.
# Observations for a selected ID
student[mcid == id_we_want]
Course. For this student, the records span `r nrow(course[mcid == id_we_want])` rows, one row per course.
# Observations for a selected ID
course[mcid == id_we_want]
Term. Here, the records span `r nrow(term[mcid == id_we_want])` rows, one row per term.
# Observations for a selected ID
term[mcid == id_we_want]
Degree. In this example, the records span `r nrow(degree[mcid == id_we_want])` rows, one row per degree. The degrees were earned in the same term, Spring 2009.
# Observations for a selected ID
degree[mcid == id_we_want]
Not all students with more than one degree earn them in the same term. For example, the next student earned a degree in 1996 and a second degree in 1999. In most analyses, only the first baccalaureate degree would be used.
# Observations for a different ID
degree[mcid == "MCID3111315508"]
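One way to keep only the first degree per student is sketched below in base R, so it runs without additional packages. The data frame `toy_degree` and its values are invented, and the sketch assumes that `term_degree` is a text code that sorts chronologically (year followed by a term digit). Check that assumption against the real `degree` table before reusing the idea.

```r
# Hypothetical degree records (invented IDs and values, not midfielddata rows).
# ID002 mimics degrees earned in different years; ID003 mimics two degrees
# earned in the same term.
toy_degree <- data.frame(
  mcid = c("ID001", "ID002", "ID002", "ID003", "ID003"),
  term_degree = c("20091", "19991", "19963", "20091", "20091"),
  degree_name = c("Degree A", "Degree B", "Degree C", "Degree D", "Degree E"),
  stringsAsFactors = FALSE
)

# Sort by student, then by degree term (earliest first)
ordered_degree <- toy_degree[order(toy_degree$mcid, toy_degree$term_degree), ]

# Keep the first row per student; for same-term degrees this keeps
# whichever row sorts first, which is an arbitrary tie-break
first_degree <- ordered_degree[!duplicated(ordered_degree$mcid), ]

# IDs of students with more than one degree row
multi_degree_ids <- unique(toy_degree$mcid[duplicated(toy_degree$mcid)])

first_degree
multi_degree_ids
```

When two degrees fall in the same term, an analysis may need an explicit rule for choosing between them rather than the arbitrary tie-break used here.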
Install with:
#| eval: false
install.packages("midfielddata",
  repos = "https://MIDFIELDR.github.io/drat/",
  type = "source"
)
The installed size of `midfielddata` is about 24 MB, so installation takes longer than for a typical CRAN package. That size also exceeds CRAN's 5 MB limit, so the package is hosted on the MIDFIELDR `drat` repository, as indicated above, rather than on CRAN.
Installation instructions for `midfieldr` are linked below.
`midfieldr`: A companion R package that provides tools and methods for studying undergraduate student-level records from the MIDFIELD database.
MIDFIELD: A database of anonymized student-level records for approximately 2.4M undergraduates at 21 US institutions from 1987-2022. Access to this database requires a confidentiality agreement and Institutional Review Board (IRB) approval for human subjects research. For a detailed description of the database, see [@aee2016].
This work was supported by the US National Science Foundation through grant numbers 1545667 and 2142087.
#| echo: false

# Restore the user options (saved in common-setup.Rmd)
options(backup_options)

# Restore user random number seed if any
if (!is.null(oldseed)) {
  .Random.seed <- oldseed
}

# to change the CSS file
# per https://github.com/rstudio/rmarkdown/issues/732
knitr::opts_chunk$set(echo = FALSE)