knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
The educationdata package allows the user to retrieve data from the Urban
Institute's Education Data API as a
data.frame for analysis. The package contains one major function,
get_education_data, which will get data from a specified API endpoint and
return a data.frame to the user.
NOTE: By downloading and using this programming package, you agree to abide by the Data Policy and Terms of Use of the Education Data Portal. For more information, see https://educationdata.urban.org/documentation/#terms
The get_education_data function will return a data.frame from a call to
the Education Data API.
library(educationdata)

get_education_data(level, source, topic, by, filters, add_labels, csv)
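where:

- level (required): the API data level to query.
- source (required): the API data source to query.
- topic (required): the API data topic to query.
- by (optional): a list of grouping parameters for an API call.
- filters (optional): a list query to filter the results from an API call.
- add_labels: whether to add variable labels, where applicable. Defaults to FALSE.
- csv: whether to download the full csv file. Defaults to FALSE.

This simple example will obtain 'college-university' level data from the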
'ipeds' source for the 'student-faculty-ratio' topic:
library(educationdata)

df <- get_education_data(
  level = 'college-university',
  source = 'ipeds',
  topic = 'student-faculty-ratio'
)
head(df)
A somewhat more complex example will obtain 'schools' level data from the
'ccd' source for the 'enrollment' topic, broken out by 'race' and 'sex'.
The API query is subset with filters for the 'year' 2008, 'grade' 9 through
12, and an 'ncessch' code of 340606000122. Finally, the add_labels flag will
map integer codes to their factor labels ('race' and 'sex' in this instance).
library(educationdata)

df <- get_education_data(level = 'schools',
                         source = 'ccd',
                         topic = 'enrollment',
                         by = list('race', 'sex'),
                         filters = list(year = 2008,
                                        grade = 9:12,
                                        ncessch = '340606000122'),
                         add_labels = TRUE)
head(df)
source('../R/get-endpoint-info.R')

df <- get_endpoint_info("https://educationdata.urban.org")

# Clean up the text of the available years
df$years_available <- gsub('and', '', df$years_available)
df$years_available <- gsub('\u20AC', '-', df$years_available)
df$years_available <- gsub('\u00E2', '', df$years_available)
df$years_available <- gsub('\u201C', '', df$years_available)

# Collapse the variable lists into comma-separated strings
df$optional_vars <- lapply(df$optional_vars,
                           function(x) paste(x, collapse = ', '))
df$required_vars <- lapply(df$required_vars,
                           function(x) paste(x, collapse = ', '))

df <- df[order(df$endpoint_url), ]
vars <- c('section', 'class_name', 'topic', 'optional_vars',
          'required_vars', 'years_available')

knitr::kable(df[vars],
             col.names = c('Level', 'Source', 'Topic', 'By',
                           'Main Filters', 'Years Available'),
             row.names = FALSE)
Because of the way the API is set up, the variables listed under 'Main Filters' are often the fastest way to subset an API call.
In addition to year, the other main filters for certain endpoints
accept the following values:
| Filter Argument | Grade |
|-------------------|-------|
| grade = 'grade-pk' | Pre-K |
| grade = 'grade-k' | Kindergarten |
| grade = 'grade-1' | Grade 1 |
| grade = 'grade-2' | Grade 2 |
| grade = 'grade-3' | Grade 3 |
| grade = 'grade-4' | Grade 4 |
| grade = 'grade-5' | Grade 5 |
| grade = 'grade-6' | Grade 6 |
| grade = 'grade-7' | Grade 7 |
| grade = 'grade-8' | Grade 8 |
| grade = 'grade-9' | Grade 9 |
| grade = 'grade-10' | Grade 10 |
| grade = 'grade-11' | Grade 11 |
| grade = 'grade-12' | Grade 12 |
| grade = 'grade-13' | Grade 13 |
| grade = 'grade-14' | Adult Education |
| grade = 'grade-15' | Ungraded |
| grade = 'grade-16' | K-12 |
| grade = 'grade-20' | Grades 7 and 8 |
| grade = 'grade-21' | Grades 9 and 10 |
| grade = 'grade-22' | Grades 11 and 12 |
| grade = 'grade-99' | Total |

| Filter Argument | Level of Study |
|-------------------|----------------|
| level_of_study = 'undergraduate' | Undergraduate |
| level_of_study = 'graduate' | Graduate |
| level_of_study = 'first-professional' | First Professional |
| level_of_study = 'post-baccalaureate' | Post-baccalaureate |
| level_of_study = '99' | Total |
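
The examples in this document pass grades as integers (for instance, grade = 9:12), while the table above lists string values such as 'grade-9'. As a rough sketch of how the two forms line up, the hypothetical helper below builds the string form from integer grades. The name as_grade_filter is invented here and is not part of educationdata, and treating -1 as Pre-K and 0 as Kindergarten is an assumption made for illustration only.

```r
# Hypothetical helper, not part of the educationdata package.
# Assumption: -1 stands for Pre-K and 0 stands for Kindergarten.
as_grade_filter <- function(grade) {
  grade <- as.integer(grade)
  suffix <- as.character(grade)
  suffix[grade == -1L] <- "pk"
  suffix[grade == 0L] <- "k"
  paste0("grade-", suffix)
}

# Build the string form for Pre-K, Kindergarten, and grades 9 through 12.
as_grade_filter(c(-1, 0, 9:12))
```

This only shows the naming pattern in the table; it makes no claim about how the package translates filter values internally.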
Let's build up some examples from the following set of endpoints.
df <- df[df$section == 'schools' & df$topic == 'enrollment', ]

knitr::kable(df[vars],
             col.names = c('Level', 'Source', 'Topic', 'By',
                           'Main Filters', 'Years Available'),
             row.names = FALSE)
The following will return a data.frame across all years and grades:
library(educationdata)

df <- get_education_data(level = 'schools',
                         source = 'ccd',
                         topic = 'enrollment')
Note that this endpoint can also be broken out by certain variables, such as
'race' and 'sex'. These variables can be added to the by argument:
df <- get_education_data(level = 'schools',
                         source = 'ccd',
                         topic = 'enrollment',
                         by = list('race', 'sex'))
You may also filter the results of an API call. In this case, year and
grade will provide the most time-efficient subsets, and both can be vectorized:
df <- get_education_data(level = 'schools',
                         source = 'ccd',
                         topic = 'enrollment',
                         by = list('race', 'sex'),
                         filters = list(year = 1988:1990,
                                        grade = 6:8))
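
To picture what a vectorized filter asks for, the base R sketch below enumerates every year and grade combination implied by the filters above. It only illustrates the size of the request; how the package actually batches its API calls is not assumed here.

```r
# Enumerate the year/grade combinations implied by the vectorized filters.
filters <- list(year = 1988:1990, grade = 6:8)
combos <- expand.grid(filters, stringsAsFactors = FALSE)

# Three years crossed with three grades
nrow(combos)
combos
```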
Additional variables can also be passed to filters to subset further:
df <- get_education_data(level = 'schools',
                         source = 'ccd',
                         topic = 'enrollment',
                         by = list('race', 'sex'),
                         filters = list(year = 1988:1990,
                                        grade = 6:8,
                                        ncessch = '010000200277'))
The add_labels flag will convert integer-coded variables to factors using
their labels in the API.
df <- get_education_data(level = 'schools',
                         source = 'ccd',
                         topic = 'enrollment',
                         by = list('race', 'sex'),
                         filters = list(year = 1988:1990,
                                        grade = 6:8,
                                        ncessch = '010000200277'),
                         add_labels = TRUE)
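
As a rough illustration of what mapping integer codes to factor labels means, the base R sketch below converts a coded column in a toy data frame. The codes, labels, and enrollment counts are invented for this example and are not taken from the Education Data API; the package's own labelling logic may differ.

```r
# Toy data with integer codes; codes, labels, and counts are made up
# for illustration and are not taken from the Education Data API.
toy <- data.frame(sex = c(1L, 2L, 99L), enrollment = c(10L, 12L, 22L))

# Hypothetical lookup from integer code to label.
sex_labels <- c("1" = "Male", "2" = "Female", "99" = "Total")

# Replace the integer column with a labelled factor.
toy$sex <- factor(toy$sex,
                  levels = as.integer(names(sex_labels)),
                  labels = unname(sex_labels))

levels(toy$sex)
```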
Finally, the csv flag can be set to download the full .csv file. In
general, the csv functionality is much faster when retrieving the full data
frame (or a large subset) and much slower when retrieving a small subset of a
data frame (especially one with many filters applied). In this example,
the full csv data covering 1988 through 1990 must be downloaded and then
subset to the rows that match the filters.
df <- get_education_data(level = 'schools',
                         source = 'ccd',
                         topic = 'enrollment',
                         by = list('race', 'sex'),
                         filters = list(year = 1988:1990,
                                        grade = 6:8,
                                        ncessch = '010000200277'),
                         add_labels = TRUE,
                         csv = TRUE)
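
The download-then-subset pattern described above can be sketched with base R on a small stand-in data frame. The rows in full are invented for illustration and are not real enrollment data; only the local filtering step is the point.

```r
# Stand-in for a full csv download; these rows are invented, not real data.
full <- data.frame(
  year = rep(1987:1991, each = 4),
  grade = rep(c(5L, 6L, 7L, 9L), times = 5),
  ncessch = "010000200277",
  stringsAsFactors = FALSE
)

# Apply the same kind of filters locally, after the download.
subset_df <- full[full$year %in% 1988:1990 & full$grade %in% 6:8, ]

nrow(subset_df)
```

Because every row has to be read before any can be discarded, this pattern pays off mainly when most of the file is wanted.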