knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
The educationdata
package allows the user to retrieve data from the Urban
Institute's Education Data API as a
data.frame
for analysis. The package contains one major function,
get_education_data
, which will get data from a specified API endpoint and
return a data.frame
to the user.
NOTE: By downloading and using this programming package, you agree to abide by the Data Policy and Terms of Use of the Education Data Portal. For more information, see https://educationdata.urban.org/documentation/#terms
The get_education_data
function will return a data.frame
from a call to
the Education Data API.
library(educationdata) get_education_data(level, source, topic, by, filters, add_labels, csv)
where:
list
of grouping parameters for an API call.list
query to filter the results from an API
call.FALSE
.FALSE
.This simple example will obtain 'college-university' level
data from the
'ipeds' source
for the 'student-faculty-ratio' topic
:
library(educationdata) df <- get_education_data( level = 'college-university', source = 'ipeds', topic = 'student-faculty-ratio' ) head(df)
A somewhat more complex example will obtain 'school' level
data from the
'ccd' source
for the 'enrollment' topic
, broken out by
'race' and 'sex'.
The API query is subset with filters
for the 'year' 2008, 'grade' 9 through
12, and a 'ncessch' code of 340606000122. Finally, the add_labels
flag will
map integer codes to their factor labels ('race' and 'sex' in this instance).
library(educationdata) df <- get_education_data(level = 'schools', source = 'ccd', topic = 'enrollment', by = list('race', 'sex'), filters = list(year = 2008, grade = 9:12, ncessch = '340606000122'), add_labels = TRUE) head(df)
source('../R/get-endpoint-info.R') df <- get_endpoint_info("https://educationdata.urban.org") df$years_available <- gsub('and' ,'', df$years_available) df$years_available <- gsub('\u20AC' ,'-', df$years_available) df$years_available <- gsub('\u00E2' ,'', df$years_available) df$years_available <- gsub('\u201C' ,'', df$years_available) df$optional_vars <- lapply(df$optional_vars, function(x) paste(x, collapse = ', ')) df$required_vars <- lapply(df$required_vars, function(x) paste(x, collapse = ', ')) df <- df[order(df$endpoint_url), ] vars <- c('section', 'class_name', 'topic', 'optional_vars', 'required_vars', 'years_available') knitr::kable(df[vars], col.names = c('Level', 'Source', 'Topic', 'By', 'Main Filters', 'Years Available'), row.names = FALSE)
Due to the way the API is set-up, the variables listed within 'main filters' are often the fastest way to subset an API call.
In addition to year
, the other main filters for certain endpoints
accept the following values:
| Filter Argument | Grade |
|-------------------|-------|
| grade = 'grade-pk'
| Pre-K |
| grade = 'grade-k'
| Kindergarten |
| grade = 'grade-1'
| Grade 1 |
| grade = 'grade-2'
| Grade 2 |
| grade = 'grade-3'
| Grade 3 |
| grade = 'grade-4'
| Grade 4 |
| grade = 'grade-5'
| Grade 5 |
| grade = 'grade-6'
| Grade 6 |
| grade = 'grade-7'
| Grade 7 |
| grade = 'grade-8'
| Grade 8 |
| grade = 'grade-9'
| Grade 9 |
| grade = 'grade-10'
| Grade 10 |
| grade = 'grade-11'
| Grade 11 |
| grade = 'grade-12'
| Grade 12 |
| grade = 'grade-13'
| Grade 13 |
| grade = 'grade-14'
| Adult Education |
| grade = 'grade-15'
| Ungraded |
| grade = 'grade-16'
| K-12 |
| grade = 'grade-20'
| Grades 7 and 8 |
| grade = 'grade-21'
| Grade 9 and 10 |
| grade = 'grade-22'
| Grades 11 and 12 |
| grade = 'grade-99'
| Total |
| Filter Argument | Level of Study |
|-------------------|----------------|
| level_of_study = 'undergraduate'
| Undergraduate |
| level_of_study = 'graduate'
| Graduate |
| level_of_study = 'first-professional'
| First Professional |
| level_of_study = 'post-baccalaureate'
| Post-baccalaureate |
| level_of_study = '99'
| Total |
Let's build up some examples, from the following set of endpoints.
df <- df[df$section == 'schools' & df$topic == 'enrollment', ] knitr::kable(df[vars], col.names = c('Level', 'Source', 'Topic', 'By', 'Main Filters', 'Years Available'), row.names = FALSE)
The following will return a data.frame
across all years and grades:
library(educationdata) df <- get_education_data(level = 'schools', source = 'ccd', topic = 'enrollment')
Note that this endpoint is also callable by
certain variables:
These variables can be added to the by
argument:
df <- get_education_data(level = 'schools', source = 'ccd', topic = 'enrollment', by = list('race', 'sex'))
You may also filter the results of an API call. In this case year
and
grade
will provide the most time-efficient subsets, and can be vectorized:
df <- get_education_data(level = 'schools', source = 'ccd', topic = 'enrollment', by = list('race', 'sex'), filters = list(year = 1988:1990, grade = 6:8))
Additional variables can also be passed to filters
to subset further:
df <- get_education_data(level = 'schools', source = 'ccd', topic = 'enrollment', by = list('race', 'sex'), filters = list(year = 1988:1990, grade = 6:8, ncessch = '010000200277'))
Finally, the add_labels
flag will map variables to a factor
from their
labels in the API.
df <- get_education_data(level = 'schools', source = 'ccd', topic = 'enrollment', by = list('race', 'sex'), filters = list(year = 1988:1990, grade = 6:8, ncessch = '010000200277'), add_labels = TRUE)
Finally, the csv
flag can be set to download the full .csv
data frame. In
general, the csv
functionality is much faster when retrieving the full data
frame (or a large subset) and much slower when retrieving a small subset of a
data frame (especially ones with a lot of filters
added). In this example,
the full csv
for 2008 must be downloaded and then subset to the 96
observations.
df <- get_education_data(level = 'schools', source = 'ccd', topic = 'enrollment', by = list('race', 'sex'), filters = list(year = 1988:1990, grade = 6:8, ncessch = '010000200277'), add_labels = TRUE, csv = TRUE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.