knitr::opts_chunk$set(collapse = T, comment = "#>")
NOT_CRAN <- identical(tolower(Sys.getenv("NOT_CRAN")), "true")
knitr::opts_chunk$set(purl = NOT_CRAN)
library(rscorecard)

Using {rscorecard} to download data from the College Scorecard API requires two steps:

  1. Setting your API key
  2. Making a request

1. Setting your API key

If you don't already have one, reqest your (free) API key from https://api.data.gov/signup. It should only take a few moments to register and receive your key.

Once you've gotten your key, you can store it usig sc_key(). In the absence of a key value argument, sc_get() will search your R environment for DATAGOV_API_KEY. It will complete the data request if found. sc_key() command will store your key in DATAGOV_API_KEY, which will persist until the R session is closed.

# NB: You must use a real key, of course... 
sc_key("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")

If you want a more permanent solution, you can add the following line (with your actual key, of course) to your .Renviron file. See this appendix for more information.

# NB: You must use a real key, of course... 
DATAGOV_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

2. Simple request

Each request requires the following four commands piped together using |>:

  1. sc_init()
  2. sc_filter()
  3. sc_select()
  4. sc_get()

The command chain must begin with sc_init() and end with sc_get. All other commands can come in any order.

The request belower should return a tibble with the name, IPEDS ID, state, and degree-seeking undergrad enrollment of all primarily Baccalaureate colleges in the Mid East region located in rural areas:

df <- sc_init() |> 
    sc_filter(region == 2, ccbasic == c(21,22,23), locale == 41:43) |> 
    sc_select(unitid, instnm, stabbr, ugds) |> 
    sc_get()
df

Because we didn't include a specific year, the latest data are returned. We could have specifically asked for the latest data using sc_year("latest"):

df <- sc_init() |> 
    sc_filter(region == 2, ccbasic == c(21,22,23), locale == 41:43) |> 
    sc_select(unitid, instnm, stabbr, ugds) |>
    sc_year("latest") |> 
    sc_get()
df

For a prior year's data, change the value in sc_year():

df <- sc_init() |> 
    sc_filter(region == 2, ccbasic == c(21,22,23), locale == 41:43) |> 
    sc_select(unitid, instnm, stabbr, ugds) |>
    sc_year(2005) |> 
    sc_get()
df

Field of study data

In the fall of 2019, the College Scorecard released field of study-level data elements (4 digit CIP code level). These data elements can be requested alongside institution-level data:

df <- sc_init() |> 
    sc_filter(region == 2, ccbasic == c(21,22,23), locale == 41:43) |> 
    sc_select(unitid, instnm, stabbr, ugds, cipcode, cipdesc, debt_mdn) |>
    sc_year("latest") |> 
    sc_get()
## filter to show only those with non-NA values for median debt
df |> dplyr::filter(!is.na(debt_mdn))

Important note:

The mapping scheme of data across years isn't consistent across data elements. From the technical documentation for institution-level data:

The data contain diverse measures of institutional performance constructed both with an eye towards the type of information that would be most useful to prospective students, as well as towards how the measures might promote accountability for institutions. The measures require different definitions of cohorts. Users of the data should be aware of this, particularly when constructing analyses of the relationship between different measures. Moreover, reporting inaccuracies in some data elements used for cohort definitions are also important. (p. 37)

That is, while the reporting year (e.g., sc_year(2016)) may be the same, the measurement year may not directly align. The same holds true when trying to align institution-level data with field of study-level data (see the technical documentation for field of study-level data for more information).

The upshot is that {rscorecard} will return data based on what the API call returns, but the user should take care to ensure that returned data elements align with expectations and project needs.

More information and examples

For more information about each command, see Commands.

For more examples, see More examples.



btskinner/rscorecard documentation built on March 27, 2024, 12:31 a.m.