knitr::opts_chunk$set(collapse = T, comment = "#>") NOT_CRAN <- identical(tolower(Sys.getenv("NOT_CRAN")), "true") knitr::opts_chunk$set(purl = NOT_CRAN)
There are two basic types of commands:
The third command, sc_dict()
, may be used to explore the College
Scorecard data dictionary.
sc_init()
Use sc_init()
to start the command chain. The only real option is
whether you want to use standard variable names (as they are found in
IPEDS) or the new developer-friendly variable names developed for the
Scorecard API. Unless you have good reason for doing so, I recommend
using the default standard names. If you want to use the
developer-friendly names, set dfvars = TRUE
. Whichever you choose,
you're stuck with that option for the length of piped command chain;
no switching from one type to another.
sc_get()
Use sc_get()
as the last command in the chain. If you haven't used
sc_key
to store your data.gov API key in the system environment,
then you must supply your key as an argument.
The following commands are structured to behave like
dplyr
. They can be
placed in any order in the piped command chain and each one relies
(for the most part) on non-standard
evaluation
for its arguments. This means that you don't have to quote variable
names.
sc_select()
Use sc_select()
to select the variables (columns) you want in your
final dataframe. These variables do not have to be the same as those
used to filter the data and are case insensitive. Separate the
variable names with commas. The Scorecard API requires that most of
the variables be prepended with their category. sc_select()
uses a
hash table to do this automatically for you so you do not have to know
or include those (and in fact should not). This command is the only
one of the subsetting commands that is required to pull data.
sc_filter()
Use sc_filter()
to filter the rows you want in your final
dataframe. Its main job is to convert idiomatic R code into the format
required by the Scorecard API. Like sc_select()
, sc_filter
prepends variable categories automatically and variables are case
insensitive. Like with dplyr::filter()
, separate each filtering
expression with a comma.There are a few points to note owing to the
idiosyncracies of the Scorecard API. First, there are the conversions
between R and the Scorecard, shown in the table below.
|Scorecard|R|R example|Conversion|
|:--------|:-----------|:------|:----|
|,
|c()
|sc_filter(stabbr == c("KY","TN"))
|school.state=KY,TN
|
|__not
|!=
|sc_filter(stabbr != "KY")
|school.state__not=KY
|
|__range
,..
|#:#
|sc_filter(ccbasic==10:14)
|school.carnegie_basic__range=1..14
|
|spaces (%20
)|" "|sc_filter(instnm == "New York")
|school.name=New%20York
|
A few notes:
c(1,2,5:10)
), it does not appear that Scorecard
API can. You will either have to overselect and then filter the
downloaded dataframe or list every value discretely.>
or <
symbols. This means that if you want to select a range of values
above a certain threshold (e.g., enrollments above 10,000
students), you may have to give a range of from 10001 to an
artifically large number. Same thing but reversed for values under
a certain threshold.1:10
will convert to
__range=1..10
and include both 1 and 10.sc_year()
All Scorecard variables except those in the root and school categories
take a year option. Simply set the data year you want. For the latest
data, you can either use sc_year("latest")
or leave out sc_year()
entirely, which will default to the latest data.
Two important points:
sc_zip()
Use sc_zip()
to subset the sample to those institutions within a
certain distance around a given zip code. Only one zip code may be
given. The default is distance is 25 miles, but both the distance and
metric (miles or kilometers) can be changed.
Users can search data elements in the College Scorecard data
dictionary using sc_dict()
.
library(rscorecard)
## simple search for "state" in any part of the dictionary sc_dict("stabbr")
You can also search using regular expressions and limit the search to only one dictionary column. For example, the search below only looks for varnames starting with "st":
## variable names starting with "st" sc_dict("^st", search_col = "varname")
You can also return the data dictionary as a tibble. When storing the
dictionary in an object, it may be useful to set print_off = TRUE
so
that the dictionary results don"t print to the console:
dict_df <- sc_dict("stabbr", print_off = TRUE, return_df = TRUE) dict_df
If you want the full data dictionary, simply search for "."
:
dict_df <- sc_dict(".", print_off = TRUE, return_df = TRUE) dict_df
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.