knitr::opts_chunk$set(collapse = T) library(nyschooldata)
Overall, data.nysed.gov does a lot of things right, and I very much appreciate the general spirit of data accessibility that is happening there.
There are some issues that come up when using the released assessment tables. This vignette documents what I've bumped into, how I've problem solved, and what might be done.
Here's the top of the Assessment database:
ex <- get_raw_assess_db(2016)
head(ex) %>% print.AsIs()
And here are ten random rows:
rand_slices <- runif(10, min = 1, max = ex %>% nrow()) %>% round(0) dirty_random <- ex %>% dplyr::slice(rand_slices) dirty_random %>% print.AsIs() summary(dirty_random)
A couple of things I notice:
districts are mixed with schools, with no apparent field that tags the level (School, District, County or Statewide Aggregation)
L1.PCT
, L2.PCT
etc all have percent symbols, which means that they won't read cleanly into R.
TOTAL.TESTED
, L1.COUNT
, L2.COUNT
, are character, not numeric (because of supression codes.)
test grade level isn't a field unto itself - it needs to be text-processed out of the ITEM.DESC
field.
there isn't a unique key/identifier per school / subject / subgroup
there are all kinds of extra classes thrown onto this data.frame - an unfortunate side effect of Hmiscs
mdb.get
function, but more fundamentally a reflection of the fact that Access databases simply aren't a great way to distribute data.
for the county and NRC aggregates (eg bedscode == 130000000000
), total_tested
is reported, but mean_scale_score
is not.
Odds and Ends:
the 2016 Assessment database has inconsistent headers. Instead of using school_year
, it uses sy_end_date
.
the 2015 Assessment database has inconsistent variables. It includes a SUM_OF_SCALE_SCORE
field.
clean_assess_db
solves these problems. Here's the output of that function:
(fetch_assess_db
is a wrapper around that function):
ex2 <- fetch_assess_db(2016)
And here are ten random rows:
clean_random <- ex2 %>% dplyr::slice(rand_slices) clean_random %>% print.AsIs()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.