knitr::opts_chunk$set(
  warning = FALSE,
  message = FALSE,
  collapse = TRUE,
  echo = TRUE,
  cache = FALSE,
  comment = "#>",
  fig.path = "README-"
)
install.packages('devtools') # install devtools if needed
devtools::install_github("dantonnoriega/sppedr") # install the package
This package assumes you have access to a directory with the following files and structure:
spped
└── Data
    ├── cps_2008_2016.dta
    ├── cps_address_2008_2016.dta
    RawData
        ├── CPS_Personnel
        ├── Chicago_Crime
        ├── Crosswalks
        ├── ISBE_Directories
        └── ISBE_School_Data
The directory spped is considered the data "main directory" (or main_dir) by the package. Wherever this is located on your computer, ensure the package knows where it is by running the set_main_dir() command.
library(sppedr)
clear_main_dir() # clear any existing main_dir
try(get_main_dir()) # no main dir
set_main_dir("/Users/danton/Dropbox/ra-work/spped")
# set_main_dir() # can also run without path and it will prompt
get_main_dir()
This will allow you to run the package functions from scratch should you want to generate new data.
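Before calling set_main_dir(), it can help to confirm that the directory has the expected layout. The following is a minimal sketch in base R and is not part of sppedr: the helper name check_spped_dir() is hypothetical, and it only checks for the two .dta files, assuming the Data folder sits directly under the main directory as in the tree above.

```r
# Hypothetical helper (not part of sppedr): report which expected files
# are missing under a candidate main directory.
check_spped_dir <- function(main_dir) {
  expected <- file.path(
    main_dir, "Data",
    c("cps_2008_2016.dta", "cps_address_2008_2016.dta")
  )
  missing <- expected[!file.exists(expected)]
  if (length(missing) > 0) {
    warning("Missing expected files:\n", paste(missing, collapse = "\n"))
  }
  invisible(length(missing) == 0)
}

# Demonstration on a throwaway directory that mimics the layout
demo_dir <- file.path(tempdir(), "spped")
dir.create(file.path(demo_dir, "Data"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_dir, "Data", "cps_2008_2016.dta"))
file.create(file.path(demo_dir, "Data", "cps_address_2008_2016.dta"))
ok <- check_spped_dir(demo_dir) # TRUE when both files exist
```

If the check passes, the same path can then be handed to set_main_dir().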
The package, when loaded, also attaches the following datasets. These datasets can be called directly once you have loaded library(sppedr).
library(sppedr)
data(package = 'sppedr') # list of all available data
cps_2008_2016: school data for years 2008 to 2016. This is a csv version of the dta file cps_2008_2016.dta.
cps_school_year: dates for the academic school year. It is a list object with two data frames, cps_school_year$school_year and cps_school_year$days_off.
cps_address: lat/lon coordinates and full addresses as understood by the Google Maps API.
cps_security_personnel
cps_crosswalk: helps link the cps_security_personnel data with all the other data sources by merging on the unit_number variable.
cpd_crime_codes: crime codes from the Chicago PD (IUCR codes) and the FBI. It is a list of two data frames, cpd_crime_codes$iucr and cpd_crime_codes$fbi.

Example using the cps_address dataset:

library(dplyr) # provides %>%

# first 10 full school addresses for academic year 2009 - 2010
cps_address %>%
  dplyr::filter(school_year == 2010) %>%
  dplyr::select(schoolname, school_year, address_full, lat, lon) %>%
  head(10)
In the package directory data-raw, there is a README.txt file that explains where all the internal datasets come from and/or how they are built. Scripts that build most of these data are also in data-raw. Note that this directory is accessible via the GitHub repo, not the package itself.
The only function that one really needs to call is get_cpd_crime(). This is because these data are much larger and could not be included as part of the package. It will import these data from the main_dir set using the set_main_dir() function.
The following functions do not need to be called unless you want to regenerate these data from scratch:
get_cps_2008_2016()
get_cps_address()
get_cps_security_personnel()
To do so, you need to set force = TRUE. Otherwise, the function will just load the package version. I would recommend NOT using the force option.
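The behaviour described here can be pictured with a small sketch. This is not the package's implementation: get_example_data(), its arguments, and the rebuild step are hypothetical stand-ins that only illustrate the general pattern of returning the bundled copy unless force = TRUE.

```r
# Hypothetical illustration of a force flag (not sppedr's actual code).
get_example_data <- function(bundled, rebuild, force = FALSE) {
  if (!force) {
    return(bundled) # default: return the copy shipped with the package
  }
  rebuild() # force = TRUE: regenerate from the raw files
}

# Stand-in for the dataset bundled with a package
bundled_copy <- data.frame(unit_number = 1:2, n = c(3L, 5L))

# Stand-in for an expensive rebuild from raw data
rebuild_fn <- function() {
  data.frame(unit_number = 1:3, n = c(3L, 5L, 1L))
}

default_result <- get_example_data(bundled_copy, rebuild_fn)
forced_result <- get_example_data(bundled_copy, rebuild_fn, force = TRUE)
```

In this toy version the rebuild function is only called when force = TRUE, which is why the default path is cheap and does not need the raw files.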
Here is an example where the CPS security personnel data are merged with the crosswalk data and then spread into wide format.
library(dplyr) # provides %>%

personnel_wide <- cps_security_personnel %>%
  dplyr::left_join(., cps_crosswalk, by = 'unit_number') %>%
  # tidy job descriptions so they can serve as column names
  dplyr::mutate(job_description = gsub('school ', '', job_description)) %>%
  dplyr::mutate(job_description = gsub('[[:space:]]', '_', job_description)) %>%
  dplyr::filter(!is.na(schoolidr_c_d_t_s)) %>%
  tidyr::spread(job_description, n) %>%
  # replace NAs with 0 (formula lambda instead of the deprecated dplyr::funs())
  dplyr::mutate_if(is.integer, ~ replace(., is.na(.), 0)) %>%
  dplyr::select(-unit_number, -abbreviated_name) %>%
  dplyr::mutate(school_year = as.integer(school_year))

# list first 10 security_officer counts
personnel_wide %>%
  dplyr::select(school_year, school_name, security_officer) %>%
  head(10)
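To see what the join and reshaping steps do without the package data, here is a minimal sketch on a made-up table. The column names mirror the example above, but the values and the toy_ objects are invented for illustration, and the sketch assumes dplyr and tidyr are installed.

```r
library(dplyr)
library(tidyr)

# Invented long-format counts: one row per unit and job description
toy_personnel <- data.frame(
  unit_number = c(1L, 1L, 2L),
  job_description = c("security_officer", "security_supervisor", "security_officer"),
  n = c(4L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Invented crosswalk linking unit_number to a school name
toy_crosswalk <- data.frame(
  unit_number = c(1L, 2L),
  school_name = c("School A", "School B"),
  stringsAsFactors = FALSE
)

toy_wide <- toy_personnel %>%
  left_join(toy_crosswalk, by = "unit_number") %>%
  spread(job_description, n, fill = 0L) # one column per job description

toy_wide
```

Each distinct job_description becomes its own column, and units with no row for a given job get 0 through the fill argument. In newer tidyr versions, pivot_wider() is the recommended replacement for spread().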