
Chicago Data Package

My name is Ryan Wang, and I am a Master’s student at the University of Chicago, concentrating on quantitative methods of social analysis. For this project, I further developed an open Chicago data package from a past GIS 3 class.

This is a continuation of Clyde Schwab’s Chicago Data Package (https://github.com/cschwab1/chicagodatapackage) project for GIS 3. Clyde is an undergraduate at the University of Chicago studying geography and environmental and urban studies. A year ago, Clyde initiated this project to “provide a starting point for those interested in learning geocomputation with R using real Chicago data” and to start “building a larger collection of civic data for academic and amateur, journalist and researcher, non-profit and concerned citizens alike.” The original package provided some of the most commonly used datasets, along with Clyde’s initial processing code and vignettes. I completed the documentation of the datasets and added two new ones: Crime (2001 to present) and Divvy Trips.

Cleaned datasets are available in /data, and the original, unprocessed data is included in /data-raw. The data was primarily collected from the Chicago Data Portal (https://data.cityofchicago.org/), but the package also contains datasets from the CDC and Cook County. Additional documentation is available in /R.

Installation

The project is not yet available on CRAN, but you can install it directly from GitHub:

library(devtools)
install_github("ryanwyg/Chicago-Data-Package", build_vignettes = TRUE)

The installation might take several seconds to a minute depending on your internet connection and computer specifications.

To use the package, simply run:

library(ChicagoPackage)
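
Once the package is installed, base R's data() function can list the datasets a package ships. The following is a minimal sketch, assuming the package is installed under the name ChicagoPackage; the helper name list_package_datasets is hypothetical and is not part of the package.

```r
# Hypothetical helper (not part of ChicagoPackage): list the datasets
# that an installed package exposes through data().
list_package_datasets <- function(pkg) {
  stopifnot(is.character(pkg), length(pkg) == 1)
  # data(package = ...) returns an object whose $results matrix has one
  # row per dataset, including "Item" and "Title" columns.
  res <- utils::data(package = pkg)$results
  data.frame(
    name  = res[, "Item"],
    title = res[, "Title"],
    stringsAsFactors = FALSE
  )
}

# Usage, assuming ChicagoPackage is installed:
# list_package_datasets("ChicagoPackage")
```

The same helper works for any installed package, so it can be tried on a base package such as "datasets" before installing this one.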

Datasets Included

The package includes the following datasets (all accessed through the SODA API):

The datasets above all come from Clyde’s original package; I have linked their online API locations and will update the code accordingly. I also included an additional dataset below, which I think is quite important:

More datasets will be added after the first stage of the project is complete.
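
Since the datasets are accessed through the SODA API, it may help to see what such a request looks like. SODA serves each dataset at a /resource/<id>.<format> path and accepts $limit and $offset parameters for paging. The sketch below only builds the request URL; the function name soda_url is hypothetical, and "xxxx-xxxx" stands in for the identifier shown on a dataset's page in the data portal. It is not necessarily how the package itself retrieves data.

```r
# Sketch of a SODA request URL builder. The function name is an
# assumption for illustration; it is not part of ChicagoPackage.
soda_url <- function(dataset_id, limit = 1000, offset = 0,
                     domain = "data.cityofchicago.org") {
  stopifnot(is.character(dataset_id), length(dataset_id) == 1,
            limit > 0, offset >= 0)
  # $limit caps the number of rows returned; $offset skips rows, so the
  # two together page through a large dataset.
  sprintf("https://%s/resource/%s.csv?$limit=%d&$offset=%d",
          domain, dataset_id, as.integer(limit), as.integer(offset))
}

# Usage (requires network access; "xxxx-xxxx" is a placeholder id):
# first_page <- read.csv(soda_url("xxxx-xxxx", limit = 500))
# next_page  <- read.csv(soda_url("xxxx-xxxx", limit = 500, offset = 500))
```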

Vignettes

Three examples using this package are included. They cover the kinds of visualization and data wrangling that one can do with ChicagoPackage.

To see the vignettes in R, simply run the following code:

vignette(package = "ChicagoPackage")

The Chicago Data Explorer

This is a flexdashboard app that incorporates RShiny code. It uses the data and functions in this package and is intended as a tool for an initial assessment of the Chicago data, helping users quickly decide which datasets to explore further.

The app is currently published on shinyapps.io and is updated frequently: https://ryanwyg.shinyapps.io/ChicagoDataExplorer/#section-the-maps

The app is divided into two sections: a spatial data explorer (“The Maps”) and a non-spatial data explorer (“The Tables”). Both use sidebars to select specific variables of interest, and both let you download the data. If you are interested in further developing this app or helping with troubleshooting, feel free to refer to the "ChicagoDataExplorer" RMD file.

Known Issues

Data Package

The "Crime" and "Divvy Trips" datasets are very large and cannot be hosted on GitHub. The csv_to_sf function is not working properly: inside the function, read.csv returns "'file' must be a character string or connection", although the same call works normally when used on its own.
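
read.csv raises the error quoted above whenever its file argument is neither a character string nor a connection, so one possible cause is that the function receives something else (for example a data frame or an unquoted name) in that position. The source of csv_to_sf is not shown here, so the following is only a sketch of how such a function could validate its input before reading. The name csv_to_sf_sketch, the coordinate column names, and the CRS (EPSG:4326) are all assumptions.

```r
# Hypothetical re-implementation sketch; the real csv_to_sf may differ.
# Column names and the CRS are assumptions for illustration.
csv_to_sf_sketch <- function(path, lon = "Longitude", lat = "Latitude",
                             crs = 4326) {
  # Fail early with a clear message instead of read.csv's generic one.
  if (!is.character(path) || length(path) != 1) {
    stop("`path` must be a single file path given as a character string")
  }
  df <- utils::read.csv(path, stringsAsFactors = FALSE)
  if (!all(c(lon, lat) %in% names(df))) {
    stop("coordinate columns not found: ", lon, ", ", lat)
  }
  # st_as_sf() rejects missing coordinates by default, so drop those rows.
  df <- df[!is.na(df[[lon]]) & !is.na(df[[lat]]), , drop = FALSE]
  if (!requireNamespace("sf", quietly = TRUE)) {
    stop("the sf package is required for the conversion step")
  }
  # Build point geometries from the two coordinate columns.
  sf::st_as_sf(df, coords = c(lon, lat), crs = crs)
}

# Usage (the file name is a placeholder):
# points <- csv_to_sf_sketch("my_points.csv")
```

Checking the argument type up front makes the failure easier to trace back to the caller than the message read.csv produces.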

Data Explorer App

Irrelevant controls are currently not greyed out in the visualization tab (e.g., the "Year" slider is always shown, yet only one dataset uses it). Scrolling is buggy in the data table visualization (second tab). The app is not optimized for mobile.

Future Improvements

This project is ongoing, and new functionality will come soon, mainly in the form of additional data. You can come back and explore updates through the Data Explorer, which is regularly updated with new datasets from the city’s data portal. I plan to add more sections for different categories of data, following the way the data portal website classifies its datasets (“Buildings”, “Community”, “Education”, etc.).

Another update that I plan to make is to move all datasets to online access. While some data is already accessed online in the current state of the project, not all datasets are, especially shapefiles. GitHub has a 25 MB size limit for individual files added through the browser, which means that any dataset stored in the package this way has to be smaller than 25 MB. This is very limiting, and switching to online access for big datasets would open up more possibilities for the package. However, it may create problems for the Data Explorer app, which is a potential concern. Please do check back for updates.
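
To see which files are closest to this limit, and therefore the first candidates for online access, the file sizes in the data folders can be checked directly. This is a minimal sketch using base R only; the helper name files_over_limit is hypothetical, and the 25 MB default simply mirrors the limit mentioned above.

```r
# Hypothetical helper: report files under a directory whose size exceeds
# a limit given in megabytes.
files_over_limit <- function(dir, limit_mb = 25) {
  paths <- list.files(dir, full.names = TRUE, recursive = TRUE)
  size_mb <- file.size(paths) / 1024^2
  sizes <- data.frame(path = paths, size_mb = round(size_mb, 1),
                      stringsAsFactors = FALSE)
  # Keep only the rows whose unrounded size is above the limit.
  sizes[size_mb > limit_mb, , drop = FALSE]
}

# Usage from the package root:
# files_over_limit("data")
# files_over_limit("data-raw")
```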



ryanwyg/Chicago-Data-Package documentation built on June 22, 2020, 2:55 p.m.