knitr::opts_chunk$set(echo = TRUE) knitr::opts_chunk$set(cache=TRUE)
I recently attended a hackathon hosted by Cloudera, a Hadoop/Big-Data firm in Austion. The theme was Zika virus data. We started with some data from the CDC on infections. In this document, I illustrate how to map the data in R.
This example was developed for my UT Summer Statistics Institute course entitled Geospatial Data Analysis in R.
For other projects and software, visit http://www.keittlab.org/.
First we'll download the data using Hadley's excellent readr package.
pacman::p_load("readr") zika_states = readr::read_csv("https://raw.githubusercontent.com/cloudera-cares-austin/zika-hackathon/master/data/CDC_May_2016_US.csv", col_types = cols()) head(zika_states)
How lets get the state boundaries from the 2010 US Census Tiger database.
pacman::p_load("tigris") state_boundaries = tigris::states() tigris::plot(state_boundaries, col = "steelblue")
Not very pretty, so lets transform to US National Atlas map projection and remove unneeded areas.
pacman::p_load("sp") state_boundaries = subset(state_boundaries, ! NAME %in% c("Alaska", "Commonwealth of the Northern Mariana Islands", "United States Virgin Islands", "American Samoa", "Guam", "Hawaii")) sb_proj = sp::spTransform(state_boundaries, sp::CRS("+proj=laea +lat_0=45 +lon_0=-100 +x_0=0 +y_0=0 +a=6370997 +b=6370997 +units=m +no_defs")) tigris::plot(sb_proj, col = "steelblue")
Now we join the zika data to the state boundaries file. This just adds the columns from the zika data frame to the corresponding rows of the state boundaries spatial data frame.
sb_zika = tigris::geo_join(sb_proj, zika_states, "NAME", "states") sb_zika$id = 1:nrow(sb_zika)
Let's see if we can plot it with ggplot. (Some pointer here.) In this step, we need to convert the spatial data into a data frame with x- and y- columns suitable for ggplot. I use the plyr join function. This will take a long time to plot because the state boundaries are very finely resolved.
pacman::p_load("rgeos") pacman::p_load("ggplot2") pacman::p_load("plyr") pacman::p_load("maptools") zika_gg = plyr::join(fortify(sb_zika, region = "id"), sb_zika@data, by = "id") names(zika_gg)[1:2] = c("easting", "northing") ggplot(zika_gg, aes(easting, northing, group = group)) + geom_polygon(aes(fill = travel_cases)) + coord_equal() + ggtitle("CDC Zika Cases May 2016")
Let's save the data frame for reuse.
if (!file.exists("zika_states.gml")) sf::st_write(sb_zika, "zika_states.gml")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.