knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7, 
  fig.height = 5,
  fig.align = 'center'
)

This document demonstrates the functions included in the msdr5 package, which is the main deliverable of the Capstone Course of Coursera's Mastering Software Development in R Specialization.

The package can be installed directly from GitHub with devtools::install_github('avidclam/msdr5').

Data

The package is designed to work with the National Oceanic and Atmospheric Administration's (NOAA) Significant Earthquake Database. The dataset can be downloaded from this direct link.

A copy of the dataset as of August 15, 2018 is included in the package under the name earthquakes. Once the package is installed, it can be loaded with the following commands:

library(msdr5)
data("earthquakes")

The earthquakes dataset included in the package is used in all examples below.

Cleaning Data

The eq_clean_data() function combines the earthquake date information spread across the year, month, and day columns into a single DATE column. It also converts the latitude and longitude variables to numeric type.

library(dplyr)
eq <- earthquakes %>% 
  eq_clean_data() %>% 
  arrange(desc(DATE)) %>% 
  select(LOCATION_NAME, DATE, LATITUDE, LONGITUDE)
head(eq, 5)
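
To make the date-gathering step concrete, here is a minimal base R sketch of the idea on toy data. It is not the package's implementation: the helper name gather_date() is hypothetical, the values are invented, and filling a missing month or day with 1 is an assumption. The sketch also does not handle BCE (negative) years.

```r
# Toy data with YEAR, MONTH, DAY columns (all values invented for illustration).
quakes <- data.frame(
  YEAR = c(2017, 2016, 2015),
  MONTH = c(9, NA, 4),
  DAY = c(19, NA, 25),
  LATITUDE = c("18.5", "-0.4", "28.1"),
  LONGITUDE = c("-98.4", "-79.9", "84.7"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: NOT the msdr5 implementation, only a sketch of the idea.
gather_date <- function(df) {
  # Assumption: a missing month or day is replaced by 1.
  month <- ifelse(is.na(df$MONTH), 1, df$MONTH)
  day <- ifelse(is.na(df$DAY), 1, df$DAY)
  # ISOdate() builds a POSIXct value in GMT; as.Date() keeps the calendar date.
  df$DATE <- as.Date(ISOdate(df$YEAR, month, day))
  # Coordinates stored as text are converted to numeric.
  df$LATITUDE <- as.numeric(df$LATITUDE)
  df$LONGITUDE <- as.numeric(df$LONGITUDE)
  df
}

clean <- gather_date(quakes)
```

After this call, clean$DATE is of class Date and the coordinate columns are numeric.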

The eq_location_clean() function cleans the LOCATION_NAME column by stripping out the country name (including the colon) and converting the names to title case (as opposed to all caps).

head(eq, 5) %>%
  eq_location_clean()
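
The cleaning rule described above can be sketched with two regular expressions in base R. The helper clean_location() is hypothetical and only illustrates the technique; the package's own function may differ, for example in how it treats names containing several colons.

```r
# Hypothetical helper: NOT the msdr5 implementation.
clean_location <- function(x) {
  # Drop everything up to and including the first colon, plus following spaces.
  x <- sub("^[^:]*:\\s*", "", x)
  x <- trimws(tolower(x))
  # Upper-case the first letter of every word (requires perl = TRUE for \\U).
  gsub("\\b([a-z])", "\\U\\1", x, perl = TRUE)
}

cleaned <- clean_location(c("MEXICO: SAN ANDRES HUAYAPAM", "CHINA: YUNNAN PROVINCE"))
```

The regular-expression approach is used here instead of tools::toTitleCase() because the latter deliberately leaves some short words in lower case.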

Visualizing

First we plot an earthquake timeline for one country using geom_timeline() and the custom theme theme_eq_custom().

library(ggplot2)
library(grid)
earthquakes %>%
  eq_clean_data() %>%
  eq_location_clean() %>%
  filter(DATE >= as.Date("2000-01-01")) %>%
  filter(DATE < as.Date("2018-01-01")) %>%
  filter(COUNTRY %in% "USA") %>%
  select(COUNTRY, LOCATION_NAME, DATE, EQ_PRIMARY, TOTAL_DEATHS) %>%
  ggplot() +
  geom_timeline(aes(x = DATE, color = TOTAL_DEATHS, size = EQ_PRIMARY)) +
  scale_size_continuous(name = 'Richter scale value', guide = guide_legend(order = 1)) +
  scale_color_continuous(name = '# of Deaths', guide = guide_colorbar(order = 2)) +
  theme_eq_custom()

Next we add labels using geom_timeline_label(). The n_max argument limits the number of labels, and the mag aesthetic supplies the magnitude.

library(ggplot2)
earthquakes %>%
  eq_clean_data() %>%
  eq_location_clean() %>%
  filter(DATE >= as.Date("2000-01-01")) %>%
  filter(DATE < as.Date("2018-01-01")) %>%
  filter(COUNTRY %in% c("USA", "CHINA")) %>%
  select(COUNTRY, LOCATION_NAME, DATE, EQ_PRIMARY, TOTAL_DEATHS) %>%
  ggplot(aes(x = DATE, y = COUNTRY, color = TOTAL_DEATHS)) +
  geom_timeline_label(aes(mag = EQ_PRIMARY, label = LOCATION_NAME), n_max = 5) +
  geom_timeline() +
  scale_size_continuous(name = 'Richter scale value', guide = guide_legend(order = 1)) +
  scale_color_continuous(name = '# of Deaths', guide = guide_colorbar(order = 2)) +
  theme_eq_custom()
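
The n_max argument suggests that only a limited number of earthquakes receive a label. A plausible reading, stated here as an assumption rather than as the package's documented behavior, is that the n_max largest earthquakes by magnitude are kept within each country. The base R sketch below shows that selection on invented data; top_n_by_group() is a hypothetical helper.

```r
# Toy data (invented values); column names follow the examples above.
quakes <- data.frame(
  COUNTRY = c("USA", "USA", "USA", "CHINA", "CHINA"),
  LOCATION_NAME = c("A", "B", "C", "D", "E"),
  EQ_PRIMARY = c(6.0, 7.1, 5.2, 7.9, 6.4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the n_max strongest earthquakes per country.
top_n_by_group <- function(df, n_max) {
  groups <- split(df, df$COUNTRY)
  picked <- lapply(groups, function(g) {
    # Sort by magnitude, largest first, then keep at most n_max rows.
    head(g[order(g$EQ_PRIMARY, decreasing = TRUE), ], n_max)
  })
  out <- do.call(rbind, picked)
  rownames(out) <- NULL
  out
}

labelled <- top_n_by_group(quakes, n_max = 2)
```

With n_max = 2, each country contributes at most two rows, which keeps labels from crowding the timeline.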

Mapping

With the eq_map() function we can put earthquakes on an interactive map, using the DATE column as the annotation text.

library(leaflet)
library(lubridate)
eq_clean_data(earthquakes) %>%
  eq_location_clean() %>%
  dplyr::filter(COUNTRY == "MEXICO" & lubridate::year(DATE) >= 2000) %>%
  eq_map(annot_col = "DATE")

Finally, the eq_create_label() function produces more informative pop-ups.

eq_clean_data(earthquakes) %>%
  eq_location_clean() %>%
  dplyr::filter(COUNTRY == "MEXICO" & lubridate::year(DATE) >= 2000) %>%
  eq_create_label(label_col = "popup_text") %>%
  eq_map(annot_col = "popup_text")
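
Leaflet pop-ups accept HTML, so a label column like popup_text can be assembled with plain string concatenation. The sketch below is an assumption about what such a label might contain (location, magnitude, and deaths, with missing values skipped). It is not the output format of eq_create_label(); make_popup() and label_part() are hypothetical helpers, and the data are invented.

```r
# Toy data (invented values); the second row has missing fields.
quakes <- data.frame(
  LOCATION_NAME = c("Town A", NA),
  EQ_PRIMARY = c(7.1, 6.3),
  TOTAL_DEATHS = c(12, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one HTML line per field, empty when the value is missing.
label_part <- function(title, value) {
  ifelse(is.na(value), "", paste0("<b>", title, ":</b> ", value, "<br>"))
}

# Hypothetical helper: concatenate the per-field lines row by row.
make_popup <- function(df) {
  paste0(
    label_part("Location", df$LOCATION_NAME),
    label_part("Magnitude", df$EQ_PRIMARY),
    label_part("Total deaths", df$TOTAL_DEATHS)
  )
}

quakes$popup_text <- make_popup(quakes)
```

Skipping missing fields avoids pop-ups that display a literal NA.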


avidclam/msdr5 documentation built on May 29, 2019, 11:02 p.m.