knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(data.table)
library(MsdrCapstoneMPS)
library(ggplot2)
library(grid)
library(ggthemes)
library(dplyr)

Intro

This package was built for the capstone course of the Mastering Software Development in R specialization. It provides tools to clean and visualize data from the NOAA Significant Earthquake Database.

Cleaning NOAA data

eq_clean_data

This first function builds a date field and ensures that the longitude and latitude of each earthquake location are numeric. It takes the raw data from the NOAA Significant Earthquake Database and returns a data frame with a valid date field and numeric longitude and latitude.

raw_noaa <- as.data.table(noaa_data)
tail(raw_noaa)

clean_noaa <- eq_clean_data(raw_noaa)
tail(clean_noaa)
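
The kind of cleaning described above can be sketched in base R. This is only an illustration, not the implementation of eq_clean_data: the helper name sketch_clean is hypothetical, the column names YEAR, MONTH, DAY, LATITUDE and LONGITUDE are assumed, and the toy rows are made up. It also assumes positive four-digit years and fills a missing month or day with 1.

```r
# Hypothetical helper for illustration; not the package's eq_clean_data().
sketch_clean <- function(df) {
  # Assumption: a missing month or day defaults to 1 so a date can still be built.
  month <- ifelse(is.na(df$MONTH), 1, df$MONTH)
  day <- ifelse(is.na(df$DAY), 1, df$DAY)

  # Build an ISO 8601 string and parse it (assumes years between 1 and 9999).
  iso <- sprintf("%04d-%02d-%02d",
                 as.integer(df$YEAR), as.integer(month), as.integer(day))
  df$DATE <- as.Date(iso)

  # Coordinates often arrive as character; coerce them to numeric.
  df$LATITUDE <- as.numeric(df$LATITUDE)
  df$LONGITUDE <- as.numeric(df$LONGITUDE)
  df
}

# Toy input with made-up values, only to exercise the helper.
toy <- data.frame(
  YEAR = c(2005, 2010),
  MONTH = c(3, NA),
  DAY = c(28, NA),
  LATITUDE = c("1.5", "-36.25"),
  LONGITUDE = c("97.0", "-72.75"),
  stringsAsFactors = FALSE
)

cleaned <- sketch_clean(toy)
str(cleaned[, c("DATE", "LATITUDE", "LONGITUDE")])
```

The second toy row has no month or day, so its date falls back to January 1 of that year.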

eq_location_clean

This function removes the country information from the location field and converts the location to title case. It takes the NOAA data and returns a data frame with a cleaned location name.

tail(clean_noaa[, LOCATION_NAME])

clean_noaa <- eq_location_clean(clean_noaa)
tail(clean_noaa[, LOCATION_NAME])
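
As a rough sketch of this kind of clean-up, the base R snippet below strips a leading country prefix and converts the remainder to title case. It is not the code behind eq_location_clean: the helper name sketch_location is hypothetical, and the "COUNTRY:  PLACE" layout of the input strings is an assumption made for the example.

```r
# Hypothetical helper for illustration; not the package's eq_location_clean().
sketch_location <- function(x) {
  # Drop everything up to and including the first colon, plus trailing spaces.
  no_country <- sub("^[^:]*:\\s*", "", x)

  # Lower-case everything, then upper-case the first letter of each word.
  # The "\\U" replacement is only available with perl = TRUE.
  gsub("\\b([a-z])", "\\U\\1", tolower(no_country), perl = TRUE)
}

# Assumed input layout: country, colon, then the place name in upper case.
raw_names <- c("MEXICO:  OAXACA", "CHILE:  MAULE, CONCEPCION")
clean_names <- sketch_location(raw_names)
print(clean_names)
```

Only the first colon-delimited segment is removed, so a name with several segments would keep everything after the country.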

Earthquake timeline plots

geom_timeline

This geom plots a timeline with circles marking the date on which each event occurred. It is intended for graphically exploring the NOAA Significant Earthquake Database (included in this package), but it can show any data with a column of valid date objects.

First, we'll subset the earthquake data to six years (January 1, 2005 to December 31, 2010) and to the five countries with the most recorded earthquakes in that period, to demonstrate the plotting functions.

## Set key to date
setkey(clean_noaa, DATE)

## Pull test set of 6 years (2005 through 2010, inclusive)
eq_subset <- clean_noaa[DATE >= as.Date("2005-01-01") & DATE <= as.Date("2010-12-31")]

## Subset to the 5 countries with the most earthquakes
top_countries <- eq_subset[, .N, by = COUNTRY][order(-N)][1:5, COUNTRY]
eq_subset <- eq_subset[COUNTRY %in% top_countries]

## Set country to factor
eq_subset[, COUNTRY := as.factor(COUNTRY)]

## Set earthquake magnitude to numeric
eq_subset[, EQ_PRIMARY := as.numeric(EQ_PRIMARY)]

Plot all countries on a single timeline:

g <- ggplot(data = eq_subset, aes(x = DATE))
g + geom_timeline(xmin = "2005-01-01", xmax = "2010-12-31",
                  aes(color = DEATHS, size = EQ_PRIMARY))

Plot each country on its own timeline:

g <- ggplot(data = eq_subset, aes(x = DATE, y = COUNTRY))
g + geom_timeline(xmin = "2005-01-01", xmax = "2010-12-31",
                  aes(color = DEATHS, size = EQ_PRIMARY)) + labs(y = "")

geom_timeline_labels

This geom labels the timeline created by geom_timeline. It draws a vertical line up from each point on the timeline and places a label for that event at the top. You can optionally label only the top n_max events, ranked by a specified variable.

Plot each country on its own timeline and label the top 20 earthquakes by magnitude, which is stored in the variable EQ_PRIMARY:

g <- ggplot(data = eq_subset, aes(x = DATE, y = COUNTRY))
g + geom_timeline(xmin = "2005-01-01", xmax = "2010-12-31",
                  aes(color = DEATHS, size = EQ_PRIMARY)) +
  geom_timeline_labels(n_max = 20, aes(label = LOCATION_NAME, magnitude = EQ_PRIMARY)) +
  labs(y = "")
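
The n_max idea, keeping only the largest events by some ranking variable, can be sketched in base R as follows. This is an assumption about the general technique rather than a description of how geom_timeline_labels selects its rows, and the data frame holds made-up toy values.

```r
# Toy events with made-up magnitudes; not taken from the NOAA data.
events <- data.frame(
  LOCATION_NAME = c("Place A", "Place B", "Place C", "Place D"),
  EQ_PRIMARY = c(6.1, 8.8, 7.9, 5.4),
  stringsAsFactors = FALSE
)

n_max <- 2

# Rank rows from largest to smallest magnitude, then keep the first n_max.
ranked <- events[order(events$EQ_PRIMARY, decreasing = TRUE), ]
to_label <- head(ranked, n_max)

print(to_label$LOCATION_NAME)
```

Only the rows in to_label would receive a label; the remaining events would still appear as points on the timeline.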

theme_earthquake

This function creates the theme to plot the earthquake timelines. The legends are below the y axis, there is a solid black line along y = 0, and the background is white.

Plot each country on its own timeline, label the top 20 earthquakes by EQ_PRIMARY, and apply the earthquake theme to the plot:

g <- ggplot(data = eq_subset, aes(x = DATE, y = COUNTRY))
g + geom_timeline(xmin = "2005-01-01", xmax = "2010-12-31",
                  aes(color = DEATHS, size = EQ_PRIMARY)) + 
  geom_timeline_labels(n_max = 20, aes(label = LOCATION_NAME, magnitude = EQ_PRIMARY)) +
  theme_earthquake() + labs(y = "")

Interactive leaflet maps

eq_map

This function creates an interactive map of historical earthquakes using the NOAA earthquake database included with this package. Earthquakes are plotted as circles with radii proportional to the magnitude of the earthquake. Optionally, you can pass a column of labels that pop up when an earthquake is clicked.

Plot earthquakes in Mexico with date labels in popup windows:

raw_noaa %>%
  eq_clean_data() %>%
  dplyr::filter(COUNTRY == "MEXICO" &
                  DATE >= as.Date("2000-01-01")) %>%
  eq_map(annot_col = "DATE")

eq_create_label

This function creates a nicely formatted HTML label for use with eq_map, showing the location, magnitude, and total deaths. Missing values are skipped in the created label.

Plot earthquakes in Mexico with nicely formatted popup windows that include the location, magnitude, and total deaths for each earthquake:

raw_noaa %>%
  eq_clean_data() %>%
  eq_location_clean() %>%
  dplyr::filter(COUNTRY == "MEXICO" &
                  DATE >= as.Date("2000-01-01")) %>%
  dplyr::mutate(popup_text = eq_create_label(.)) %>%
  eq_map(annot_col = "popup_text")
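
To illustrate how a label can skip missing values, here is a minimal base R sketch. It is not the implementation of eq_create_label: the helper name sketch_label, its arguments, and the exact HTML markup are assumptions chosen for the example, and the input values are made up.

```r
# Hypothetical helper for illustration; not the package's eq_create_label().
sketch_label <- function(location, magnitude, deaths) {
  # Build one "<b>Title:</b> value<br>" fragment, or "" when the value is NA.
  part <- function(title, value) {
    ifelse(is.na(value), "", paste0("<b>", title, ":</b> ", value, "<br>"))
  }
  paste0(
    part("Location", location),
    part("Magnitude", magnitude),
    part("Total deaths", deaths)
  )
}

# Toy values: the second event has no magnitude or death count recorded.
labels <- sketch_label(
  location = c("Place A", "Place B"),
  magnitude = c(7.4, NA),
  deaths = c(2, NA)
)
print(labels)
```

Because each fragment is built independently, a missing field simply contributes an empty string and the other fields are unaffected.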


marksendak/MsdrCapstoneMPS documentation built on May 23, 2019, 7:33 a.m.