In sdbecker/shakerr: Analyse Earthquake Data

require(dplyr)
require(shakerr)
require(ggplot2)

Introduction

This package offers functionality to format earthquake data sets downloaded from the U.S. National Oceanographic and Atmospheric Administration (NOAA).

Once the data set is formatted the package offers two ways to visualize the data. The first way is by using a timeline graph where the magnitude and year of the earthquake is graphed. The second is by mapping the epicenters of a selection of earthquakes directly on a map.

The plots will not only show the location but the magnitude as well. The line graphs will have two sub-types, one that labels the top earthquakes in the timeline and one that does not. Likewise the epicenter graphs will have two subtypes, one that gives basic information about the earthquake and one with a little more detail of the magnitude and number of deaths.

This vignette will be split into two parts, the first will describe the cleaning and formatting of the data and the second the visualization of the data.

Cleaning and Formatting the Data

There are three types of sample data available to demonstrate the functionality in this package. An example of one is the eq_test_data data set. This data set contains the magnitude and other attributes of 100 earthquakes from around the world since recorded time of the NOAA database. The information for these data sets may be read by running a help call ?eq_test_data on the data set.

The following demonstrates how the data set is formatted using the package functionality.

eq_clean_data: creates a Date classed field with the appropriate date format, and converts the longitude and latitude fields to a numeric class. eq_location_clean: reformats the location name of the earthquake.

#Data before formatting
head(select(eq_test_data, HOUR, MINUTE, SECOND, LONGITUDE, LATITUDE,
                   LOCATION_NAME))
#Formatting the data
frm_data <- eq_clean_data(eq_test_data) %>% eq_location_clean()

#Formatted data
head(select(frm_data, DATE, LONGITUDE, LATITUDE, LOCATION_NAME))

Note the first data set shows the unformatted data and the second the formatted data.

Data Visualization

The formatted data can be visualized into two ways, the first as a timeline and the second as points of occurence on a map.

The following functions show how a timeline may be graphed, with and without labels. The function quaketimeline_plot utilises the GeomTimeline and StatTimeline Geom and Stat respectively. Labelling the timelines requires quaketimelinelabel_plot which uses the GeomTimelinelable and StatTimelinelabel Geom and Stat respectively. Note that with labels, the extra parameter indicates the top number of earthquakes to label by magnitude.

quaketimeline_plot(eq_clean_testdata[1:12,], "1800-01-01","2014-01-01")

Instead of using the function, the underlying geom_timeline function may be used. Note that a minimum and maximum date need to be entered.

ggplot(eq_china_cleandata, aes( x = DATE, y = as.factor(COUNTRY))) + 
  geom_timeline( x_min = "1800-01-01", x_max = "2015-01-01")

quaketimelinelabel_plot(eq_china_cleandata, "1800-01-01","2015-01-01", 3)

Again, going directly to the underlying function geom_timelinelabel is possible if a little more flexibility is required.

ggplot(eq_china_cleandata, aes( x = DATE, y = as.factor(COUNTRY))) +
  geom_timeline( x_min = "1800-01-01", x_max = "2015-01-01") +
  geom_timelinelabel(ggplot2::aes_(magnitude = quote(EQ_PRIMARY),
                                   label = quote(LOCATION_NAME),
                                   col = NULL),
                     x_min = "1800-01-01",
                     x_max = "2015-01-01",
                     n_max = 3)

The second kind of visualization plots the epicenters of the selected earthquakes on a map. The epicenters diameter will represent the magnitude of the earthquake, with the each data point showing further information when you click on it. The first graph will present the date of the earthquake when you click on it. The second will offer more detail, with location and magnitude.

eq_map(eq_china_cleandata, "DATE")

  eq_clean_data(eq_test_data) %>% 
  dplyr::filter(COUNTRY == "TURKEY" & lubridate::year(DATE) >= 1500) %>% 
  dplyr::mutate(popup_text = eq_create_label(.)) %>% 
  eq_map(annot_col = "popup_text")