knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(earthquake)
library(dplyr)
library(ggplot2)

Introduction

This package provides tools for processing and visualizing a dataset of significant earthquakes around the world, obtained from the U.S. National Oceanic and Atmospheric Administration (NOAA). The dataset contains information about 5,933 earthquakes spanning approximately 4,000 years. Much of this information is hard to use without detailed knowledge of how the dataset is structured and coded. The goal of this package is to make that information accessible to a wider audience.

Cleaning and Saving Earthquake Data

The following steps load the earthquake dataset bundled with the package and clean it in preparation for visualization.

data("earthquakes")

Use eq_clean_data to perform the following series of edits to clean the data frame:

  1. Convert the SECOND variable to a numeric type and round to the nearest whole number,
  2. Replace missing values in the MONTH and DAY variables with '1',
  3. Replace missing values in the HOUR, MINUTE and SECOND variables with '0',
  4. Use the YEAR, MONTH, DAY, HOUR, MINUTE, and SECOND variables to create a new DATE variable that contains the date of an event,
  5. Convert the LATITUDE and LONGITUDE variables to a numeric type,
  6. Rename the I_D variable to ID,
  7. Change the FLAG_TSUNAMI variable to a logical value,
  8. Change the EQ_PRIMARY variable from a character to a numeric type, and
  9. Filter the dataset to remove observations with missing values in the DATE, EQ_PRIMARY and TOTAL_DEATHS variables.
earthquakes <- eq_clean_data(earthquakes)
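The internals of eq_clean_data are not shown in this vignette. As a rough illustration of steps 1 to 4, 8 and 9, the base R sketch below applies them to a few made-up rows with the same column names. It is an assumption-based approximation, not the package's code; in particular, it stores DATE as a Date and does not attempt to handle the negative (BCE) years that a 4,000-year record contains.

```r
# Made-up rows shaped like the raw NOAA columns (not real events).
raw <- data.frame(
  YEAR = c(1995, 2003, 2008),
  MONTH = c(6, NA, 9),
  DAY = c(15, NA, 2),
  HOUR = c(14, NA, NA),
  MINUTE = c(5, NA, NA),
  SECOND = c("47.6", NA, NA),
  EQ_PRIMARY = c("6.4", "5.9", "7.0"),
  TOTAL_DEATHS = c(3, 1, NA),
  stringsAsFactors = FALSE
)

# Step 1: SECOND becomes numeric, rounded to the nearest whole second.
raw$SECOND <- round(as.numeric(raw$SECOND))

# Step 2: a missing MONTH or DAY defaults to 1.
for (col in c("MONTH", "DAY")) raw[[col]][is.na(raw[[col]])] <- 1

# Step 3: a missing HOUR, MINUTE or SECOND defaults to 0.
for (col in c("HOUR", "MINUTE", "SECOND")) raw[[col]][is.na(raw[[col]])] <- 0

# Step 4: combine the parts into a GMT timestamp, then keep the calendar date.
raw$DATE <- as.Date(ISOdate(raw$YEAR, raw$MONTH, raw$DAY,
                            raw$HOUR, raw$MINUTE, raw$SECOND))

# Step 8: EQ_PRIMARY goes from character to numeric.
raw$EQ_PRIMARY <- as.numeric(raw$EQ_PRIMARY)

# Step 9: drop rows missing DATE, EQ_PRIMARY or TOTAL_DEATHS.
clean <- raw[complete.cases(raw[, c("DATE", "EQ_PRIMARY", "TOTAL_DEATHS")]), ]
clean
```

The second row shows the defaults at work: its missing month, day and time become 2003-01-01, while the third row is dropped because TOTAL_DEATHS is missing.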

Use the eq_location_clean function to edit the location-related variables in the data frame:

  1. Format the LOCATION_NAME variable by stripping out the country from the name and converting the text from uppercase to title case,
  2. Format the COUNTRY variable in the same way as the LOCATION_NAME variable, and
  3. Remove any leading and trailing whitespace from both variables.
earthquakes <- eq_location_clean(earthquakes)
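The three location edits can be sketched in base R as follows. This assumes, without confirmation from the vignette, that a raw LOCATION_NAME starts with the country followed by a colon (for example "JAPAN:  HONSHU"); the helper names strip_country and to_title_case are hypothetical and are not part of the package.

```r
# Hypothetical helpers; the real eq_location_clean may work differently.
# Drop everything up to and including the first colon (assumed country delimiter).
strip_country <- function(x) sub("^[^:]*:", "", x)

# Lower-case the text, then upper-case the first letter of each word.
to_title_case <- function(x) gsub("\\b([a-z])", "\\U\\1", tolower(x), perl = TRUE)

raw_locations <- c("JAPAN:  HONSHU", "MEXICO: GUERRERO, ACAPULCO")
raw_countries <- c(" JAPAN", "SOUTH KOREA ")

# Trim leading and trailing whitespace last, after the other edits.
location_name <- trimws(to_title_case(strip_country(raw_locations)))
country <- trimws(to_title_case(raw_countries))

location_name
country
```

A name without a colon passes through strip_country unchanged, so it is still title-cased and trimmed.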

Use eq_select_data to subset the data frame to the following variables:

earthquakes <- eq_select_data(earthquakes)

Filtering Earthquake Data

Use the eq_count_events function to identify the countries with events in a specified date range. It returns a data frame listing each country and its number of events, sorted in descending order of count.

events <- eq_count_events(earthquakes, 
                          minimum_date = "2000-01-01", 
                          maximum_date = "2018-12-31")

knitr::kable(head(events, 10),
             caption = "Top 10 countries by number of events in descending order.",
             col.names = c("Country", "Number of Events"),
             align = "lr")
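The counting logic can be approximated in base R as shown below. The function name count_events_sketch and its output column names COUNTRY and n are hypothetical (the table above relabels the columns, so the package's real names are not shown). The sketch also assumes the date range is inclusive and breaks ties in the count alphabetically.

```r
# Hypothetical re-implementation for illustration only.
count_events_sketch <- function(df, minimum_date, maximum_date) {
  # which() drops rows whose DATE is NA instead of propagating NA.
  in_range <- which(df$DATE >= as.Date(minimum_date) &
                    df$DATE <= as.Date(maximum_date))
  counts <- table(df$COUNTRY[in_range])
  out <- data.frame(COUNTRY = names(counts),
                    n = as.integer(counts),
                    stringsAsFactors = FALSE)
  # Highest count first; ties broken alphabetically by country.
  out <- out[order(-out$n, out$COUNTRY), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Made-up events.
toy <- data.frame(
  DATE = as.Date(c("1999-05-01", "2001-02-03", "2004-06-01",
                   "2010-01-01", "2011-04-20", "2015-06-06")),
  COUNTRY = c("Japan", "Japan", "Indonesia", "Japan", "Japan", "Indonesia"),
  stringsAsFactors = FALSE
)

res <- count_events_sketch(toy, "2000-01-01", "2018-12-31")
res
```

The 1999 event falls outside the range, so Japan is counted three times and Indonesia twice.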

Use eq_filter_data to subset the earthquakes data frame to the events you want to visualize. eq_filter_data accepts four arguments: a data frame containing the source data, a character vector of country names, and the minimum and maximum dates. The date range is inclusive: the function returns all events for the selected countries dated on or between the minimum and maximum dates.

quakes1 <- eq_filter_data(earthquakes, 
                          countries = c("Indonesia", "Japan", "Russia"), 
                          minimum_date = "2000-01-01",
                          maximum_date = "2018-12-31")
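The filter described above amounts to a country match combined with an inclusive date comparison. The base R sketch below illustrates this on made-up rows; filter_events_sketch is a hypothetical name and is not the package's implementation.

```r
# Hypothetical equivalent of the filter described above.
filter_events_sketch <- function(df, countries, minimum_date, maximum_date) {
  keep <- df$COUNTRY %in% countries &
    df$DATE >= as.Date(minimum_date) &   # inclusive lower bound
    df$DATE <= as.Date(maximum_date)     # inclusive upper bound
  df[which(keep), , drop = FALSE]        # which() discards NA comparisons
}

# Made-up events, including rows exactly on both boundary dates.
toy <- data.frame(
  DATE = as.Date(c("1999-12-31", "2000-01-01", "2009-07-04",
                   "2018-12-31", "2012-05-05")),
  COUNTRY = c("Japan", "Japan", "Russia", "Indonesia", "Chile"),
  stringsAsFactors = FALSE
)

quakes_toy <- filter_events_sketch(toy, c("Indonesia", "Japan", "Russia"),
                                   "2000-01-01", "2018-12-31")
quakes_toy
```

Both boundary dates are kept. The 1999 event falls outside the range and is dropped, and Chile is dropped because it is not in the country list.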

Generating Earthquake Timeline Plots

To plot events for all selected countries on a single timeline, use the geom_timeline function with ggplot. Map the DATE variable to the x aesthetic, and pass the start and end dates of the timeline to geom_timeline through its xmin and xmax arguments. You can also map TOTAL_DEATHS to the color aesthetic and EQ_PRIMARY to the size aesthetic, so that each point's color reflects the number of deaths and its size reflects the magnitude of the event.

ggplot(data = quakes1, aes(x = DATE, color = TOTAL_DEATHS, size = EQ_PRIMARY)) +
      geom_timeline(xmin = "2000-01-01", xmax = "2018-12-31") +
      labs(title = "NOAA Significant Earthquakes",
           subtitle = "Plot of events for Indonesia, Japan and Russia combined.",
           x = "Timeline",
           y = "",
           color = "Total Deaths",
           size = "Magnitude")

By passing the COUNTRY variable to the y aesthetic, you can plot separate timelines for each country within the plot.

ggplot(data = quakes1, aes(x = DATE, y = COUNTRY, color = TOTAL_DEATHS, size = EQ_PRIMARY)) +
      geom_timeline(xmin = "2000-01-01", xmax = "2018-12-31") +
      labs(title = "NOAA Significant Earthquakes",
           subtitle = "Plot of events for Indonesia, Japan and Russia on individual timelines.",
           x = "Timeline",
           y = "",
           color = "Total Deaths",
           size = "Magnitude")
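To show roughly what geom_timeline draws, the sketch below builds a similar per-country timeline from standard ggplot2 layers: a grey baseline for each country spanning the window, plus points colored by deaths and sized by magnitude. This is an approximation built under those assumptions and does not reproduce how geom_timeline is implemented. The data are made up, and t_start and t_end stand in for the xmin and xmax arguments.

```r
library(ggplot2)

# Made-up events.
toy <- data.frame(
  DATE = as.Date(c("2001-03-01", "2006-07-15", "2012-11-30", "2016-02-02")),
  COUNTRY = c("Japan", "Japan", "Russia", "Indonesia"),
  TOTAL_DEATHS = c(5, 40, 2, 18),
  EQ_PRIMARY = c(6.1, 7.0, 5.8, 6.6),
  stringsAsFactors = FALSE
)
t_start <- as.Date("2000-01-01")  # plays the role of xmin
t_end <- as.Date("2018-12-31")    # plays the role of xmax

# Keep only events inside the timeline window.
in_window <- toy[toy$DATE >= t_start & toy$DATE <= t_end, ]

# One baseline per country, spanning the whole window.
baselines <- data.frame(COUNTRY = unique(in_window$COUNTRY), stringsAsFactors = FALSE)

p <- ggplot(in_window, aes(x = DATE, y = COUNTRY)) +
  geom_segment(data = baselines,
               aes(x = t_start, xend = t_end, y = COUNTRY, yend = COUNTRY),
               inherit.aes = FALSE, colour = "grey60") +
  geom_point(aes(colour = TOTAL_DEATHS, size = EQ_PRIMARY), alpha = 0.6) +
  labs(x = "Timeline", y = "", colour = "Total Deaths", size = "Magnitude")
p
```

A dedicated geom like geom_timeline packages these layers behind a single call, which explains why the examples above need only a single layer plus labs().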

Use geom_timeline_label to label a selected number of top-ranked events. Pass the number of events to label through the n_max argument, map the label aesthetic to the variable that supplies the label text, and map the magnitude aesthetic to the variable used to rank the events. ggplot then draws a vertical line and a label for each of the top n_max events.

ggplot(data = quakes1, aes(x = DATE, color = TOTAL_DEATHS, size = EQ_PRIMARY)) +
      geom_timeline(xmin = "2000-01-01", xmax = "2018-12-31") +
      geom_timeline_label(n_max = 5, 
                          aes(label = LOCATION_NAME, magnitude = EQ_PRIMARY)) + 
      labs(title = "NOAA Significant Earthquakes",
           subtitle = "Plot of events for Indonesia, Japan and Russia combined.",
           x = "Timeline",
           y = "",
           color = "Total Deaths",
           size = "Magnitude")
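The ranking step can be illustrated in base R: sort by the chosen magnitude variable in descending order and keep the first n_max rows. top_events_sketch is a hypothetical helper, and the sketch assumes that larger values rank higher, which the vignette implies but does not state.

```r
# Hypothetical ranking helper; 'magnitude' names the column used for ranking.
top_events_sketch <- function(df, n_max, magnitude = "EQ_PRIMARY") {
  ranked <- df[order(df[[magnitude]], decreasing = TRUE), , drop = FALSE]
  head(ranked, n_max)  # returns all rows if n_max exceeds nrow(df)
}

# Made-up events.
toy <- data.frame(
  LOCATION_NAME = c("Site A", "Site B", "Site C", "Site D", "Site E"),
  EQ_PRIMARY = c(6.2, 7.9, 5.5, 8.3, 7.1),
  TOTAL_DEATHS = c(10, 3, 250, 1, 40),
  stringsAsFactors = FALSE
)

by_magnitude <- top_events_sketch(toy, n_max = 3)$LOCATION_NAME
by_deaths <- top_events_sketch(toy, n_max = 3, magnitude = "TOTAL_DEATHS")$LOCATION_NAME
by_magnitude
by_deaths
```

Changing the ranking variable changes which events get labeled, even though the label text comes from the same LOCATION_NAME column.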

Labeling works with separate timelines as well. Note that geom_timeline_label still ranks all events in the data frame together in this case, labeling the top n_max events overall rather than the top n_max events for each country.

ggplot(data = quakes1, aes(x = DATE, y = COUNTRY, color = TOTAL_DEATHS, size = EQ_PRIMARY)) +
      geom_timeline(xmin = "2000-01-01", xmax = "2018-12-31") +
      geom_timeline_label(n_max = 5, 
                          aes(label = LOCATION_NAME, magnitude = EQ_PRIMARY)) + 
      labs(title = "NOAA Significant Earthquakes",
           subtitle = "Plot of events for Indonesia, Japan and Russia on individual timelines.",
           x = "Timeline",
           y = "",
           color = "Total Deaths",
           size = "Magnitude")
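The difference between ranking across all events and ranking within each country is clearer on a small made-up example. The global ranking follows the behavior described above. The per-country version is a hypothetical alternative, included only for contrast, and does not describe what geom_timeline_label does.

```r
# Made-up events; Indonesia holds the three largest magnitudes.
toy <- data.frame(
  COUNTRY = c("Indonesia", "Indonesia", "Indonesia", "Japan", "Russia"),
  LOCATION_NAME = c("Site A", "Site B", "Site C", "Site D", "Site E"),
  EQ_PRIMARY = c(7.8, 7.5, 7.2, 6.9, 6.0),
  stringsAsFactors = FALSE
)
n_max <- 2

# Keep the n largest-magnitude rows of a data frame.
rank_top <- function(d, n) head(d[order(d$EQ_PRIMARY, decreasing = TRUE), , drop = FALSE], n)

# Global ranking: the top n_max events over the whole data frame.
global_top <- rank_top(toy, n_max)

# Hypothetical per-country ranking: the top n_max events within each country.
per_country_top <- do.call(rbind, lapply(split(toy, toy$COUNTRY), rank_top, n = n_max))

global_top$LOCATION_NAME
per_country_top$LOCATION_NAME
```

With global ranking, both labels go to Indonesia and the Japan and Russia timelines receive none.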

Generating Interactive Leaflet Maps

To generate an interactive map of historical earthquakes for selected countries, use eq_filter_data together with the eq_create_label and eq_map functions. eq_map plots events as circles on a leaflet map, with each circle's radius proportional to the earthquake's magnitude. eq_create_label optionally generates a label for each plotted event; when you click an event on the map, a popup displays information about that event.

eq_filter_data(earthquakes, 
               countries = c("Mexico"), 
               minimum_date = "1980-01-01",
               maximum_date = "2018-12-31") %>%
      mutate(POPUP_TEXT = eq_create_label(.)) %>%
      eq_map(annot_col = "POPUP_TEXT")
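The exact fields and HTML produced by eq_create_label are not shown in this vignette. The sketch below is a hypothetical popup builder that assumes an HTML string per event, with a line for location, magnitude and total deaths, and that skips any field whose value is missing. make_popup_sketch is not a package function.

```r
# Hypothetical popup builder; field choice and HTML layout are assumptions.
make_popup_sketch <- function(df) {
  # One "<b>Label:</b> value<br/>" line per field, or "" when the value is NA.
  field <- function(label, value) {
    ifelse(is.na(value), "", paste0("<b>", label, ":</b> ", value, "<br/>"))
  }
  paste0(field("Location", df$LOCATION_NAME),
         field("Magnitude", df$EQ_PRIMARY),
         field("Total deaths", df$TOTAL_DEATHS))
}

# Made-up events; the second is missing its location and death toll.
toy <- data.frame(
  LOCATION_NAME = c("Site A", NA),
  EQ_PRIMARY = c(7.4, 6.1),
  TOTAL_DEATHS = c(12, NA),
  stringsAsFactors = FALSE
)

popups <- make_popup_sketch(toy)
popups
```

Dropping missing fields keeps popups from displaying "NA" for events with incomplete records.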


dtminnick/earthquake documentation built on Nov. 4, 2019, 11:04 a.m.