knitr::opts_chunk$set(# Collapse output blocks
                      collapse = TRUE,
                      comment = "#>",
                      fig.width = 7,
                      fig.height = 7,
                      fig.align = "center",
                      warning = FALSE)

This vignette describes what is the rnoaa package and how it should be used.

Description

The rnoaa package bundles a set of functions to get, clean, visualize, and analyze the data from the NOAA's Significant Earthquake dataset.

Usage

Tidying the data

Obtaining the data

The function get_earthquake_data allows you to load a snapshot of the earthquakes' data downloaded on September 10, 2017; on the proviso that you may want to download the latest data from the NOAA's Website, you can use the download_earthquake_data function (requires internet access).

# Extracts the data from a compressed file.
library(rnoaa)
raw_data <- get_earthquake_data()

Data processing

Once the data was downloaded, you may wish to prepare some key variables for data analysis; in particular, you could be interested in the quakes' date, location of the epicenter, magnitude, and the human cost of lives. Moreover, the name of the place in which the quake has taken place could be of your concern. So, eq_clean_data and eq_create_label functions allows you to mop these variables up for analysis.

The eq_clean_data function creates the quakes' dates based on the three time variables within the NOAA's dataset: YEAR, MONTH, and DAY. Specifically, it can create the dates for earthquakes that had occurred Before the Common Era (B.C.E.); this enhances the R's functionality because it does not handle these dates. In addition, when the month or/and day is/are missing, the date is approximated at the midpoint of the period.

On the other hand, the eq_location_clean function removes the country from the variable that has the quake's location (i.e., LOCATION_NAME variable) due to it does not make sense to keep it since a variable with the country (i.e., COUNTRY variable) already exists.

library(dplyr)

# Before the data has been processed:
set.seed(11)
raw_data %>% 
  select(YEAR, MONTH, DAY, COUNTRY, LOCATION_NAME) %>% 
  sample_n(6)

# Tidies the data up for analysis.
clean_data <- eq_clean_data(raw_data)
clean_data <- eq_location_clean(clean_data)

# After the data has been processed (note that the DATE variable has been 
# created and the country has been removed for the LOCATION_NAME variable):
set.seed(11)
clean_data %>% 
  select(YEAR, MONTH, DAY, DATE, COUNTRY, LOCATION_NAME) %>% 
  sample_n(6)
# N.B.: When the month or/and day is/are missing, the date is approximated at 
#       the midpoint of the period.

Analyzing the data

Visualizing the data with ggplot2

One of the features of the rnoaa package is that has two ggplot2's geoms which assist you to visualize the timeline in which the earthquakes have ocurred: geom_timeline and geom_timeline_label.

The geom_timeline displays each observation in a timeline to visualize the dates in which the quakes have played out. Due to this geom inherits its attributes from the class of geom_point, you can use the aesthetics of size and color to display some other earthquake's traits such as magnitude and impact of human casualties.

In addition, the package provides a new ggplot2's theme (i.e., theme_timeline) so as to make clearer these geoms.

library(ggplot2)

# Timeline of earthquakes occurred as of 2000 in Chile.
clean_data %>%
  filter(COUNTRY == "CHILE",
         !is.na(EQ_PRIMARY),
         YEAR %in% 2000:2016) %>%
  ggplot(mapping = aes(x = DATE,
                       size = EQ_PRIMARY,
                       color = TOTAL_DEATHS / 100)
         ) +
  geom_timeline() +
  labs(size = "Richter scale value",
       color = "# deaths in hundreds",
       y = "") +
  theme_timeline()

Note that the y aesthetic is not required, unless you may want to display two or more timelines, in which case it permits you to visualize the quakes for different countries.

# Quakes in 2017
clean_data %>%
  filter(!is.na(EQ_PRIMARY),
         YEAR == 2017) %>%
  ggplot(mapping = aes(x = DATE,
                       y = COUNTRY,
                       size = EQ_PRIMARY,
                       color = TOTAL_DEATHS,
                       label = LOCATION_NAME)
         ) +
  geom_timeline() +
  labs(size = "Richter scale value",
       color = "# deaths",
       y = "") +
  theme_timeline()

If you need to identify the most powerful earthquakes, you can use the geom_timeline_label along with the n_max parameter. It labels the quakes with greater intensity in case you have provided the size aesthetic.

# Quakes in Mexico from 1998 to 2005
clean_data %>%
  filter(COUNTRY == "MEXICO",
         !is.na(EQ_PRIMARY),
         YEAR %in% 1998:2005) %>%
  ggplot(mapping = aes(x = DATE,
                       size = EQ_PRIMARY,
                       color = TOTAL_DEATHS,
                       label = LOCATION_NAME)
         ) +
  geom_timeline() +
  # We label the five biggest quakes in magnitude.
  geom_timeline_label(n_max = 5,
                      line_height = 1 / 5, 
                      fontsize = 2.7) +
  labs(size  = "Richter scale value",
       color = "# deaths",
       y = "") +
  theme_timeline()

N.B.: The size aesthetic is not required. Had it been omitted, the geom would label the last observations:

# Quakes in Mexico from 1998 to 2005
clean_data %>%
  filter(COUNTRY == "MEXICO",
         !is.na(EQ_PRIMARY),
         YEAR %in% 1998:2005) %>%
  ggplot(mapping = aes(x = DATE,
                       color = TOTAL_DEATHS,
                       label = LOCATION_NAME)
         ) +
  geom_timeline() +
  # Were n_max omitted, it would display the three last observations.
  # (i.e., the default).
  geom_timeline_label(n_max = 5,
                      line_height = 1 / 5, 
                      fontsize = 2.7) +
  labs(color = "# deaths",
       y = "") +
  theme_timeline()

The parameter line_height controls the way how the vertical lines of the text labels are displayed, and since it is proportional to the available space, it avoids the overlapping with other levels (e.g., countries). In addition, you can use the following parameters to change the appearance of the text labels: angle, line_height, and fontsize.

# Quakes observed from 1982 to 1983.
clean_data %>%
  filter(!is.na(EQ_PRIMARY),
         YEAR %in% 1982:1983) %>%
  ggplot(mapping = aes(x = DATE,
                       y = COUNTRY,
                       size = EQ_PRIMARY,
                       color = TOTAL_DEATHS,
                       label = LOCATION_NAME)
         ) +
  geom_timeline() +
  # We label the most powerful quakes for each country.
  geom_timeline_label(n_max = 1,
                      angle = 0,
                      line_height = 1 / 5,
                      fontsize = 2.7) +
  labs(size = "Richter scale value",
       color = "# deaths",
       y = "") +
  theme_timeline()

As a result of using the package to create the quakes' dates, we can visualize the earthquakes that had ocurred B.C.E.

# Quakes observed B.C.E.
clean_data %>%
  filter(!is.na(EQ_PRIMARY),
         YEAR < 0) %>%
  ggplot(mapping = aes(x = DATE,
                       y = COUNTRY,
                       size = EQ_PRIMARY,
                       color = TOTAL_DEATHS / 1000,
                       label = LOCATION_NAME)
         ) +
  geom_timeline() +
  # We label the most powerful quakes for each country.
  geom_timeline_label(n_max = 1,
                      angle = 0,
                      line_height = 1 / 5,
                      fontsize = 2.7) +
  labs(size = "Richter scale value",
       color = "# deaths in thousands",
       y = "") +
  theme_timeline()

Visualizing the data with R's leaflet maps

rnoaa not only uses ggplot2's geoms but also R's leaflet maps. The latter provide a greater user-interaction since they have a manifold of features (e.g., popup text labels, mini-maps, option buttons, etc.).

The package has the eq_map function that displays an R leaflet map with the quake's epicenters and information of a given variable.

# Displays an R's leaflet map with the epicenters of earthquakes thad have 
# occurred in Indonesia as of 2000.  The interactive map has popup text labels 
# with the date of occurrence.
clean_data %>%
  dplyr::filter(COUNTRY == "INDONESIA" & lubridate::year(DATE) >= 2000) %>%
  eq_map(annot_col = "DATE")

Also, the package come along with the eq_create_label function that can be used to provide additional information within the popup text labels displayed on the R leaflet map.

# Displays an R's leaflet map with the epicenters of the earthquakes that have 
# occurred in Indonesia as of 2000.  The interactive map has popup text labels 
# with the location, magnitude, and total deaths.
clean_data %>%
  dplyr::filter(COUNTRY == "INDONESIA" & lubridate::year(DATE) >= 2000) %>%
  dplyr::mutate(popup_text = eq_create_label(.)) %>%
  eq_map(annot_col = "popup_text")


Cesar-Urteaga/rnoaa documentation built on May 10, 2019, 5:16 a.m.