Cesar-Urteaga/rnoaa: Functions to get, clean, visualize, and analyze NOAA's earthquake database

knitr::opts_chunk$set(collapse = TRUE,
                      comment  = ">",
                      fig.path = "man/figures/README-",
                      fig.align = "center",
                      fig.height = 6,
                      fig.width = 9,
                      message = FALSE, warning = FALSE)

rnoaa

Overview
Installation
Usage
Documentation
Unit Testing
Development Workflow

Overview

rnoaa is an R package with a set of functions that make it easier to analyze the earthquake data provided by the U.S. National Oceanic and Atmospheric Administration (NOAA).

Installation

# Install the package from GitHub without the vignette:
devtools::install_github("Cesar-Urteaga/rnoaa")

# Or you can include it:
devtools::install_github("Cesar-Urteaga/rnoaa", build_vignettes = TRUE)

Usage

This package lets you download the latest earthquake data from NOAA's website and clean it so that it is ready for analysis:

library(rnoaa)
library(dplyr)

# GETTING THE DATA
#   If you do not have internet access, you can use get_earthquake_data()
#   instead, which returns a snapshot of the earthquake data taken on
#   September 10, 2017:
#   raw_data <- get_earthquake_data()
raw_data <- download_earthquake_data()

# TIDYING THE DATA UP
#   Before the data has been processed:
set.seed(48)
raw_data %>%
  select(YEAR, MONTH, DAY, COUNTRY, LOCATION_NAME) %>%
  sample_n(6)
#   We use rnoaa's two cleaning functions to tidy the data.
clean_data <- raw_data %>%
  eq_clean_data() %>%
  eq_location_clean()
#   After the data has been processed (note that the DATE variable has been
#   created and the country has been removed from the LOCATION_NAME variable):
set.seed(48)
clean_data %>%
  select(YEAR, MONTH, DAY, DATE, COUNTRY, LOCATION_NAME) %>%
  sample_n(6)
#   N.B.: When the month and/or the day is missing, the date is approximated
#         by the midpoint of the period.
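
To make the two cleaning steps concrete, here is a minimal base R sketch of the ideas behind them. The helpers approx_date() and strip_country() are hypothetical names written for this illustration; they are not rnoaa functions and may differ from what eq_clean_data() and eq_location_clean() actually do. The sketch assumes four-digit Common Era years and a "COUNTRY: PLACE" layout for the location name.

```r
# Hypothetical helpers for illustration only; not rnoaa's implementation.

# Build a date from possibly incomplete parts. When the month or the day is
# missing, return the midpoint of the known period (whole year or whole
# month). Works on one date at a time; BCE years are not handled.
approx_date <- function(year, month = NA, day = NA) {
  if (is.na(month)) {
    start <- as.Date(sprintf("%04d-01-01", year))
    end   <- as.Date(sprintf("%04d-12-31", year))
  } else if (is.na(day)) {
    start <- as.Date(sprintf("%04d-%02d-01", year, month))
    # Last day of the month = first day of the next month minus one day.
    end   <- seq(start, by = "month", length.out = 2)[2] - 1
  } else {
    return(as.Date(sprintf("%04d-%02d-%02d", year, month, day)))
  }
  start + as.integer(end - start) %/% 2
}

# Drop everything up to and including the first colon, assuming the
# location is written as "COUNTRY: PLACE". Names without a colon are
# returned unchanged.
strip_country <- function(location) {
  trimws(sub("^[^:]*:", "", location))
}

approx_date(2017)                 # "2017-07-02"
approx_date(2017, 9)              # "2017-09-15"
strip_country("MEXICO: OAXACA")   # "OAXACA"
```

Integer division keeps the midpoint on a whole day, so a 30-day month resolves to the 15th.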

Once the data is tidy, you can use the two ggplot2 geoms that rnoaa provides to plot a timeline of the earthquakes and label those with the greatest magnitude:

library(ggplot2)
clean_data %>%
  filter(COUNTRY %in% c("CANADA", "USA", "MEXICO",
                        "CHINA", "JAPAN", "INDIA"),
         !is.na(EQ_PRIMARY),
         YEAR %in% 2000:2016) %>%
  ggplot(mapping = aes(x = DATE,
                       y = COUNTRY,
                       size = EQ_PRIMARY,
                       color = TOTAL_DEATHS / 1000,
                       label = LOCATION_NAME)
         ) +
  geom_timeline() +
  geom_timeline_label(# Label at most the two largest earthquakes
                      # (ranked by the size aesthetic).
                      n_max = 2,
                      line_height = 1 / 4,
                      angle = 10,
                      fontsize = 2.5) +
  labs(size = "Richter scale value",
       color = "# deaths in thousands",
       y = "") +
  guides(size = "none") +
  theme_timeline()
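
The n_max argument limits how many earthquakes get a label. As a rough sketch of that kind of selection, the base R helper below keeps the n_max rows with the largest value within each group. The function top_n_by_group() is a hypothetical name, the data are made-up toy values, and whether geom_timeline_label() ranks within each country or across the whole plot is an assumption here (the sketch ranks within each group).

```r
# Hypothetical helper for illustration only; not part of rnoaa.
# Keep the n_max rows with the largest `value` within each `group`.
top_n_by_group <- function(data, group, value, n_max) {
  pieces <- split(data, data[[group]])
  kept <- lapply(pieces, function(piece) {
    # Sort each group from largest to smallest, then keep the first n_max.
    ranked <- piece[order(piece[[value]], decreasing = TRUE), , drop = FALSE]
    head(ranked, n_max)
  })
  do.call(rbind, kept)
}

# Toy data (made-up values): group A has three quakes, group B has one.
toy <- data.frame(COUNTRY    = c("A", "A", "A", "B"),
                  EQ_PRIMARY = c(5.1, 7.3, 6.0, 4.2))

labelled <- top_n_by_group(toy, "COUNTRY", "EQ_PRIMARY", n_max = 2)
labelled$EQ_PRIMARY   # 7.3 6.0 4.2
```

Groups with fewer than n_max rows keep all of their rows, which matches the "at most" wording above.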

The package also provides functions to display the epicenters on an interactive leaflet map:

clean_data %>%
  dplyr::filter(COUNTRY == "JAPAN" & lubridate::year(DATE) >= 2000) %>%
  eq_map(annot_col = "DATE")

It can also show each earthquake's attributes in popup text labels:

clean_data %>%
  dplyr::filter(COUNTRY == "MEXICO" & lubridate::year(DATE) == 2017) %>%
  dplyr::mutate(popup_text = eq_create_label(.)) %>%
  eq_map(annot_col = "popup_text")
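
Leaflet popups accept HTML strings, so a popup label is a character vector with one HTML snippet per row. The sketch below shows one way such labels could be assembled in base R. The make_popup() function is a hypothetical stand-in, the choice of fields and their layout is an assumption, and the data are made-up toy values; the output of eq_create_label() may look different.

```r
# Hypothetical stand-in for a label builder; not rnoaa's implementation.

# Format one "<b>Label:</b> value<br/>" line per row, skipping missing values.
popup_field <- function(label, value) {
  ifelse(is.na(value), "", paste0("<b>", label, ":</b> ", value, "<br/>"))
}

# Combine several fields into one HTML string per earthquake.
make_popup <- function(data) {
  paste0(popup_field("Location", data$LOCATION_NAME),
         popup_field("Magnitude", data$EQ_PRIMARY),
         popup_field("Total deaths", data$TOTAL_DEATHS))
}

# Toy data (made-up values) using column names from the NOAA data set.
quakes <- data.frame(LOCATION_NAME = c("Place A", "Place B"),
                     EQ_PRIMARY    = c(8.2, 5.7),
                     TOTAL_DEATHS  = c(98, NA))

popups <- make_popup(quakes)
popups[2]   # "<b>Location:</b> Place B<br/><b>Magnitude:</b> 5.7<br/>"
```

Skipping missing fields keeps popups from showing lines such as "Total deaths: NA".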

Documentation

See the package's vignette for a walkthrough of how the package works, or review the examples in each function's help page (use ?function_name); you can run them with the example function (e.g., example("geom_timeline_label")).

Unit Testing

To improve the quality of the package, each function has a test suite written with the testthat R package. I have also added code coverage reporting to the package's repository using codecov.