library(noaa)
knitr::opts_chunk$set(fig.dpi = 96)

Introduction

This document serves as the main overview of the noaa package, and will try to explain the hows and whys of the package content along with clear visual examples.

NOAA data

The U.S. National Oceanographic and Atmospheric Administration (NOAA) publish a database on significant earthquakes around the world. The dataset contains information about 5,933 earthquakes over an approximately 4,000 year time span.

The datasset is avialable at NOAA Link . An actual state of the dataset can be obtained at the "Download" link.

Package snapshot

In order to support easy access to the dataset the package contains a snapshot of the avialable data. To use the data it should be loaded:

data(earthquakes)

Use the latest data

It is useful to use the latest downloaded data from the NOAA site. eq_read_data() is the function to use:

eq <- eq_read_data(filename="../data-raw/signif.txt.bz2" )

Field description

The detailed description of the dataset is supported by NOAA at the "Event Variable Definitions" link.

The documentation also included in the package: man/earthquakes.Rd, use ?earthquakes to see it.

Data cleaning

The raw NOAA data is not ideal for R processing. There are two helper functions to clean the data.

Datetime field insertion

NOAA dataset contains separate fields for the date - time: 'YEAR','MONTH','DAY','HOUR','MINUTE','SECOND'. eq_clean_data() create a new R datetime field date and deletes the separate fields.

To clean the built in earthquakes data you should run :

data(earthquakes)
eq <- eq_clean_data(earthquakes)
head(eq$date)

Location clean

NOAA dataset has a "LOCATION" filed containing all caps text wth country information added. To display this field a simple transformation is done by eq_location_clean. The function strips out the "COUNTRY" part and formats the string as "Title Case".

data(earthquakes)
# NOAA version
head(earthquakes$LOCATION_NAME)

eq <- eq_location_clean(earthquakes)
# Cleaned version
head(eq$LOCATION_NAME)

Displaying a timeline

The package contains a ggplot2 geom to visualize the times at which earthquakes occur within certain countries.

The geom_timeline() main use to display a timeline of events:

The function geom_timeline_label() adds text directly to the timeline plot. The param n_max defines the number of the max size events to display.

library(magrittr)
library(ggplot2)
library(lubridate)
library(dplyr)
data(earthquakes)
eqdta <- eq_location_clean(eq_clean_data(earthquakes)) %>%
filter(date > lubridate::ymd("20000101") &
(COUNTRY == "USA" | COUNTRY == 'CHINA')) %>%
group_by(COUNTRY)

# Draw the geom
ggplot (data = eqdta,
aes(
x = date,
y = COUNTRY,
size = FOCAL_DEPTH,
label = LOCATION_NAME,
colour = EQ_PRIMARY
)) +
geom_timeline() +
geom_timeline_label(n_max = 6)

Displaying an interactive map

The eq_map() function takes an argument data containing the filtered data frame with earthquakes to visualize. The function maps the epicenters (LATITUDE/LONGITUDE) and annotates each point with in pop up window containing annotation data stored in a column of the data frame.

The eq_create_label() adds an informative pop up text to a dataframe.

The usage of both function is:

library(magrittr)
data(earthquakes)
 earthquakes %>%
   eq_clean_data() %>%
   dplyr::filter(COUNTRY == "MEXICO" & lubridate::year(date) >= 2000) %>%
   dplyr::mutate(popup_text = eq_create_label(.)) %>%
   eq_map(annot_col = "popup_text")


cogitoergoread/noaa documentation built on May 20, 2019, 1:28 p.m.