knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning= FALSE,
  message=FALSE
)

Introduction

library(NOAAsignifEarthQuakes)
# additionally for piping
library(magrittr)

This package is the result of a project centered around a dataset obtained from the U.S. National Oceanic and Atmospheric Administration (NOAA). The dataset focuses on significant earthquakes around the world and contains information about 5,933 earthquakes over a time span of approximately 4,000 years.

The objectives of this package are to:

  1. Efficiently process the data
  2. Provide easy-to-use functions for dedicated exploratory analysis

Data source

National Geophysical Data Center / World Data Service (NGDC/WDS): Significant Earthquake Database. National Geophysical Data Center, NOAA. (doi:10.7289/V5TD9V7K)

The data is incorporated into this package in the file r system.file("extdata","signif.txt",package="NOAAsignifEarthQuakes")

Loading the package


desc_func <- dplyr::tbl_df(
  data.frame(
    Function = ls(getNamespace("NOAAsignifEarthQuakes"))
  )
) %>% 
  dplyr::mutate(
    Processing = Function %in% c("load_NOAA_db","eq_build_date","eq_build_location","eq_clean_data"),
    Timeline= Function %in% c("eq_legend_timeline","geom_timeline","geom_timeline_label","timeline_data"),
    Map=Function %in% c("eq_create_label","eq_map")
  ) %>% 
  # desc() is namespaced because dplyr itself is not attached in this vignette
  dplyr::arrange(dplyr::desc(Processing),dplyr::desc(Timeline),dplyr::desc(Map)) %>% 
  dplyr::mutate(
    Processing = ifelse(Processing,"&#10004;",""),
    Timeline= ifelse(Timeline,"&#10004;",""),
    Map=ifelse(Map,"&#10004;","")
  ) 
knitr::kable(
  desc_func,
  align=c("l","c","c","c"),
  col.names=c("Function","Data Processing","Timeline visualization","Map visualization")
)

Reading and cleaning the data

We use 2 main functions to read and clean the data:

  1. load_NOAA_db: reads the raw NOAA file
  2. eq_clean_data: processes the resulting data frame using support functions:
    • eq_build_date: creates the date feature from the input year, month and day
    • eq_build_location: processes the location feature into a more human-readable form

Reading the data: load_NOAA_db

file_noaa <- system.file("extdata","signif.txt",package="NOAAsignifEarthQuakes",mustWork=TRUE)
noaa_raw <- load_NOAA_db(file_noaa)
noaa_name <- colnames(noaa_raw)

We obtain a table with r ncol(noaa_raw) features and r nrow(noaa_raw) observations. The schema of the raw data is described in detail in the appendix.
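
As an illustration of the reading step, here is a minimal base R sketch of loading a delimited text file with an explicit column type. It is not the code of load_NOAA_db: it assumes the raw file is tab-delimited, and the helper read_tab_sketch, the column names and the two toy rows are all made up for the example.

```r
# Sketch only: a two-row stand-in for the raw file (tab-delimited assumed).
# Column names and values are illustrative, not taken from the real data.
raw_text <- paste(
  "I_D\tYEAR\tMONTH\tDAY\tCOUNTRY\tEQ_PRIMARY",
  "1\t-2150\t\t\tJORDAN\t7.3",
  "2\t2010\t1\t12\tHAITI\t7.0",
  sep = "\n"
)

# Hypothetical helper: read tab-delimited text, keeping the id as character
# and turning empty fields into NA.
read_tab_sketch <- function(txt) {
  utils::read.delim(
    text = txt,
    na.strings = "",
    stringsAsFactors = FALSE,
    colClasses = c(I_D = "character")
  )
}

toy_raw <- read_tab_sketch(raw_text)
str(toy_raw)
```

Forcing the id column to character avoids it being read as a number, while the remaining columns are left to the default type detection.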

Processing the data

noaa_clean <- eq_clean_data(noaa_raw)
clean_cols <- names(noaa_clean)

Processing dates: eq_build_date

We simply take the YEAR, MONTH and DAY features to create the date feature. We demonstrate this on the first 5 and last 5 rows of the original data, building a valid date feature named clean_date.

knitr::kable(
head(noaa_raw,5) %>% 
  dplyr::select("DAY","MONTH","YEAR") %>%
  dplyr::mutate(clean_date=eq_build_date(.))
)
knitr::kable(
tail(noaa_raw,5) %>% 
  dplyr::select("DAY","MONTH","YEAR") %>%
  dplyr::mutate(clean_date=eq_build_date(.))
)
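
To make the idea concrete, the following base R sketch builds a Date from separate year, month and day columns. It is not the implementation of eq_build_date: the helper build_date_sketch is hypothetical, it assumes that a missing month or day defaults to 1, and it simply returns NA for years before 1, which the real function may handle differently.

```r
# Hypothetical helper: combine YEAR, MONTH and DAY into a Date.
# Assumptions: missing MONTH/DAY default to 1; years < 1 are not handled
# by this sketch and yield NA.
build_date_sketch <- function(df) {
  month <- ifelse(is.na(df$MONTH), 1L, df$MONTH)
  day   <- ifelse(is.na(df$DAY), 1L, df$DAY)
  iso <- sprintf("%04d-%02d-%02d",
                 as.integer(df$YEAR), as.integer(month), as.integer(day))
  iso[is.na(df$YEAR) | df$YEAR < 1] <- NA_character_
  as.Date(iso, format = "%Y-%m-%d")
}

# Toy input, not taken from the NOAA data.
toy_dates <- data.frame(
  YEAR  = c(2010L, 1985L, -2150L),
  MONTH = c(1L, NA, NA),
  DAY   = c(12L, NA, NA)
)
clean_date <- build_date_sketch(toy_dates)
```

Defaulting to the first day or month keeps partially dated events on the timeline instead of dropping them.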

Processing locations: eq_build_location

This function consists of:

  1. Removing the country
  2. Removing extra location information within parentheses
  3. Correcting white spaces
  4. Switching to title case

knitr::kable(
head(noaa_raw,10) %>% 
  dplyr::select("LOCATION_NAME") %>%
  dplyr::mutate(clean_location=eq_build_location(.))
)
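
The cleaning steps for locations can be sketched with base R regular expressions. This is only an illustration under assumptions, not the code of eq_build_location: the helper clean_location_sketch is hypothetical, and it assumes the country appears as a prefix ending with a colon.

```r
# Hypothetical helper mirroring the described steps on a character vector.
clean_location_sketch <- function(x) {
  x <- sub("^[^:]*:\\s*", "", x)        # drop the leading "COUNTRY:" prefix
  x <- gsub("\\s*\\([^)]*\\)", "", x)   # drop text within parentheses
  x <- gsub("\\s+", " ", trimws(x))     # trim and squeeze white spaces
  x <- tolower(x)
  # upper-case the first letter of each word (title case)
  gsub("\\b([a-z])", "\\U\\1", x, perl = TRUE)
}

# Toy input, not taken from the NOAA data.
clean_location_sketch("MEXICO:  SAN ANDRES TUXTLA (NEAR COAST)")
```

Only the first colon-terminated prefix is removed here, so any further colon-separated parts of a location would be kept.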

Cleaning the data: eq_clean_data

This function mainly consists of processing the existing features with the helper functions above and selecting features, reducing the raw data to r ncol(noaa_clean) features but still r nrow(noaa_clean) observations:

  1. r clean_cols[1] : Date of the earthquake event
  2. r clean_cols[2] : Country where the earthquake occurred
  3. r clean_cols[3] : Location of the earthquake
  4. r clean_cols[4] : Longitude coordinate of the earthquake epicenter
  5. r clean_cols[5] : Latitude coordinate of the earthquake epicenter
  6. r clean_cols[6] : Total number of fatalities caused by the earthquake
  7. r clean_cols[7] : Equivalent Richter scale magnitude

The last 10 rows of the resulting table are:

knitr::kable(tail(noaa_clean,10))

Building a Timeline

Filtering the data with timeline_data

This function prepares the data prior to building the timeline. It:

  1. Drops features unnecessary to the timeline (e.g. LATITUDE and LONGITUDE)
  2. Filters within a date range specified with the keywords dmin and dmax
  3. Filters one or several countries with the keyword countries
  4. Additionally creates a feature MAG_RANK giving the rank order of earthquakes by country in decreasing order of magnitude

filt_noaa <- noaa_clean %>%
  timeline_data(dmin='2010-01-01',dmax='2011-01-01',countries=c("USA","China"))
knitr::kable(filt_noaa)
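
The filtering and ranking steps can be sketched in base R as follows. This is not the implementation of timeline_data: the helper timeline_filter_sketch and the toy data are hypothetical, and the case-insensitive country match is an assumption made for the example.

```r
# Hypothetical helper: keep rows within [dmin, dmax] for the given countries,
# drop the coordinates, and rank earthquakes by decreasing magnitude
# within each country.
timeline_filter_sketch <- function(df, dmin, dmax, countries) {
  keep <- which(
    df$DATE >= as.Date(dmin) &
      df$DATE <= as.Date(dmax) &
      toupper(df$COUNTRY) %in% toupper(countries)  # assumed case-insensitive
  )
  out <- df[keep, setdiff(names(df), c("LATITUDE", "LONGITUDE")), drop = FALSE]
  out$MAG_RANK <- ave(-out$MAG, out$COUNTRY,
                      FUN = function(m) rank(m, ties.method = "first"))
  out
}

# Toy data, not taken from the NOAA file.
toy_clean <- data.frame(
  DATE = as.Date(c("2010-01-10", "2010-04-04", "2010-04-13",
                   "2012-01-01", "2010-06-01")),
  COUNTRY = c("USA", "USA", "CHINA", "CHINA", "MEXICO"),
  LATITUDE = c(40, 32, 33, 30, 16),
  LONGITUDE = c(-124, -115, 96, 100, -98),
  MAG = c(6.5, 7.2, 6.9, 5.0, 6.0),
  stringsAsFactors = FALSE
)
toy_filt <- timeline_filter_sketch(toy_clean, "2010-01-01", "2011-01-01",
                                   c("USA", "China"))
```

Ranking on the negated magnitude gives rank 1 to the strongest earthquake of each country.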

Showing timelines geom_timeline

For 1 country without aesthetics

g_usa <- geom_timeline(noaa_clean,
                       countries='USA',
                       xmin='2000-01-01',
                       xmax='2017-01-01')

For multiple countries with aesthetics

This geom takes optional aesthetics:

g_usachina <- geom_timeline(noaa_clean,
              mapping = ggplot2::aes(
                y=COUNTRY,
                fill=DEATHS,
                size= MAG
              ),
              countries=c('USA',"China"),
              xmin='2000-01-01',
              xmax='2017-01-01')
g_usachina

Sidenote: Legend labeling with eq_legend_timeline

To present proper legend labels, we built the function eq_legend_timeline.

# define fake aesthetics
feat <- ggplot2::aes(aes_1=DATE,aes_2=MAG,aes_3=DEATHS,aes_4=COUNTRY,aes_5=LOCATION_NAME)
# test eq_legend_timeline in a table
legend_aes <- dplyr::tbl_df(data.frame(
  aesthetic=c('aes_1','aes_2','aes_3','aes_4','aes_5')
)) %>% dplyr::rowwise() %>%
  dplyr::mutate(
    feature= rlang::quo_name(feat[[aesthetic]]),
    label= eq_legend_timeline(feat[[aesthetic]]))
knitr::kable(legend_aes)

Adding to timelines locations of most significant earthquakes geom_timeline_label

Adding labels to the previous timeline

The function takes n_max as a keyword to set the maximum number of locations to display. By default this value is set to 5.

We go back to the first timeline and display the locations of the 10 most significant earthquakes.

g_usa + geom_timeline_label(n_max=10)

We go back to the second timeline and display the locations of the 5 most significant earthquakes.

g_usachina + geom_timeline_label()
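
The selection behind n_max can be illustrated with a small base R sketch that keeps the n largest earthquakes per country. It is not the code of geom_timeline_label: the helper top_n_by_country is hypothetical, and "most significant" is assumed here to mean the largest magnitude.

```r
# Hypothetical helper: keep at most n_max rows per country,
# chosen by decreasing magnitude (assumed meaning of "most significant").
top_n_by_country <- function(df, n_max = 5) {
  df <- df[order(df$COUNTRY, -df$MAG), , drop = FALSE]
  rank_in_group <- ave(seq_len(nrow(df)), df$COUNTRY, FUN = seq_along)
  df[rank_in_group <= n_max, , drop = FALSE]
}

# Toy data, not taken from the NOAA file.
toy_quakes <- data.frame(
  COUNTRY = c("USA", "USA", "USA", "CHINA", "CHINA"),
  LOCATION_NAME = c("A", "B", "C", "D", "E"),
  MAG = c(6.0, 7.1, 6.6, 7.9, 6.1),
  stringsAsFactors = FALSE
)
toy_top <- top_n_by_country(toy_quakes, n_max = 2)
```

With n_max = 2, only the two strongest events of each country would receive a label in this sketch.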

Building a Map

Setting a use case

To create a map we have to filter the processed data for a country and specific date range so as to avoid overflowing the visualisation.

filt_noaa <- noaa_clean %>% dplyr::filter(COUNTRY == "MEXICO" & lubridate::year(DATE) >= 2000)

Map with Date popup eq_map

Each point corresponds to an earthquake, with the size of the circle representing the magnitude.

eq_map(filt_noaa,annot_col = "DATE")

Map with annotated popup eq_map with eq_create_label

Annotation is performed with eq_create_label. It simply combines the LOCATION_NAME, DEATHS and MAG features into a single HTML-encoded character string, as demonstrated on the first 10 rows of our test case.

annot_noaa <- filt_noaa %>% head(10) %>%
  dplyr::mutate(annot_text=eq_create_label(.)) %>%
  dplyr::select("DATE","LOCATION_NAME","DEATHS","MAG","annot_text")
knitr::kable(annot_noaa)
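
The way such a label can be assembled is sketched below in base R. This is not the implementation of eq_create_label: the helper create_label_sketch, the exact HTML layout and the choice to skip missing values are all assumptions made for illustration.

```r
# Hypothetical helper: build one HTML string per row from LOCATION_NAME,
# MAG and DEATHS, omitting any field that is NA (assumed behaviour).
create_label_sketch <- function(df) {
  part <- function(title, value) {
    ifelse(is.na(value), "",
           paste0("<b>", title, ":</b> ", value, "<br/>"))
  }
  paste0(
    part("Location", df$LOCATION_NAME),
    part("Magnitude", df$MAG),
    part("Total deaths", df$DEATHS)
  )
}

# Toy data, not taken from the NOAA file.
toy_popup <- data.frame(
  LOCATION_NAME = c("Oaxaca", "Guerrero"),
  MAG = c(7.4, 6.2),
  DEATHS = c(NA, 3),
  stringsAsFactors = FALSE
)
toy_labels <- create_label_sketch(toy_popup)
```

Skipping missing fields keeps a popup from showing a literal NA when, for example, no fatalities were recorded.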

It is called by eq_map when the option annot_col is set to "popup_text", giving the following result.

eq_map(filt_noaa,annot_col = "popup_text")

Summary

The NOAAsignifEarthQuakes package performs 3 things in a fairly straightforward way: data processing, timeline visualization and map visualization.

\appendix

Appendix: Raw data codebook

We give a short description of the features present in the raw data. We indicate the column type as we set it while reading the data.

  1. Earthquake Id and type
    • r noaa_name[1] : [Character] unique id for the earthquake
    • r noaa_name[2] : [Character] Categorical set to Tsu if the earthquake generated a Tsunami (set to NA otherwise)
  2. Date Time variable
    • r noaa_name[3] : [Integer] 4 digit year corresponding to the event, range -2150 to 2018
    • r noaa_name[4] : [Integer] 2 digit month corresponding to the event, could be NA
    • r noaa_name[5] : [Integer] 2 digit day of the month corresponding to the event
    • r noaa_name[6] : [Integer] 2 digit hour corresponding to the event
    • r noaa_name[7] : [Integer] 2 digit minute corresponding to the event
    • r noaa_name[8] : [Numeric] 2 digit seconds corresponding to the event
  3. Earthquake properties
  4. Location
    • r noaa_name[18] : [Character] Country name where the event occurred
    • r noaa_name[19] : [Character] State where the event occurred
    • r noaa_name[20] : [Character] Location name, e.g. city or geographical landmark
    • r noaa_name[21] : [Double] Latitude coordinates of the event
    • r noaa_name[22] : [Double] Longitude coordinates of the event
    • r noaa_name[23] : [Character] Event location, contains country name and locality
  5. Death and injuries
    • r noaa_name[24] : [Integer] Number of deaths
    • r noaa_name[25] : [Character] Factorial code 1 to 4
    • r noaa_name[26] : [Integer] Number of Missing persons
    • r noaa_name[27] : [Character] Factorial code 1 to 4
    • r noaa_name[28] : [Integer] Number of injured persons
    • r noaa_name[29] : [Character] Factorial code 1 to 4
  6. Property Damage
    • r noaa_name[30] : [Numeric] Damage amount in M$
    • r noaa_name[31] : [Character] Factorial code 1 to 4
    • r noaa_name[32] : [Integer] Number of Houses destroyed
    • r noaa_name[33] : [Character] Factorial code 1 to 4
    • r noaa_name[34] : [Integer] Number of Houses damaged
    • r noaa_name[35] : [Character] Factorial code 1 to 4
  7. Total Death and injuries including related events (Tsunami, Volcano etc.)
    • r noaa_name[36] : [Integer] Number of deaths
    • r noaa_name[37] : [Character] Factorial code 1 to 4
    • r noaa_name[38] : [Integer] Number of Missing persons
    • r noaa_name[39] : [Character] Factorial code 1 to 4
    • r noaa_name[40] : [Integer] Number of injured
    • r noaa_name[41] : [Character] Factorial code 1 to 4
  8. Total Property Damage including related events (Tsunami, Volcano etc.)
    • r noaa_name[42] : [Numeric] Damage amount in M$
    • r noaa_name[43] : [Character] Factorial code 1 to 4
    • r noaa_name[44] : [Integer] Number of Houses destroyed
    • r noaa_name[45] : [Character] Factorial code 1 to 4
    • r noaa_name[46] : [Integer] Number of Houses damaged
    • r noaa_name[47] : [Character] Factorial code 1 to 4


BreizhZut/NOAAsignifEarthQuakes documentation built on Nov. 10, 2019, 3:45 p.m.