knitr::opts_chunk$set(
  # Collapse output blocks
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 7,
  fig.align = "center",
  warning = FALSE
)
This vignette describes what the rnoaa package is and how to use it.
The rnoaa package bundles a set of functions to get, clean, visualize, and analyze data from NOAA's Significant Earthquake dataset.
The get_earthquake_data function loads a snapshot of the earthquake data downloaded on September 10, 2017. If you want the latest data from NOAA's website instead, use the download_earthquake_data function (requires internet access).
# Extracts the data from a compressed file.
library(rnoaa)
raw_data <- get_earthquake_data()
Once the data has been loaded, you may wish to prepare some key variables for analysis: in particular, the quake's date, the location of the epicenter, the magnitude, and the death toll. The name of the place where the quake occurred may also be of interest. The eq_clean_data and eq_location_clean functions tidy these variables up for analysis.
The eq_clean_data function creates the quake's date from the three time variables in the NOAA dataset: YEAR, MONTH, and DAY. Notably, it can create dates for earthquakes that occurred Before the Common Era (B.C.E.), which R's own date parsing does not handle reliably. In addition, when the month and/or day is missing, the date is approximated at the midpoint of the period.
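To make the midpoint rule concrete, here is a minimal base-R sketch of how a date could be assembled from YEAR, MONTH, and DAY, including negative (B.C.E.) years. It is not the implementation of eq_clean_data: the helper names days_from_civil and approximate_date are hypothetical, the choice of July 1 for a missing month and the 15th for a missing day is an assumed reading of "midpoint", and negative years are treated as astronomical years in the proleptic Gregorian calendar, which may differ from NOAA's convention.

```r
# Hypothetical helper: days since 1970-01-01 in the proleptic Gregorian
# calendar, computed with integer arithmetic so that it also works for
# years before 1 C.E. (no string parsing involved).
days_from_civil <- function(year, month, day) {
  y   <- year - (month <= 2)          # treat Jan/Feb as part of the previous year
  era <- y %/% 400                    # 400-year cycle (floor division)
  yoe <- y %% 400                     # year within the cycle, 0..399
  mp  <- (month + 9) %% 12            # March = 0, ..., February = 11
  doy <- (153 * mp + 2) %/% 5 + day - 1
  doe <- yoe * 365 + yoe %/% 4 - yoe %/% 100 + doy
  era * 146097 + doe - 719468
}

# Hypothetical helper: builds a Date, filling in missing parts.
# Assumption: a missing month maps to July 1 (mid-year) and a missing
# day maps to the 15th (mid-month).
approximate_date <- function(year, month = NA, day = NA) {
  month_missing <- is.na(month)
  day_missing   <- is.na(day)
  month <- ifelse(month_missing, 7, month)
  day   <- ifelse(month_missing, 1, ifelse(day_missing, 15, day))
  structure(days_from_civil(year, month, day), class = "Date")
}

# A complete date, a date without a day, and a B.C.E. year without
# month or day.
approximate_date(c(2010, 1960, -2150), c(2, 5, NA), c(27, NA, NA))
```

Because the result is an ordinary Date vector, it can be sorted and plotted like any other date column.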
The eq_location_clean function, in turn, removes the country from the variable that holds the quake's location (i.e., the LOCATION_NAME variable). Keeping it there is redundant because a variable with the country (i.e., the COUNTRY variable) already exists.
library(dplyr)

# Before the data has been processed:
set.seed(11)
raw_data %>%
  select(YEAR, MONTH, DAY, COUNTRY, LOCATION_NAME) %>%
  sample_n(6)

# Tidies the data up for analysis.
clean_data <- eq_clean_data(raw_data)
clean_data <- eq_location_clean(clean_data)

# After the data has been processed (note that the DATE variable has been
# created and the country has been removed from the LOCATION_NAME variable):
set.seed(11)
clean_data %>%
  select(YEAR, MONTH, DAY, DATE, COUNTRY, LOCATION_NAME) %>%
  sample_n(6)

# N.B.: When the month and/or day is missing, the date is approximated at
# the midpoint of the period.
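The country removal performed on LOCATION_NAME can be pictured with a short base-R sketch. This is only an illustration under the assumption that raw location names look like "COUNTRY: PLACE"; the helper strip_country is hypothetical and is not the code of eq_location_clean.

```r
# Hypothetical helper: drops everything up to and including the first
# colon, then trims surrounding whitespace. Values without a colon are
# returned unchanged.
strip_country <- function(location_name) {
  trimws(sub("^[^:]*:", "", location_name))
}

strip_country(c("CHILE: VALPARAISO", "MEXICO:  OAXACA", "NO PREFIX"))
```

Only the first colon-delimited prefix is removed, so any further colons inside the place name are left as they are.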
One of the features of the rnoaa package is that it has two ggplot2 geoms that help you visualize the timeline on which the earthquakes occurred: geom_timeline and geom_timeline_label.
The geom_timeline geom displays each observation on a timeline to show the dates on which the quakes occurred. Because this geom inherits its attributes from the class of geom_point, you can use the size and color aesthetics to display other earthquake traits, such as magnitude and number of deaths.
In addition, the package provides a new ggplot2 theme (i.e., theme_timeline) that makes these geoms clearer.
library(ggplot2)

# Timeline of earthquakes that occurred in Chile from 2000 to 2016.
clean_data %>%
  filter(COUNTRY == "CHILE", !is.na(EQ_PRIMARY), YEAR %in% 2000:2016) %>%
  ggplot(mapping = aes(x = DATE,
                       size = EQ_PRIMARY,
                       color = TOTAL_DEATHS / 100)) +
  geom_timeline() +
  labs(size = "Richter scale value",
       color = "# deaths in hundreds",
       y = "") +
  theme_timeline()
Note that the y aesthetic is not required unless you want to display two or more timelines; in that case, it lets you visualize the quakes for different countries.
# Quakes in 2017
clean_data %>%
  filter(!is.na(EQ_PRIMARY), YEAR == 2017) %>%
  ggplot(mapping = aes(x = DATE,
                       y = COUNTRY,
                       size = EQ_PRIMARY,
                       color = TOTAL_DEATHS,
                       label = LOCATION_NAME)) +
  geom_timeline() +
  labs(size = "Richter scale value",
       color = "# deaths",
       y = "") +
  theme_timeline()
If you need to identify the most powerful earthquakes, you can use geom_timeline_label along with the n_max parameter. When the size aesthetic is provided, it labels the n_max quakes with the largest values of that aesthetic.
# Quakes in Mexico from 1998 to 2005
clean_data %>%
  filter(COUNTRY == "MEXICO", !is.na(EQ_PRIMARY), YEAR %in% 1998:2005) %>%
  ggplot(mapping = aes(x = DATE,
                       size = EQ_PRIMARY,
                       color = TOTAL_DEATHS,
                       label = LOCATION_NAME)) +
  geom_timeline() +
  # We label the five biggest quakes in magnitude.
  geom_timeline_label(n_max = 5, line_height = 1 / 5, fontsize = 2.7) +
  labs(size = "Richter scale value",
       color = "# deaths",
       y = "") +
  theme_timeline()
N.B.: The size aesthetic is not required. If it is omitted, the geom labels the last observations instead:
# Quakes in Mexico from 1998 to 2005
clean_data %>%
  filter(COUNTRY == "MEXICO", !is.na(EQ_PRIMARY), YEAR %in% 1998:2005) %>%
  ggplot(mapping = aes(x = DATE,
                       color = TOTAL_DEATHS,
                       label = LOCATION_NAME)) +
  geom_timeline() +
  # Were n_max omitted, it would label the last three observations
  # (i.e., the default).
  geom_timeline_label(n_max = 5, line_height = 1 / 5, fontsize = 2.7) +
  labs(color = "# deaths",
       y = "") +
  theme_timeline()
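The labelling rule described above (the largest values when size is mapped, the last observations otherwise) can be pictured with a small base-R sketch. It is only a conceptual illustration, not the code inside geom_timeline_label: the helper pick_labelled and the sample data frame are hypothetical, and the default of three comes from the comment in the example above.

```r
# Hypothetical helper: returns the rows that would receive a label.
# With a size column, the n_max rows with the largest values are kept;
# without one, the last n_max rows are kept.
pick_labelled <- function(data, n_max = 3, size_col = NULL) {
  if (!is.null(size_col)) {
    ordered <- data[order(data[[size_col]], decreasing = TRUE), , drop = FALSE]
    head(ordered, n_max)
  } else {
    tail(data, n_max)
  }
}

# Made-up sample data, for illustration only.
quakes <- data.frame(
  place = c("A", "B", "C", "D"),
  magnitude = c(5.1, 7.8, 6.3, 6.9),
  stringsAsFactors = FALSE
)

pick_labelled(quakes, n_max = 2, size_col = "magnitude")  # rows B and D
pick_labelled(quakes, n_max = 2)                          # rows C and D
```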
The line_height parameter controls how the vertical lines of the text labels are drawn; since it is proportional to the available space, it avoids overlapping with other levels (e.g., countries). In addition, you can use the following parameters to change the appearance of the text labels: angle, line_height, and fontsize.
# Quakes observed from 1982 to 1983.
clean_data %>%
  filter(!is.na(EQ_PRIMARY), YEAR %in% 1982:1983) %>%
  ggplot(mapping = aes(x = DATE,
                       y = COUNTRY,
                       size = EQ_PRIMARY,
                       color = TOTAL_DEATHS,
                       label = LOCATION_NAME)) +
  geom_timeline() +
  # We label the most powerful quake for each country.
  geom_timeline_label(n_max = 1, angle = 0, line_height = 1 / 5,
                      fontsize = 2.7) +
  labs(size = "Richter scale value",
       color = "# deaths",
       y = "") +
  theme_timeline()
Because the package creates the quakes' dates, we can also visualize the earthquakes that occurred B.C.E.
# Quakes observed B.C.E.
clean_data %>%
  filter(!is.na(EQ_PRIMARY), YEAR < 0) %>%
  ggplot(mapping = aes(x = DATE,
                       y = COUNTRY,
                       size = EQ_PRIMARY,
                       color = TOTAL_DEATHS / 1000,
                       label = LOCATION_NAME)) +
  geom_timeline() +
  # We label the most powerful quake for each country.
  geom_timeline_label(n_max = 1, angle = 0, line_height = 1 / 5,
                      fontsize = 2.7) +
  labs(size = "Richter scale value",
       color = "# deaths in thousands",
       y = "") +
  theme_timeline()
rnoaa uses not only ggplot2 geoms but also R leaflet maps. The latter offer greater user interaction since they have a wide range of features (e.g., popup text labels, mini-maps, option buttons, etc.).
The package has the eq_map function, which displays an R leaflet map with the quakes' epicenters and information from a given variable.
# Displays an R leaflet map with the epicenters of the earthquakes that have
# occurred in Indonesia since 2000. The interactive map has popup text labels
# with the date of occurrence.
clean_data %>%
  dplyr::filter(COUNTRY == "INDONESIA" & lubridate::year(DATE) >= 2000) %>%
  eq_map(annot_col = "DATE")
The package also comes with the eq_create_label function, which can be used to provide additional information in the popup text labels displayed on the R leaflet map.
# Displays an R leaflet map with the epicenters of the earthquakes that have
# occurred in Indonesia since 2000. The interactive map has popup text labels
# with the location, magnitude, and total deaths.
clean_data %>%
  dplyr::filter(COUNTRY == "INDONESIA" & lubridate::year(DATE) >= 2000) %>%
  dplyr::mutate(popup_text = eq_create_label(.)) %>%
  eq_map(annot_col = "popup_text")
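For readers curious about what such a popup label might contain, the base-R sketch below assembles one HTML string per quake from the location, magnitude, and total deaths. The helper make_popup_text, the exact HTML layout, the choice to skip missing values, and the sample data are all assumptions made for illustration; the actual output of eq_create_label may differ.

```r
# Hypothetical helper: builds one HTML popup string per row from the
# LOCATION_NAME, EQ_PRIMARY, and TOTAL_DEATHS columns. Fields whose
# value is NA are left out of the label.
make_popup_text <- function(data) {
  field <- function(title, value) {
    ifelse(is.na(value), "", paste0("<b>", title, ":</b> ", value, "<br/>"))
  }
  paste0(
    field("Location", data$LOCATION_NAME),
    field("Magnitude", data$EQ_PRIMARY),
    field("Total deaths", data$TOTAL_DEATHS)
  )
}

# Made-up sample data, for illustration only.
sample_quakes <- data.frame(
  LOCATION_NAME = c("PLACE ONE", "PLACE TWO"),
  EQ_PRIMARY = c(7.5, 6.1),
  TOTAL_DEATHS = c(120, NA),
  stringsAsFactors = FALSE
)

make_popup_text(sample_quakes)
```

A column built this way could then be passed to eq_map through annot_col, in the same way popup_text is used in the example above.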