This project, built as part of Coursera's R Capstone course, provides functions to clean and plot raw NOAA earthquake data.
The raw earthquake data downloaded from NOAA is stored in the 'data_raw' folder. It can be loaded into a data frame with the load_data() function:
df_raw <- load_data()
Once the raw data is loaded, eq_clean_data() cleans it:
eq_clean_data <- function(df_raw){
  df <- df_raw %>%
    dplyr::filter(YEAR >= 0) %>%
    dplyr::mutate(date = get_date(DAY, MONTH, YEAR)) %>%
    dplyr::mutate(LATITUDE = as.numeric(LATITUDE), LONGITUDE = as.numeric(LONGITUDE)) %>%
    eq_location_clean()
  return(df)
}
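Notice that this completes a few tasks: it keeps only rows with YEAR >= 0, builds a date column from DAY, MONTH and YEAR with get_date(), converts LATITUDE and LONGITUDE to numeric, and passes the result to eq_location_clean().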
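The get_date() helper called inside eq_clean_data() is not shown here. As a rough illustration only, a date column could be built from separate day, month and year values along these lines. The name get_date_sketch and the choice to treat a missing month or day as 1 are assumptions, not the package's documented behaviour.

```r
# Hypothetical stand-in for get_date(); the package's own version may differ.
get_date_sketch <- function(day, month, year) {
  # Assumption: a missing month or day falls back to 1
  month <- ifelse(is.na(month), 1L, as.integer(month))
  day   <- ifelse(is.na(day), 1L, as.integer(day))
  # Zero-pad the parts to get an ISO-style "YYYY-MM-DD" string
  iso <- sprintf("%04d-%02d-%02d", as.integer(year), month, day)
  # Combinations that do not form a valid date become NA
  as.Date(iso, format = "%Y-%m-%d")
}

get_date_sketch(day = c(11, NA), month = c(3, NA), year = c(2011, 1906))
```

Passing an explicit format to as.Date() means an invalid combination yields NA rather than an error.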
Loading and cleaning can be chained in a single step:
# Load and clean the data
df_earthquakes <- load_data() %>% eq_clean_data()
The package provides two geoms. The first, geom_timeline, charts a timeline of earthquakes for one or more countries. In the example below each point is an earthquake event, point size indicates magnitude and colour indicates the number of deaths. The x aesthetic (the date) is required, whereas y (the country) is optional. As an example:
df <- df_earthquakes %>% filter(COUNTRY %in% c("CHINA", "USA"), YEAR > 2000)
ggplot(df, aes(x = date, y = COUNTRY,
               color = as.numeric(TOTAL_DEATHS),
               size = as.numeric(EQ_PRIMARY),
               label = CLEAN_LOCATION_NAME)) +
  geom_timeline() +
  labs(size = "Richter scale value", color = "# deaths") +
  theme(panel.background = element_blank(),
        legend.position = "bottom",
        axis.title.y = element_blank()) +
  xlab("DATE")
This produces the chart:
The second geom, geom_timeline_label, builds on geom_timeline by adding labelled annotations: a vertical line and a location label are drawn for each of the top n_max (default = 5) earthquakes by magnitude.
df <- df_earthquakes %>% filter(COUNTRY %in% c("CHINA", "USA"), YEAR > 2000)
ggplot(df, aes(x = date, y = COUNTRY,
               color = as.numeric(TOTAL_DEATHS),
               size = as.numeric(EQ_PRIMARY),
               label = CLEAN_LOCATION_NAME)) +
  geom_timeline() +
  labs(size = "Richter scale value", color = "# deaths") +
  theme(panel.background = element_blank(),
        legend.position = "bottom",
        axis.title.y = element_blank()) +
  xlab("DATE") +
  geom_timeline_label(data = df)
Here is the resulting chart:
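To make the "top n_max by magnitude" selection concrete, here is a minimal base R sketch of one way such a subset could be picked. The function name top_n_by_magnitude is hypothetical, and it ranks across all rows; whether geom_timeline_label ranks overall or within each country is not stated here, so treat that as an assumption.

```r
# Hypothetical helper, not part of the package: keep the n_max rows
# with the largest magnitude (EQ_PRIMARY), dropping missing magnitudes.
top_n_by_magnitude <- function(df, n_max = 5) {
  magnitude <- as.numeric(df$EQ_PRIMARY)
  # na.last = NA removes rows whose magnitude is missing
  ranked <- order(magnitude, decreasing = TRUE, na.last = NA)
  df[head(ranked, n_max), , drop = FALSE]
}

toy <- data.frame(
  CLEAN_LOCATION_NAME = c("A", "B", "C", "D"),
  EQ_PRIMARY = c("5.1", "7.9", NA, "6.3"),
  stringsAsFactors = FALSE
)
top_n_by_magnitude(toy, n_max = 2)
```

Only the rows returned by a selection like this would receive a vertical line and a label.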
There are also functions to create and save these plots. The two geom examples above were created with plot_earthquakes_timeline() and plot_earthquakes_timeline_label(). Each takes a data frame (df) and a logical flag (save_png) indicating whether to save the result as a png file.
df <- df_earthquakes %>% filter(COUNTRY %in% c("CHINA", "USA"), YEAR > 2000)
plot_earthquakes_timeline(df, save_png=TRUE)
plot_earthquakes_timeline_label(df, save_png=TRUE)
The mapping functions require the leaflet package and chart a subset of earthquake events on a map. In the example we take earthquakes in Mexico from the year 2000 onwards. The eq_map() function takes a data frame and annot_col (short for annotation column) as input, and returns a leaflet map that can be printed to the viewer in RStudio. Here we annotate the earthquakes by date:
map1 <- load_data() %>%
  eq_clean_data() %>%
  dplyr::filter(COUNTRY == "MEXICO" & lubridate::year(date) >= 2000) %>%
  eq_map(annot_col = "date")
print(map1)
Which produces the output:
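For readers curious how a function like eq_map() might be put together, the following is a minimal sketch using standard leaflet calls. It is not the package's implementation: the name eq_map_sketch is hypothetical, and scaling the marker radius by EQ_PRIMARY is an assumption made for illustration.

```r
library(leaflet)

# Hypothetical stand-in for eq_map(); the real function may differ.
eq_map_sketch <- function(df, annot_col = "date") {
  m <- leaflet(data = df)
  m <- addTiles(m)  # default OpenStreetMap base layer
  addCircleMarkers(
    m,
    lng = ~LONGITUDE,
    lat = ~LATITUDE,
    radius = ~as.numeric(EQ_PRIMARY),       # assumption: radius tracks magnitude
    weight = 1,
    popup = as.character(df[[annot_col]])   # text shown when a marker is clicked
  )
}

toy_map_data <- data.frame(
  date = as.Date(c("2003-01-22", "2017-09-08")),
  LATITUDE = c(18.77, 15.02),
  LONGITUDE = c(-104.10, -93.90),
  EQ_PRIMARY = c("7.5", "8.2"),
  stringsAsFactors = FALSE
)
map_sketch <- eq_map_sketch(toy_map_data, annot_col = "date")
```

Looking the popup column up by name with df[[annot_col]] is what lets the caller pass the column as a string.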
The eq_create_label() function returns a vector of HTML-formatted labels with details including location, magnitude and deaths. We can store this in a new column and pass that column's name to eq_map() as the annot_col argument:
map2 <- load_data() %>%
  eq_clean_data() %>%
  dplyr::filter(COUNTRY == "MEXICO" & lubridate::year(date) >= 2000) %>%
  dplyr::mutate(popup_text = eq_create_label(.)) %>%
  eq_map(annot_col = "popup_text")
print(map2)
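As an illustration of what an HTML-formatted label of this kind might look like, here is a small base R sketch. The function name create_label_sketch, the exact wording of each field and the choice to skip missing values are assumptions; the column names are taken from the examples above.

```r
# Hypothetical stand-in for eq_create_label(); the package's output may differ.
create_label_sketch <- function(df) {
  # Build one "<b>Title:</b> value<br/>" fragment, or "" when the value is missing
  field <- function(title, value) {
    ifelse(is.na(value), "", paste0("<b>", title, ":</b> ", value, "<br/>"))
  }
  paste0(
    field("Location", df$CLEAN_LOCATION_NAME),
    field("Magnitude", df$EQ_PRIMARY),
    field("Total deaths", df$TOTAL_DEATHS)
  )
}

toy_labels <- data.frame(
  CLEAN_LOCATION_NAME = c("Oaxaca", "Colima"),
  EQ_PRIMARY = c("7.4", "7.5"),
  TOTAL_DEATHS = c(NA, "29"),
  stringsAsFactors = FALSE
)
create_label_sketch(toy_labels)
```

Because the function returns one string per row, it can be used directly inside dplyr::mutate() as in the example above.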