knitr::opts_chunk$set( message = FALSE, warning = FALSE, fig.width = 7, fig.height = 4 )
ntl_icecover
ntl_airtemp
The vignette highlights:
group_by()
with summarise()
and mutate()
left_join()
patchwork
package to combine plots into one figureAccording to the EPA, lake ice duration can be an indicator of climate change. This is because lake ice is dependent on several environmental factors, so changes in these factors will influence the formation of ice on top of lakes. As a result, the study and analysis of lake ice formation can inform scientists about how quickly the climate is changing, and are critical to minimizing disruptions to lake ecosystems. We can examine the ice duration of Lake Mendota, Lake Monona, and Lake Wingra, three lakes in the Madison, WI area.
![]() |
![]() |
Fun Fact:
Did you know that Joni Mitchell's album "Hejira" art is her skating on Lake Mendota in 1976 !? https://wisconsinlife.org/story/joni-mitchell-skates-on-lake-mendota-and-into-music-history/
library(lterdatasampler) library(tidyverse) library(lubridate) library(patchwork)
head(ntl_icecover)
Note that the tidyverse also provides glimpse()
for a quick view of a dataset:
glimpse(ntl_icecover)
Now let's explore the data visually. We can explore the data distribution in the different lakes using a box plot
lake_ice <- ggplot(data = ntl_icecover %>% filter(!is.na(ice_duration)), aes(x = lakeid, y = ice_duration)) + geom_boxplot(aes(color = lakeid, shape = lakeid), alpha = 0.8, width = 0.5) + theme_minimal() + labs( title = "Ice Duration of Lakes in the Madison, WI Area", y = "Ice Duration (Days)", x = "Lake", subtitle = "North Temperate Lakes LTER" ) + geom_jitter( aes(color = lakeid), alpha = 0.5, show.legend = FALSE, position = position_jitter(width = 0.2, seed = 0) ) lake_ice
Looking at the time-series across the different lakes:
ice_duration <- ggplot(data = ntl_icecover, aes(x = year)) + geom_line(aes(y = ice_duration, color = lakeid), alpha = 0.7) + theme_minimal() + labs(x = "Year", y = "Ice Duration (Days)", title = "Ice Duration of Lakes in the Madison, WI Area", subtitle = "North Temperate Lakes LTER") ice_duration
Box plots and time series plots reveal similarities across the 2 lakes. What does the trend look like for average ice cover duration in the region, based on data for the 2 lakes here?
ntl_icecover_avg <- ntl_icecover %>% drop_na(ice_duration) %>% group_by(year) %>% summarise(ice_duration = mean(ice_duration)) %>% rename(avg_ice_duration = ice_duration) ntl_icecover_avg
Time Series Plot:
avg_ice_duration <- ggplot(data = ntl_icecover_avg %>% filter(!is.na(avg_ice_duration)), aes(x = year, y = avg_ice_duration)) + geom_line(alpha = 0.7) + theme_minimal() + geom_smooth( method = lm, se = FALSE, color = "blue", linetype = "dashed" ) + labs(y = "Ice Duration (Days)", x = "Year", title = "Average Ice Duration", subtitle = "North Temperate Lakes LTER") avg_ice_duration
Over the 168 years of this time series, the average ice duration in North Temperate Lakes is decreasing from about 120 days to about 80 days. Let us investigate a little more what could be the environmental factors are influencing this change!
Research suggests that mean annual temperature is one of the primary factors that alter lake ice formation. We can look at the temperature data of Madison, WI found in ntl_airtemp
to see if there is a corresponding change in climate that may have influenced the change in ice duration.
Note that according to the original metadata: "Daily temperature data prior to 1884 were estimated from 3 times per day sampling and biases are expected and should not be comparable with data after that time."
head(ntl_airtemp)
We can compute the mean annual air temperature using group_by
and summarise
:
ntl_airtemp_avg <- ntl_airtemp %>% filter(year > 1884) %>% # filter out the known biased data group_by(year) %>% summarise(avg_air_temp_adjusted = mean(ave_air_temp_adjusted, na.rm=TRUE)) ntl_airtemp_avg
Time Series Plot:
temp_plot <- ggplot(data = ntl_airtemp_avg, aes(x = year, y = avg_air_temp_adjusted)) + geom_line(alpha = 0.7) + theme_minimal() + geom_smooth( method = lm, se = FALSE, color = "blue", linetype = "dashed" ) + labs(y = "Mean Air Temperature Nov-April (Celsius)", x = "Year", title = "Mean Annual Temperature", subtitle = "Madison, WI") temp_plot
There is an upward trend in the mean annual air temperature. Let's investigate potential relationships between air temperature and lake ice cover duration.
To compare the ice cover duration and temperature data directly, the ntl_airtemp
and ntl_icecover
datasets can be joined. Since lake freezing is mostly impacted by Fall and Spring air temperatures, we will first compute the mean temperature from Nov to April (next year) to best capture the potential relationship:
# Add a column to group the Fall and Spring season into a same year, similarly to what is done when defining hydrological year ntl_airtemp_hydro <- ntl_airtemp %>% mutate(hydroyear = if_else(month(sampledate) < 10, year-1, year)) # Compute the average air temperature from Nov to April ntl_airtemp_avg_winter <- ntl_airtemp_hydro %>% filter(month(sampledate) %in% c(11:12,1:4)) %>% # filter the months from Nov to April group_by(hydroyear) %>% summarise(avg_air_temp_adjusted = mean(ave_air_temp_adjusted))
Let us plot the average air temperature time-series computed from the cold season:
temp_plot <- ggplot(data = ntl_airtemp_avg_winter %>% filter(!is.na(avg_air_temp_adjusted)), aes(x = hydroyear, y = avg_air_temp_adjusted)) + geom_line(alpha = 0.7) + theme_minimal() + geom_smooth( method = lm, se = FALSE, color = "blue", linetype = "dashed" ) + labs(y = "Mean Air Temperature Nov-April (Celsius)", x = "Year", title = "Cold Season (Nov to Apr) Mean Annual Temperature", subtitle = "Madison, WI") temp_plot
To further investigate the relationship between the air temperature during the cold season and the ice cover duration, we can join the two datasets together using left_join
:
ntl_joined_avg <- ntl_icecover_avg %>% left_join(by = c("year" = "hydroyear"), ntl_airtemp_avg_winter) ntl_joined_avg
Plot the average air temperature against the average ice cover duration:
scatter <- ggplot(data = ntl_joined_avg , aes(y = avg_ice_duration, x = avg_air_temp_adjusted)) + geom_point(alpha = 0.8) + theme_minimal() + labs( title = "Cold Season Mean Air Temperature and Ice Duration of Lakes", y = "Ice Duration (Days)", x = "Mean Air Temperature Nov-April (Celsius)", subtitle = "North Temperate Lakes LTER" ) + geom_smooth( method = "lm", color = "black", se = FALSE, size = 0.3 ) scatter
There is a negative correlation between the mean air temperature from November to April and the amount of time that each lake was frozen. This means that as temperature increases, the ice duration decreases.
Now let's investigate how this relationship evolves over time. To compare the temperature with the ice duration on a time-series plot on a similar scale we are going to compute standardized anomalies for both parameters. We are going to split the time-series in two parts. The first 30 years will be used to compute the climatological reference (mean and standard deviation), that we will use to compare the other years against. Since the earliest year of observation for ice duration was 1851, and the earliest year of observation for temperature was 1869, we opted to compute the climatological references from 1870 to 1900.
In this example, the standardized anomalies for the temperature (and ice cover duration) are computed as follow:
$A_{y_{stand}} = (T_{y} - \overline{T}{clim}) / T{SD_{clim}}$
Where:
Let us start with computing the climatologies and standardized anomalies for the air temperature:
# Using the first 30 years to compute the yearly climatologies ntl_airtemp_clim <- ntl_airtemp_avg_winter %>% filter(hydroyear > 1869 & hydroyear < 1901) %>% # taking the first 30 years of common data summarise(clim_winter_mean_airtemp = mean(avg_air_temp_adjusted, na.rm=TRUE), # computing the climatologies clim_winter_sd_airtemp = sd(avg_air_temp_adjusted, na.rm=TRUE)) %>% bind_cols(filter(ntl_airtemp_avg_winter,hydroyear > 1900), .) %>% # Adding the climatology columns to the second part of the time-series mutate(standardized_anomalies_airtemp = (avg_air_temp_adjusted - clim_winter_mean_airtemp) / clim_winter_sd_airtemp) # computing the standardized anomalies
Let's plot the standardized anomalies time-series:
temp_plot_airtemp_anomalies <- ggplot(data = ntl_airtemp_clim, aes(x = hydroyear, y = standardized_anomalies_airtemp)) + geom_bar(stat = "identity", show.legend = FALSE) + theme_minimal() + # scale_x_date(date_breaks = "year", date_labels = "%b") + geom_smooth( method = lm, se = FALSE, color = "blue", linetype = "dashed" ) + labs(y = "Standaradized Anomalies", x = "Year", title = "Air temperature standaradized anomalies", subtitle = "Madison, WI") temp_plot_airtemp_anomalies
Now doing the same computations for the ice cover duration:
# Using the first 30 years to compute the yearly climatologies ntl_iceduration_clim_30first <- ntl_icecover_avg %>% filter(year > 1869 & year < 1901) %>% # taking the first 30 years of common data summarise(clim_mean_iceduration = mean(avg_ice_duration, na.rm=TRUE), # computing the climatologies clim_sd_iceduration = sd(avg_ice_duration, na.rm=TRUE)) # compute the yearly standardized anomalies ntl_iceduration_clim <- ntl_icecover %>% filter(year > 1900) %>% # using the following years of the time series add_column(clim_mean_iceduration = ntl_iceduration_clim_30first$clim_mean_iceduration, clim_sd_iceduration = ntl_iceduration_clim_30first$clim_sd_iceduration) %>% mutate(standardized_anomalies_iceduration = (ice_duration - clim_mean_iceduration) / clim_sd_iceduration) # computing the standardized anomalies
Let's plot the standardized anomalies time-series:
temp_plot_iceduration_anomalies <- ggplot(data = ntl_iceduration_clim, aes(x = year, y = standardized_anomalies_iceduration)) + geom_bar(stat = "identity", show.legend = FALSE) + theme_minimal() + # scale_x_date(date_breaks = "year", date_labels = "%b") + geom_smooth( method = lm, se = FALSE, color = "blue", linetype = "dashed" ) + labs(y = "Standaradized Anomalies", x = "Year", title = "Ice cover duration standaradized anomalies", subtitle = "North Temperate Lakes LTER") temp_plot_iceduration_anomalies
Using the patchwork
package, it is very easy to combine graphs together: just add them!!
temp_plot_airtemp_anomalies + temp_plot_iceduration_anomalies
or try vertically:
temp_plot_airtemp_anomalies / temp_plot_iceduration_anomalies
Note the difference of scale on the y-axis. However in both time series, there is an increase in anomalies (both in frequency and intensity) over time, suggesting the recent year are more consistently warmer and with shorter period with frozen lakes
Thank you to Hilary Dugan, Corinna Gries and John Magnuson for their inputs while developing this vignette. Thank you also go to Adhitya Logan for his extra work on this vignette.
Anderson, L. and D. Robertson. 2020. Madison Wisconsin Daily Meteorological Data 1869 - current ver 32. Environmental Data Initiative. https://doi.org/10.6073/pasta/e3ff85971d817e9898bb8a83fb4c3a8b (Accessed 2021-03-08).
Magnuson, J.J., S.R. Carpenter, and E.H. Stanley. 2021. North Temperate Lakes LTER: Ice Duration - Madison Lakes Area 1853 - current ver 35. Environmental Data Initiative. https://doi.org/10.6073/pasta/ab31f2489ee436beb73fc8f1d0213d97 (Accessed 2021-03-08).
Magnuson, J.J. 2021. Seeing the invisible present and place: from years to centuries with lake ice from Wisconsin to the Northern Hemisphere. Chapter 9 (243- 277) in R. B. Waide and S. E. Kingsland [eds]. The Challenges of Long Term Ecological Research: A Historical Analysis. Springer, Archimedes Series #59. https://doi.org/10.1007/978-3-030-66933-1_9 (Accessed 2022-02-14).
r knitr::spin_child(here::here("data-raw", "ntl_icecover_data.R"))
r knitr::spin_child(here::here("data-raw", "ntl_airtemp_data.R"))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.