In lter/lterdatasampler: Educational Dataset Examples from the Long Term Ecological Research Program

knitr::opts_chunk$set(
  message = FALSE, warning = FALSE,
  fig.width = 7,
  fig.height = 4
)

Dataset sample used

ntl_icecover
ntl_airtemp

Introduction

The vignette highlights:

How to explore time-series trends
How to combine group_by() with summarise() and mutate()
How to join two datasets with left_join()
An example of computing standardized anomalies on long time-series
Use of the patchwork package to combine plots into one figure

According to the EPA, lake ice duration can be an indicator of climate change. This is because lake ice is dependent on several environmental factors, so changes in these factors will influence the formation of ice on top of lakes. As a result, the study and analysis of lake ice formation can inform scientists about how quickly the climate is changing, and are critical to minimizing disruptions to lake ecosystems. We can examine the ice duration of Lake Mendota, Lake Monona, and Lake Wingra, three lakes in the Madison, WI area.

North Temperate Lakes, LTER CC BY-SA 4.0

Ice measurement Mendota lake, Photo E. Whitaker

Fun Fact:

Did you know that Joni Mitchell's album "Hejira" art is her skating on Lake Mendota in 1976 !? https://wisconsinlife.org/story/joni-mitchell-skates-on-lake-mendota-and-into-music-history/

Data Exploration

library(lterdatasampler)
library(tidyverse)
library(lubridate)
library(patchwork)

Ice Cover Duration Data

head(ntl_icecover)

Note that the tidyverse also provides glimpse() for a quick view of a dataset:

glimpse(ntl_icecover)

Now let's explore the data visually. We can explore the data distribution in the different lakes using a box plot

lake_ice <-
  ggplot(data = ntl_icecover %>% filter(!is.na(ice_duration)), aes(x = lakeid, y = ice_duration)) +
  geom_boxplot(aes(color = lakeid, shape = lakeid),
               alpha = 0.8,
               width = 0.5) +
  theme_minimal() +
  labs(
    title = "Ice Duration of Lakes in the Madison, WI Area",
    y = "Ice Duration (Days)",
    x = "Lake",
    subtitle = "North Temperate Lakes LTER"
  ) +
  geom_jitter(
    aes(color = lakeid),
    alpha = 0.5,
    show.legend = FALSE,
    position = position_jitter(width = 0.2, seed = 0)
  )

lake_ice

Looking at the time-series across the different lakes:

ice_duration <-
  ggplot(data = ntl_icecover, aes(x = year)) +
  geom_line(aes(y = ice_duration, color = lakeid), alpha = 0.7) +
  theme_minimal() +
  labs(x = "Year", y = "Ice Duration (Days)", title = "Ice Duration of Lakes in the Madison, WI Area", subtitle = "North Temperate Lakes LTER")

ice_duration

Box plots and time series plots reveal similarities across the 2 lakes. What does the trend look like for average ice cover duration in the region, based on data for the 2 lakes here?

ntl_icecover_avg <- ntl_icecover %>%
  drop_na(ice_duration) %>% 
  group_by(year) %>%
  summarise(ice_duration = mean(ice_duration)) %>%
  rename(avg_ice_duration = ice_duration)

ntl_icecover_avg

Time Series Plot:

avg_ice_duration <-
  ggplot(data = ntl_icecover_avg %>% filter(!is.na(avg_ice_duration)), aes(x = year, y = avg_ice_duration)) +
  geom_line(alpha = 0.7) +
  theme_minimal() +
  geom_smooth(
    method = lm,
    se = FALSE,
    color = "blue",
    linetype = "dashed"
  ) + 
  labs(y = "Ice Duration (Days)", x = "Year", title = "Average Ice Duration", subtitle = "North Temperate Lakes LTER")

avg_ice_duration

Over the 168 years of this time series, the average ice duration in North Temperate Lakes is decreasing from about 120 days to about 80 days. Let us investigate a little more what could be the environmental factors are influencing this change!

Pancake ice formed near shore when early ice is broken by wave action, Lake Mendota, Dec. 2003; Photo: J. Magnuson

Research suggests that mean annual temperature is one of the primary factors that alter lake ice formation. We can look at the temperature data of Madison, WI found in ntl_airtemp to see if there is a corresponding change in climate that may have influenced the change in ice duration.

Note that according to the original metadata: "Daily temperature data prior to 1884 were estimated from 3 times per day sampling and biases are expected and should not be comparable with data after that time."

Air Temperature Data

head(ntl_airtemp)

We can compute the mean annual air temperature using group_by and summarise:

ntl_airtemp_avg <- ntl_airtemp %>% 
  filter(year > 1884) %>%   # filter out the known biased data
  group_by(year) %>% 
  summarise(avg_air_temp_adjusted = mean(ave_air_temp_adjusted, na.rm=TRUE))

ntl_airtemp_avg

Time Series Plot:

temp_plot <-
  ggplot(data = ntl_airtemp_avg,
         aes(x = year, y = avg_air_temp_adjusted)) +
  geom_line(alpha = 0.7) +
  theme_minimal() +
  geom_smooth(
    method = lm,
    se = FALSE,
    color = "blue",
    linetype = "dashed"
  ) +
  labs(y = "Mean Air Temperature Nov-April (Celsius)", x = "Year", title = "Mean Annual Temperature", subtitle = "Madison, WI")

temp_plot

There is an upward trend in the mean annual air temperature. Let's investigate potential relationships between air temperature and lake ice cover duration.

Investigate the relationship between ice cover duration and air temperature

To compare the ice cover duration and temperature data directly, the ntl_airtemp and ntl_icecover datasets can be joined. Since lake freezing is mostly impacted by Fall and Spring air temperatures, we will first compute the mean temperature from Nov to April (next year) to best capture the potential relationship:

# Add a column to group the Fall and Spring season into a same year, similarly to what is done when defining hydrological year
ntl_airtemp_hydro <- ntl_airtemp %>%
  mutate(hydroyear = if_else(month(sampledate) < 10, year-1, year))

# Compute the average air temperature from Nov to April
ntl_airtemp_avg_winter <-  ntl_airtemp_hydro %>%
  filter(month(sampledate) %in% c(11:12,1:4)) %>% # filter the months from Nov to April
  group_by(hydroyear) %>%
  summarise(avg_air_temp_adjusted = mean(ave_air_temp_adjusted))

Let us plot the average air temperature time-series computed from the cold season:

temp_plot <-
  ggplot(data = ntl_airtemp_avg_winter %>% filter(!is.na(avg_air_temp_adjusted)),
         aes(x = hydroyear, y = avg_air_temp_adjusted)) +
  geom_line(alpha = 0.7) +
  theme_minimal() +
  geom_smooth(
    method = lm,
    se = FALSE,
    color = "blue",
    linetype = "dashed"
  ) +
  labs(y = "Mean Air Temperature Nov-April (Celsius)", x = "Year", title = "Cold Season (Nov to Apr) Mean Annual Temperature", subtitle = "Madison, WI")

temp_plot

To further investigate the relationship between the air temperature during the cold season and the ice cover duration, we can join the two datasets together using left_join:

ntl_joined_avg <- ntl_icecover_avg  %>%
  left_join(by = c("year" = "hydroyear"), ntl_airtemp_avg_winter)

ntl_joined_avg

Plot the average air temperature against the average ice cover duration:

scatter <-
  ggplot(data = ntl_joined_avg ,
       aes(y = avg_ice_duration, x = avg_air_temp_adjusted)) + geom_point(alpha = 0.8) +
  theme_minimal() +
  labs(
    title = "Cold Season Mean Air Temperature and Ice Duration of Lakes",
    y = "Ice Duration (Days)",
    x = "Mean Air Temperature Nov-April (Celsius)",
    subtitle = "North Temperate Lakes LTER"
  ) +
  geom_smooth(
    method = "lm",
    color = "black",
    se = FALSE,
    size = 0.3
  )
scatter

There is a negative correlation between the mean air temperature from November to April and the amount of time that each lake was frozen. This means that as temperature increases, the ice duration decreases.

Standardized Anomalies

Now let's investigate how this relationship evolves over time. To compare the temperature with the ice duration on a time-series plot on a similar scale we are going to compute standardized anomalies for both parameters. We are going to split the time-series in two parts. The first 30 years will be used to compute the climatological reference (mean and standard deviation), that we will use to compare the other years against. Since the earliest year of observation for ice duration was 1851, and the earliest year of observation for temperature was 1869, we opted to compute the climatological references from 1870 to 1900.

In this example, the standardized anomalies for the temperature (and ice cover duration) are computed as follow:

$A_{y_{stand}} = (T_{y} - \overline{T}{clim}) / T{SD_{clim}}$

Where:

$A_{y_{stand}}$ = Standardized anomalies
$T_{y}$ = mean temperature for a specific cold season
$\overline{T}_{clim}$ = Mean temperature from 1870 to 1900
$T_{SD_{clim}}$ = Standard deviation for temperature from 1870 to 1900

Air Temperature Climatologies

Let us start with computing the climatologies and standardized anomalies for the air temperature:

# Using the first 30 years to compute the yearly climatologies
ntl_airtemp_clim <- ntl_airtemp_avg_winter %>%
  filter(hydroyear > 1869 & hydroyear < 1901) %>%   # taking the first 30 years of common data
  summarise(clim_winter_mean_airtemp = mean(avg_air_temp_adjusted, na.rm=TRUE),  # computing the climatologies
            clim_winter_sd_airtemp = sd(avg_air_temp_adjusted, na.rm=TRUE)) %>%
  bind_cols(filter(ntl_airtemp_avg_winter,hydroyear > 1900), .) %>%  # Adding the climatology columns to the second part of the time-series
  mutate(standardized_anomalies_airtemp = (avg_air_temp_adjusted - clim_winter_mean_airtemp) / clim_winter_sd_airtemp)  # computing the standardized anomalies

Let's plot the standardized anomalies time-series:

temp_plot_airtemp_anomalies <-
  ggplot(data = ntl_airtemp_clim,
         aes(x = hydroyear, y = standardized_anomalies_airtemp)) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  theme_minimal() +
  # scale_x_date(date_breaks = "year", date_labels = "%b") +
  geom_smooth(
    method = lm,
    se = FALSE,
    color = "blue",
    linetype = "dashed"
  ) +
  labs(y = "Standaradized Anomalies", x = "Year", title = "Air temperature standaradized anomalies", subtitle = "Madison, WI")

temp_plot_airtemp_anomalies

Ice Cover Duration Climatologies

Now doing the same computations for the ice cover duration:

# Using the first 30 years to compute the yearly climatologies
ntl_iceduration_clim_30first <- ntl_icecover_avg %>%
  filter(year > 1869 & year < 1901) %>%  # taking the first 30 years of common data
  summarise(clim_mean_iceduration = mean(avg_ice_duration, na.rm=TRUE),  # computing the climatologies
         clim_sd_iceduration = sd(avg_ice_duration, na.rm=TRUE)) 

# compute the yearly standardized anomalies
ntl_iceduration_clim <- ntl_icecover %>%
  filter(year > 1900) %>%   # using the following years of the time series
  add_column(clim_mean_iceduration = ntl_iceduration_clim_30first$clim_mean_iceduration,
             clim_sd_iceduration = ntl_iceduration_clim_30first$clim_sd_iceduration) %>%
  mutate(standardized_anomalies_iceduration = (ice_duration - clim_mean_iceduration) / clim_sd_iceduration) # computing the standardized anomalies

Let's plot the standardized anomalies time-series:

temp_plot_iceduration_anomalies <-
  ggplot(data = ntl_iceduration_clim,
         aes(x = year, y = standardized_anomalies_iceduration)) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  theme_minimal() +
  # scale_x_date(date_breaks = "year", date_labels = "%b") +
  geom_smooth(
    method = lm,
    se = FALSE,
    color = "blue",
    linetype = "dashed"
  ) +
  labs(y = "Standaradized Anomalies", x = "Year", title = "Ice cover duration standaradized anomalies", subtitle = "North Temperate Lakes LTER")

temp_plot_iceduration_anomalies

Using the patchwork package, it is very easy to combine graphs together: just add them!!

temp_plot_airtemp_anomalies + temp_plot_iceduration_anomalies

or try vertically:

temp_plot_airtemp_anomalies / temp_plot_iceduration_anomalies

Note the difference of scale on the y-axis. However in both time series, there is an increase in anomalies (both in frequency and intensity) over time, suggesting the recent year are more consistently warmer and with shorter period with frozen lakes

Other things to explore:

How are the freezing starting date and melting date changing over time?
How does interpretation change with different methods / visualizations of smoothing (e.g. moving average)
Predictions of ice cover duration in the future
Comparison of how those predictions change if you pick a longer or shorter dataset (e.g. how does the coefficient compare if you use all of the data, versus just data since ~1975?)
See the last citation below for more ideas!

Aknowledgement

Thank you to Hilary Dugan, Corinna Gries and John Magnuson for their inputs while developing this vignette. Thank you also go to Adhitya Logan for his extra work on this vignette.

Citation

Anderson, L. and D. Robertson. 2020. Madison Wisconsin Daily Meteorological Data 1869 - current ver 32. Environmental Data Initiative. https://doi.org/10.6073/pasta/e3ff85971d817e9898bb8a83fb4c3a8b (Accessed 2021-03-08).

Magnuson, J.J., S.R. Carpenter, and E.H. Stanley. 2021. North Temperate Lakes LTER: Ice Duration - Madison Lakes Area 1853 - current ver 35. Environmental Data Initiative. https://doi.org/10.6073/pasta/ab31f2489ee436beb73fc8f1d0213d97 (Accessed 2021-03-08).

Magnuson, J.J. 2021. Seeing the invisible present and place: from years to centuries with lake ice from Wisconsin to the Northern Hemisphere. Chapter 9 (243- 277) in R. B. Waide and S. E. Kingsland [eds]. The Challenges of Long Term Ecological Research: A Historical Analysis. Springer, Archimedes Series #59. https://doi.org/10.1007/978-3-030-66933-1_9 (Accessed 2022-02-14).

How we processed the raw data

Ice Cover

r knitr::spin_child(here::here("data-raw", "ntl_icecover_data.R"))

Air Temperature

r knitr::spin_child(here::here("data-raw", "ntl_airtemp_data.R"))

lter/lterdatasampler documentation built on Oct. 12, 2023, 3:34 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

lter/lterdatasampler
Educational Dataset Examples from the Long Term Ecological Research Program

In lter/lterdatasampler: Educational Dataset Examples from the Long Term Ecological Research Program

Dataset sample used

Introduction

Data Exploration

Ice Cover Duration Data

Air Temperature Data

Investigate the relationship between ice cover duration and air temperature

Standardized Anomalies

Air Temperature Climatologies

Ice Cover Duration Climatologies

Other things to explore:

Aknowledgement

Citation

How we processed the raw data

Ice Cover

Air Temperature

R Package Documentation

Browse R Packages

We want your feedback!

lter/lterdatasampler Educational Dataset Examples from the Long Term Ecological Research Program

In lter/lterdatasampler: Educational Dataset Examples from the Long Term Ecological Research Program

Dataset sample used

Introduction

Data Exploration

Ice Cover Duration Data

Air Temperature Data

Investigate the relationship between ice cover duration and air temperature

Standardized Anomalies

Air Temperature Climatologies

Ice Cover Duration Climatologies

Other things to explore:

Aknowledgement

Citation

How we processed the raw data

Ice Cover

Air Temperature

R Package Documentation

Browse R Packages

We want your feedback!

lter/lterdatasampler
Educational Dataset Examples from the Long Term Ecological Research Program