Welcome to the Climate Futures Toolbox

This vignette provides a walk-through of a common use case of the cft package: understanding climate futures for a region of interest. We'll use Wind Cave National Park, located in South Dakota, USA, as a case study.

What you'll learn

This vignette will show you how to:

What you'll need

To get the most out of this vignette, we assume you have:

About the data

Global Circulation Models (GCMs) provide estimates of historical and future climate conditions. The complexity of the climate system has led to a large number of GCMs, and it is common practice to examine outputs from many different models, treating each as one plausible future.

Most GCMs are spatially coarse (often 1 degree), but downscaling provides finer scale estimates. The cft package uses one downscaled climate model called MACA (Multivariate Adaptive Constructed Analogs) Version 2 (details here).

Acquiring and subsetting data within National Park Service boundaries

This package was originally written with the National Park Service in mind, so it has the option to use the name of any park (or monument, preserve, etc.) within the NPS. Use the cftdata() function to specify a range of years, a set of models, a set of parameters, and a set of representative concentration pathways to return. Leaving these arguments empty will result in a download of all available data for that location.
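
For orientation, a hedged sketch of such a call is shown below. The argument names (park, years, models, parameters, scenarios) are assumptions based on the description above and on earlier versions of the package, so check ?cftdata for the exact interface in your installed version.

# Hypothetical cftdata() call -- argument names are assumptions; see ?cftdata
d <- cftdata(park       = "Wind Cave National Park",
             years      = c(2020, 2040),
             models     = "CNRM-CM5",
             parameters = c("pr", "tasmax"),
             scenarios  = c("rcp45", "rcp85"))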

Loading the cft package from GitHub

library(devtools)
install_github("earthlab/cft")

Attach cft and check the list of available functions

library(cft)
ls(pos="package:cft")

Look at the documentation for those functions

?available_data
?single_point_firehose

Test API connection

web_link = "https://cida.usgs.gov/thredds/dodsC/macav2metdata_daily_future"

# Change to "https://cida.usgs.gov/thredds/catalog.html?dataset=cida.usgs.gov/macav2metdata_daily_historical" for historical data. 

src <- tidync::tidync(web_link)

Use read-only mode to find available data without initiating a full download.

inputs <- cft::available_data()
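
Before filtering, it can help to inspect what available_data() returned. This minimal sketch only touches the elements of inputs that are used later in this vignette (variable_names, available_times, and src):

# Inspect the list returned by available_data()
str(inputs, max.level = 1)     # top-level elements: variable_names, available_times, src
head(inputs$variable_names)    # variable/model/scenario combinations available for download
head(inputs$available_times)   # numeric time index; the first column is used for the time filter below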

Filter the results from available_data() to specify which data to actually download.

Filter time

The following code builds a filter that selects data from the year 2034. It adds an index column that takes the value 0 for times outside the year 2034 and 1 for times within it.

library(tibble)
library(dplyr)

# Year 2034
time_min <- 72048
time_max <- 73048

input_times <- inputs$available_times %>% add_column(index = 0)
input_times[which(inputs$available_times[, 1] > time_min & inputs$available_times[, 1] < time_max), 3] <- 1

tail(input_times)
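
As a quick sanity check, count how many time steps the filter flagged:

# Number of time steps flagged with index == 1 by the filter above
sum(input_times$index)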

Filter variable names

The following code creates a filter to select precipitation and maximum temperature data from the emission scenarios RCP 4.5 and RCP 8.5 from the Norwegian Earth System Model, the Hadley Global Environment Model 2, and the Community Climate System Model 4. This is done by filtering by variable, by model, and by scenario using a separate call to the filter() function for each filtering step.

input_variables <- inputs$variable_names %>% 
  filter(Variable %in% c("Precipitation", "Maximum Temperature")) %>% 
  filter(Model %in% c("Norwegian Earth System Model 1 - Medium Resolution", "Hadley Global Environment Model 2 - Earth System 365 (day)", "Community Climate System Model 4")) %>%
  filter(Scenario %in% c("RCP 4.5", "RCP 8.5")) %>% 
  pull("Available variable")

input_variables

Establish area of interest (AOI) by bounding box


This code obtains a bounding box from OpenStreetMap (OSM) for the national parks adjacent to Yellowstone National Park and then manually updates the corners of that bounding box to match the official bounding box from the National Park Service. The bounding box obtained from OSM is larger than the official one, so this adjustment ensures that the bounding box matches our desired area of interest.

library(osmdata)
library(sf)

bb <- getbb("yellowstone")
bb_manual <- bb
bb_manual[1,1] <- -111.15594815937659
bb_manual[1,2] <- -109.8305463801207
bb_manual[2,1] <- 44.12354048271325
bb_manual[2,2] <- 45.11911641599412

my_boundary <- opq(bb_manual) %>% 
  add_osm_feature(key = "boundary", value = "national_park") %>% 
  osmdata_sf()

my_boundary
my_boundary$osm_multipolygons
my_boundary$osm_multipolygons[1,]

boundaries <- my_boundary$osm_multipolygons[1,] 
pulled_bb <- st_bbox(boundaries)
pulled_bb

This code produces a map of the area of interest, which can be used to check that the bounding box defined above matches our desired area of interest.

library(ggplot2)

basemap <- ggplot(data = boundaries) +
  geom_sf(fill = "cornflowerblue") +
  geom_sf_text(aes(label = name), size = 10)

basemap

Download data by AOI, filtered times, and filtered variable list

This code downloads the variables selected above (precipitation and maximum temperature for the chosen models and emission scenarios) for the year 2034 within the national parks that touch Yellowstone National Park. First, the hyper_filter() function selects grid cells by restricting the latitude and longitude coordinates to within a small delta of the bounding box defined above. Then hyper_filter() is applied again with the time filter defined above to keep only times in the year 2034. Finally, the hyper_tibble() function downloads the selected variables (here, the first 50 entries of input_variables) and the result is converted to an sf point object.

Pulled_data <- inputs$src %>% 
  hyper_filter(lat = lat <= c(pulled_bb[4] + 0.05) & lat >= c(pulled_bb[2] - 0.05)) %>% 
  hyper_filter(lon = lon <= c(pulled_bb[3] + 0.05) & lon >= c(pulled_bb[1] - 0.05)) %>% 
  hyper_filter(time = input_times[, 3] == 1) %>% 
  hyper_tibble(select_var = input_variables[1:50]) %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326, agr = "constant")

head(Pulled_data)

This plot shows the downloaded data as red dots over the map of the national parks that touch Yellowstone National Park. Only the earliest time in 2034 is plotted: because data were downloaded for many time steps, filtering to a single point in time shows the spatial distribution of the download without plotting every time step.

check_filter <- Pulled_data %>% filter(time == min(Pulled_data$time))

ggplot() +
  geom_sf(data = boundaries, fill = "cornflowerblue") +
  geom_sf(data = check_filter, color = "red", size = 5) +
  coord_sf(crs = 4326)

Melt downloaded points into a raster before aggregation

library(stars)  # st_rasterize() and st_extract()

rast <- st_rasterize(Pulled_data)
plot(rast)


Aggregate downloaded data to different spatial objects

Aggregate to polygon (faster method)

extracted <- st_extract(rast, boundaries$geometry) %>% st_as_sf()
names(extracted)[1] <- "nn"
ggplot(data=extracted) +
  geom_sf(aes(fill = nn)) +
  scale_fill_continuous(low="thistle2", high="darkred", 
                       guide="colorbar",na.value="white")+
  coord_sf(crs = 4326)

Clip raster with polygon (slower method)

small_pulled <- Pulled_data %>%
  filter(time == 72049)
intersection <- st_intersection(x = small_pulled, y = boundaries$geometry)

names(intersection)[1:2] <- c("Precipitation", "b")
library(ggthemes)
ggplot() +
  geom_sf(data = intersection, aes(color = Precipitation), size = 3, shape = 15) +
  scale_color_continuous(low = "thistle2", high = "darkred",
                         guide = "colorbar", na.value = "white") +
  geom_sf(data = boundaries, fill = NA, color = "white") +
  theme_tufte() +
  labs(title = "YELLOWSTONE AREA NATIONAL PARKS", subtitle = "Precipitation in 2034")

Aggregate to River segment

river <- opq(pulled_bb) %>%
  add_osm_feature(key = "waterway", value = "river") %>%
  osmdata_sf() 
river
river_sub <- st_buffer(river$osm_lines, 5000)
extracted_river <- st_extract(rast,  river_sub$geometry ) %>% st_as_sf()
head(extracted_river)
ggplot(data=extracted_river) +
  geom_sf(aes(fill = pr_HadGEM2.ES365_r1i1p1_rcp85), size=0) +
   coord_sf(crs = 4326, xlim = c(pulled_bb[1], pulled_bb[3]), 
           ylim = c(pulled_bb[2], pulled_bb[4]),
           expand = FALSE) +
  scale_fill_continuous(low="thistle2", high="darkred", 
                       guide="colorbar",na.value="white")+
  labs(title = "Rivers of Yellowstone",
       subtitle = "Projected humidity in 2040", 
       caption = "Data Source: Climate Futures...") + 
  theme_tufte()

Aggregate to road segment

roads <- opq(pulled_bb) %>%
  add_osm_feature(key = 'highway', value = c('primary', 'secondary')) %>%
  osmdata_sf() 
roads_sub <- st_buffer(roads$osm_points, 2200)
extracted_roads <- st_extract(rast,  st_as_sfc(roads_sub) ) %>% st_as_sf()
extracted_roads
ggplot(data=extracted_roads) +
  geom_sf(aes(fill = pr_HadGEM2.ES365_r1i1p1_rcp85), size=0) +
   coord_sf(crs = 4326) +
  scale_fill_continuous(low="thistle2", high="darkred", 
                       guide="colorbar",na.value="white")+
  labs(title = "Roads of Yellowstone",
       subtitle = "Projected humidity in 2040", 
       caption = "Data Source: Climate Futures...") + 
  theme_tufte()

Computing new daily climate variables

Now that we have all of the climate parameters for our study region, we can compute functions of those variables. For example, it is common to compute the midpoint of the maximum and minimum daily temperature, which we can do using the mutate function:

# tasmax and tasmin are the daily maximum and minimum temperature values (K)
# from the downloaded data
df <- data.frame(tasmax, tasmin)

df <- df %>%
  mutate(tasmid = (tasmax + tasmin) / 2)

Now we have a new column called tasmid that is the midpoint of the maximum and minimum daily temperature!

Wind speed provides another example of a derived parameter that can be computed for each day. By default, we have two wind-related parameters: the eastward wind component (called uas) and the northward wind component (called vas), both in units of meters per second (you can get this information from cft::argument_reference). Wind speed can be computed from vas and uas using the Pythagorean theorem:

```{=latex}
$\text{Wind speed} = \sqrt{v_{as}^2 + u_{as}^2}.$
```

In code: 

```r
df <- df %>%
  mutate(wind_speed = sqrt(vas^2 + uas^2))
```

Computing new climate variable summaries

Sometimes, there are new climate variables that summarize daily data. For example, you may want to compute:

All of these quantities summarize daily data and require an aggregation time interval, which in many cases will be one year. As an example, we will compute the growing season length for Wind Cave National Park across all models and emissions scenarios. To do this, we first need to define a new column for year, which we will use as a grouping variable:

library(lubridate)  # for year()

# assumes df includes a date column from the downloaded data
df <- df %>%
  mutate(year = year(date))

Now, we want to compute growing season length for each year, model, and emissions scenario combination.

growing_seasons <- df %>%
  group_by(rcp, model, year, ensemble) %>%
  summarize(season_length = sum(tasmid > 273.15)) %>%
  ungroup()

Notice that we used our derived temperature midpoint column tasmid and computed the total (sum()) number of days in each group where the temperature midpoint was greater than 0 °C (273.15 K, the units of the temperature data).

growing_seasons
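
If the sum() trick is unfamiliar: summing a logical vector in R coerces TRUE/FALSE to 1/0, so sum() counts the days that meet the condition. A tiny standalone illustration:

# Summing a logical vector counts the TRUEs, i.e. the number of days whose
# midpoint temperature exceeded freezing (273.15 K)
sum(c(270.2, 275.9, 280.1) > 273.15)  # returns 2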

Let's visualize the growing season over time for each model and emission scenario:

growing_seasons %>%
  ggplot(aes(year, season_length, color = rcp, group = model)) + 
  geom_line(alpha = .3) + 
  facet_wrap(~rcp, ncol = 1) + 
  xlab("Year") + 
  ylab("Growing season length (days)") + 
  scale_color_manual(values = c("dodgerblue", "red")) + 
  theme(legend.position = "none")

Comparing climate in two time periods

Use the tibble object that is returned from cft_df() as an input to compare_periods() to compare climate between a reference and target period. You may specify the function with which to aggregate your chosen variable, as well as the months of the year to include in the calculation.

comps <- compare_periods(df,
                         var1 = "pr",
                         var2 = "tasmax",
                         agg_fun = "mean",
                         target_period = c(2025, 2030),
                         reference_period = c(2020, 2024),
                         months1 = 5:8,
                         months2 = 5:8,
                         scenarios = c("rcp45", "rcp85"))

This provides a data frame that can be used to compare the values in the target and reference period.

glimpse(comps)

One useful plot shows the difference in the two variables between reference and target periods:

title <- paste("Change from reference to target period:", 
               comps$reference_period, comps$target_period, sep = "  vs  ")[1]

library(tidyr)    # pivot_wider()
library(ggrepel)  # geom_text_repel()

comps %>%
  dplyr::select(parameter, rcp, model, reference_period, target_period, difference) %>%
  pivot_wider(names_from = parameter, values_from = difference) %>%
  ungroup %>%
  mutate(rcp = ifelse(rcp == "rcp45", "RCP 4.5", "RCP 8.5")) %>%
  ggplot(aes(pr, tasmax, color = rcp)) + 
  ggtitle(title) +
  geom_point() + 
  geom_hline(yintercept = 0, alpha = .2) + 
  geom_vline(xintercept = 0, alpha = .2) +
  geom_text_repel(aes(label = model), segment.size = .3, size = 3) + 
  xlab("Difference in mean daily precipitation (mm)") + 
  ylab("Difference in mean daily max. temperature (C)") + 
  scale_color_manual(values = c("dodgerblue", "red"), 
                     "Greenhouse gas\ntrajectory") 

So, nearly all model runs indicate warming, but the amount of warming varies by model and emissions scenario. The models disagree on the direction of precipitation change, with some predicting increases and others decreases.

Why write the cft package?

The amount of data generated by downscaled GCMs can be quite large (e.g., daily data at a few km spatial resolution). The Climate Futures Toolbox was developed to help users access and use smaller subsets.

Data is acquired from the Northwest Knowledge Server of the University of Idaho.


