Welcome to the Climate Futures Toolbox

This vignette provides a walk-through of a common use case of the cft package: understanding climate futures for a region of interest. We'll use Yellowstone National Park, located in Wyoming, USA, as a case study.

What you'll learn

This vignette will show you how to:

- Access daily downscaled climate projections for an area of interest
- Filter the available data by variable, emissions scenario, climate model, and time
- Compute new daily variables and annual summaries from the downloaded data
- Compare climate between a reference period and a future target period

What you'll need

To get the most out of this vignette, we assume you have:

- A working R installation, with the devtools package available to install cft from GitHub
- Basic familiarity with the tidyverse-style pipe (%>%) used throughout the examples
- An internet connection, since all data are pulled from a remote API

About the data

Global Circulation Models (GCMs) provide estimates of historical and future climate conditions. The complexity of the climate system has led to a large number of GCMs, and it is common practice to examine outputs from many different models, treating each as one plausible future.

Most GCMs are spatially coarse (often 1 degree), but downscaling provides finer-scale estimates. The cft package uses one downscaled climate dataset called MACA (Multivariate Adaptive Constructed Analogs) Version 2.

Acquiring and subsetting data within National Park Service boundaries

This package was originally written with the National Park Service in mind, so it has the option to use the name of any park (or monument, preserve, etc.) within the NPS. Use the cftdata() function to specify a range of years, a set of models, a set of parameters, and a set of representative concentration pathways to return. Leaving these arguments empty will result in a download of all available data for that location.
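For illustration, a call might look like the sketch below. This is hypothetical: the argument names simply mirror the description above and are not checked against the installed function, so see ?cftdata for the exact signature.

# Hypothetical sketch -- argument names are illustrative only;
# consult ?cftdata for the real signature
d <- cftdata(area_name  = "Yellowstone National Park",  # any NPS unit
             years      = c(2020, 2040),                # range of years
             models     = "CCSM4",                      # subset of GCMs
             parameters = "pr",                         # variables to return
             scenarios  = "rcp85")                      # RCPs to return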

Loading the cft package from GitHub

library(devtools)
install_github("earthlab/cft")

Attach cft and check the list of available functions

library(cft)
ls(pos="package:cft")

Look at the documentation for those functions

?available_data

Use read-only mode to find available data without initiating a full download.

start_time <- Sys.time()
inputs <- cft::available_data()
end_time <- Sys.time()

end_time - start_time

levels(as.factor(inputs$variable_names$Variable))
levels(as.factor(inputs$variable_names$`Variable abbreviation`))

levels(as.factor(inputs$variable_names$Scenario))
levels(as.factor(inputs$variable_names$`Scenario abbreviation`))

levels(as.factor(inputs$variable_names$Model))
levels(as.factor(inputs$variable_names$`Model abbreviation`))

Filter the results from available_data() to specify which data to actually download.

Downloads from the API are limited to 500 MB per request. A request for a large area of interest combined with a long time series will return a cryptic runtime error informing you that your request was too large.

This error may look like this: "CURL Error: Transferred a partial file Error in Rsx_nc4_get_vara_int: NetCDF: DAP failure Var: pr_CCSM4_r6i1p1_rcp85 Ndims: 3 Start: 0,444,511 Count: 34333,2,3 Error in ncvar_get_inner(ncid2use, varid2use, nc$var[[li]]$missval, addOffset, : C function Rsx_nc4_get_var_int returned error"

The solution to this problem is to subset your requests so they fit within the limits of the API. You can achieve this by balancing the size of your requested spatial extent against the length of your requested time period. For a small national park, it's possible to pull the entire time series in one request, but larger parks will require you to request a shorter time window or to stitch multiple time windows together, as sketched below and demonstrated in the "Pull and stitch" section.
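The sketch below shows the stitching idea in miniature, using the same day offsets that appear later in this vignette; the full worked example follows in the "Filter time" and "Pull and stitch" sections.

# Minimal sketch: split the requested daily time steps into two halves
# so that each API request stays below the 500 MB limit
time_index  <- 38716:73048                              # full period
half_id     <- cut(seq_along(time_index), 2, labels = FALSE)
first_half  <- time_index[half_id == 1]                 # request 1
second_half <- time_index[half_id == 2]                 # request 2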

Filter variable names

input_variables <- inputs$variable_names %>% 
  filter(Variable %in% c("Maximum Relative Humidity",
                         "Maximum Temperature",
                         "Minimum Relative Humidity",
                         "Minimum Temperature",
                         "Northward Wind",
                         "Precipitation")) %>% 
  filter(Scenario %in% c("RCP 4.5", "RCP 8.5")) %>% 
  filter(Model %in% c(
    "Beijing Climate Center - Climate System Model 1.1",
    "Beijing Normal University - Earth System Model",
    "Canadian Earth System Model 2",
    "Centre National de Recherches Météorologiques - Climate Model 5",
    "Commonwealth Scientific and Industrial Research Organisation - Mk3.6.0",
    "Community Climate System Model 4",
    "Geophysical Fluid Dynamics Laboratory - Earth System Model 2 Generalized Ocean Layer Dynamics",
    "Geophysical Fluid Dynamics Laboratory - Earth System Model 2 Modular Ocean",
    "Hadley Global Environment Model 2 - Climate Chemistry 365 (day) ",
    "Hadley Global Environment Model 2 - Earth System 365 (day)",
    "Institut Pierre Simon Laplace (IPSL) - Climate Model 5A - Low Resolution",
    "Institut Pierre Simon Laplace (IPSL) - Climate Model 5A - Medium Resolution",
    "Institut Pierre Simon Laplace (IPSL) - Climate Model 5B - Low Resolution",
    "Institute of Numerical Mathematics Climate Model 4",
    "Meteorological Research Institute - Coupled Global Climate Model 3",
    "Model for Interdisciplinary Research On Climate - Earth System Model",
    "Model for Interdisciplinary Research On Climate - Earth System Model - Chemistry",
    "Model for Interdisciplinary Research On Climate 5",
    "Norwegian Earth System Model 1 - Medium Resolution")) %>%
  pull("Available variable")

input_variables

Establish area of interest (AOI) by bounding box

bb <- getbb("hot springs")
my_boundary <- opq(bb) %>% 
  add_osm_feature(key = "boundary", value = "national_park") %>% 
  osmdata_sf()

my_boundary
boundaries <- my_boundary$osm_multipolygons
pulled_bb <- st_bbox(boundaries)
pulled_bb
basemap <- ggplot(data = boundaries) +
  geom_sf(fill = "cornflowerblue") +
  geom_sf_text(aes(label = name))

basemap

Download full time series from a single point

center_point <- st_centroid(boundaries) %>% st_bbox()

start_time <- Sys.time()

Pulled_data <- inputs$src %>% 
  hyper_filter(lat = lat <= center_point[4] + 0.05 & lat >= center_point[2] - 0.05) %>% 
  hyper_filter(lon = lon <= center_point[3] + 0.05 & lon >= center_point[1] - 0.05) %>% 
  hyper_tibble(select_var = input_variables) %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326, agr = "constant")



end_time <- Sys.time()
end_time - start_time


head(Pulled_data)
tail(Pulled_data)
ggplot() +
  geom_sf(data = boundaries, fill = "cornflowerblue") +
  geom_sf(data = st_centroid(boundaries), color = "red", size = 0.5) +
  coord_sf(crs = 4326)

Filter time

# Full future period, expressed as day offsets (see the date
# conversion below)
time_min <- 38716
time_max <- 73048
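To see which calendar dates these offsets correspond to, you can convert them directly. This assumes the MACA convention of counting days from 1900-01-01; verify against inputs$available_times before relying on it.

# Convert day offsets to calendar dates (assumed origin: 1900-01-01)
as.Date(c(time_min, time_max), origin = "1900-01-01")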

input_times <- inputs$available_times %>% 
  add_column(index = 0) %>% 
  add_column(first_half = 0) %>% 
  add_column(second_half = 0)

# Flag the time steps that fall inside the requested window
input_times[which(inputs$available_times[, 1] >= time_min & inputs$available_times[, 1] <= time_max), 3] <- 1

# Split the window into two halves for the stitched requests below
med <- median(row_number(input_times[, 3]))
input_times[which(as.numeric(row.names(input_times)) <= med), 4] <- 1
input_times[which(as.numeric(row.names(input_times)) > med), 5] <- 1



head(input_times)
tail(input_times)

Pull and stitch

start_time <- Sys.time()


Pulled_data_sub1 <- inputs$src %>% 
  hyper_filter(lat = lat <= center_point[4] + 0.05 & lat >= center_point[2] - 0.05) %>% 
  hyper_filter(lon = lon <= center_point[3] + 0.05 & lon >= center_point[1] - 0.05) %>% 
  hyper_filter(time = input_times[, 4] == 1) %>% 
  hyper_tibble(select_var = input_variables)


Pulled_data_sub2 <- inputs$src %>% 
  hyper_filter(lat = lat <= center_point[4] + 0.05 & lat >= center_point[2] - 0.05) %>% 
  hyper_filter(lon = lon <= center_point[3] + 0.05 & lon >= center_point[1] - 0.05) %>% 
  hyper_filter(time = input_times[, 5] == 1) %>% 
  hyper_tibble(select_var = input_variables)


end_time <- Sys.time()
end_time - start_time



tail(Pulled_data_sub1)
tail(Pulled_data_sub2)

Download data by AOI, filtered times, and filtered variable list

start_time <- Sys.time()

Pulled_data <- inputs$src %>% 
  hyper_filter(lat = lat <= pulled_bb[4] + 0.05 & lat >= pulled_bb[2] - 0.05) %>% 
  hyper_filter(lon = lon <= pulled_bb[3] + 0.05 & lon >= pulled_bb[1] - 0.05) %>% 
  hyper_tibble(select_var = input_variables) %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326, agr = "constant")



end_time <- Sys.time()
end_time - start_time


head(Pulled_data)
tail(Pulled_data)

Time extent plot

Check this plot to make sure you downloaded your entire time series.

plot(Pulled_data$time, Pulled_data$`pr_HadGEM2-ES365_r1i1p1_rcp85`)

Spatial extent plot

Check here to make sure you downloaded the proper spatial extent.

check_filter <- Pulled_data %>% filter(time == min(Pulled_data$time))

ggplot() +
  geom_sf(data = boundaries, fill = "cornflowerblue") +
  geom_sf(data = check_filter, color = "red", size = 0.5) +
  coord_sf(crs = 4326)

If you encounter an error suggesting that you are pulling too much data, you will need to stitch a few requests together to keep each call below the 500 MB limit.

bb <- getbb("yellowstone")
bb_manual <- bb
bb_manual[1,1] <- -111.15594815937659
bb_manual[1,2] <- -109.8305463801207
bb_manual[2,1] <- 44.12354048271325
bb_manual[2,2] <- 45.11911641599412

my_boundary <- opq(bb_manual) %>% 
  add_osm_feature(key = "boundary", value = "national_park") %>% 
  osmdata_sf()

my_boundary
boundaries <- my_boundary$osm_multipolygons
pulled_bb <- st_bbox(boundaries)
pulled_bb
start_time <- Sys.time()


Pulled_data_sub1 <- inputs$src %>% 
  hyper_filter(lat = lat <= pulled_bb[4] + 0.05 & lat >= pulled_bb[2] - 0.05) %>% 
  hyper_filter(lon = lon <= pulled_bb[3] + 0.05 & lon >= pulled_bb[1] - 0.05) %>% 
  hyper_filter(time = input_times[, 4] == 1) %>% 
  hyper_tibble(select_var = input_variables)
# Keep this as a tibble for now: the two halves are combined with
# bind_rows() below, and can be converted to sf afterwards.


Pulled_data_sub2 <- inputs$src %>% 
  hyper_filter(lat = lat <= pulled_bb[4] + 0.05 & lat >= pulled_bb[2] - 0.05) %>% 
  hyper_filter(lon = lon <= pulled_bb[3] + 0.05 & lon >= pulled_bb[1] - 0.05) %>% 
  hyper_filter(time = input_times[, 5] == 1) %>% 
  hyper_tibble(select_var = input_variables)

end_time <- Sys.time()
end_time - start_time



tail(Pulled_data_sub1)
tail(Pulled_data_sub2)
start_time <- Sys.time()

Pulled_data_stitch <- bind_rows(Pulled_data_sub1, Pulled_data_sub2)

end_time <- Sys.time()
end_time - start_time

Time extent plot

Check this plot to make sure you downloaded your entire time series.

plot(Pulled_data_stitch$time, Pulled_data_stitch$`pr_HadGEM2-ES365_r1i1p1_rcp85`)

Spatial extent plot

Check here to make sure you downloaded the proper spatial extent.

check_filter <- Pulled_data_stitch %>% filter(time == min(Pulled_data_stitch$time))

ggplot() +
  geom_sf(data = boundaries, fill = "cornflowerblue") +
  geom_sf(data = check_filter, color = "red", size = 0.5) +
  coord_sf(crs = 4326)

Melt downloaded points into a raster before aggregation

rast <- st_rasterize(Pulled_data) 
plot(rast)


GridMET data

param_meta$gridmet
subed_times <- input_times %>% filter(index == 1) 
GM <- getGridMET(st_as_sfc(st_bbox(boundaries)), "tmax", startDate = "1997-04-06", endDate = "1999-12-30")
SM_stars <- GM$gridmet_tmax %>% brick() %>%  st_as_stars()

ggplot() +
  geom_stars(data = SM_stars) +
  geom_sf(data = check_filter, color = "red", size = 0.5) +
  coord_sf(crs = 4326)
st_get_dimension_values(rast)
combo <- c(SM_stars, rast, along_crs = TRUE, along = c(1, 2))
class(combo)
class(SM_stars)
plot(combo$X2000.01.01)

ggplot() +
  geom_stars(data = combo, aes(fill = X2000.01.01)) +
  scale_fill_continuous(low = "thistle2", high = "darkred", 
                        guide = "colorbar", na.value = "white") +
  coord_sf(crs = 4326)

Aggregate downloaded data to different spatial objects

Aggregate to polygon (faster method)

extracted <- st_extract(rast, boundaries$geometry) %>% st_as_sf()
names(extracted)[1] <- "nn"
ggplot(data = extracted) +
  geom_sf(aes(fill = nn)) +
  scale_fill_continuous(low = "thistle2", high = "darkred", 
                        guide = "colorbar", na.value = "white") +
  coord_sf(crs = 4326)
cube <- inputs$src %>% hyper_tbl_cube(select_var = c("pr_HadGEM2-ES365_r1i1p1_rcp85"))
cube

Clip raster with polygon (slower method)

small_pulled <- Pulled_data %>%
  filter(time == 72049)
intersection <- st_intersection(x = small_pulled, y = boundaries$geometry)

names(intersection)[1:2] <- c("Precipitation", "b")
library(ggthemes)
ggplot() +
  geom_sf(data = intersection, aes(color = Precipitation)) +
  scale_color_continuous(low = "thistle2", high = "darkred", 
                         guide = "colorbar", na.value = "white") +
  geom_sf(data = boundaries, fill = NA, color = "white") +
  theme_tufte() +
  labs(title = "YELLOWSTONE NATIONAL PARK", subtitle = "Precipitation in 2050")

Aggregate to River segment

river <- opq(bb_manual) %>%
  add_osm_feature(key = "waterway", value = "river") %>%
  osmdata_sf() 
river
river_sub <- st_buffer(river$osm_lines, 2200)
extracted_river <- st_extract(rast, river_sub$geometry) %>% st_as_sf()
head(extracted_river)
ggplot(data = extracted_river) +
  geom_sf(aes(fill = pr_HadGEM2.ES365_r1i1p1_rcp85), size = 0) +
  coord_sf(crs = 4326, xlim = c(pulled_bb[1], pulled_bb[3]), 
           ylim = c(pulled_bb[2], pulled_bb[4]),
           expand = FALSE) +
  scale_fill_continuous(low = "thistle2", high = "darkred", 
                        guide = "colorbar", na.value = "white") +
  labs(title = "Rivers of Yellowstone",
       subtitle = "Projected precipitation in 2040", 
       caption = "Data Source: Climate Futures...") + 
  theme_tufte()

Aggregate to road segment

roads <- opq(bb_manual) %>%
  add_osm_feature(key = "highway", value = c("primary", "secondary")) %>%
  osmdata_sf()
# Note: passing a vector of values matches either value; chaining two
# add_osm_feature() calls would AND the filters and return nothing.
roads_sub <- st_buffer(roads$osm_lines, 2200)
extracted_roads <- st_extract(rast, roads_sub$geometry) %>% st_as_sf()
extracted_roads
ggplot(data = extracted_roads) +
  geom_sf(aes(fill = pr_HadGEM2.ES365_r1i1p1_rcp85), size = 0) +
  coord_sf(crs = 4326) +
  scale_fill_continuous(low = "thistle2", high = "darkred", 
                        guide = "colorbar", na.value = "white") +
  labs(title = "Roads of Yellowstone",
       subtitle = "Projected precipitation in 2040", 
       caption = "Data Source: Climate Futures...") + 
  theme_tufte()

Computing new daily climate variables

Now that we have all of the climate parameters for our study region, we can compute functions of those variables. For example, it is common to compute the midpoint of the maximum and minimum daily temperature, which we can do using the mutate() function:

df <- df %>%
  mutate(tasmid = (tasmax + tasmin) / 2)

Now we have a new column called tasmid that is the midpoint of the maximum and minimum daily temperature!

Wind speed provides another example of a derived parameter that can be computed for each day. By default, we have two wind-related parameters: the eastward wind component (called uas) and the northward wind component (called vas), both in units of meters per second (you can get this information from cft::argument_reference). Wind speed can be computed from vas and uas using the Pythagorean theorem:

$\text{Wind speed} = \sqrt{v_{as}^2 + u_{as}^2}.$

In code:

df <- df %>%
  mutate(wind_speed = sqrt(vas^2 + uas^2))

Computing new climate variable summaries

Sometimes, there are new climate variables that summarize daily data. For example, you may want to compute:

- Growing season length: the number of days per year above freezing
- The number of days per year exceeding some temperature or precipitation threshold
- Accumulated precipitation over a season or year

All of these quantities summarize daily data and require some aggregation time interval, which in many cases will be one year. As an example, we will compute the growing season length for our study region across all models and emissions scenarios. To do this, we first need to define a new column for year, which we will use as a grouping variable:

df <- df %>%
  mutate(year = year(date))

Now, we want to compute growing season length for each year, model, emissions scenario combination.

growing_seasons <- df %>%
  group_by(rcp, model, year, ensemble) %>%
  summarize(season_length = sum(tasmid > 273.15)) %>%
  ungroup

Notice that we used our derived temperature midpoint column tasmid, and computed the total (sum()) number of days in each group where the temperature midpoint was above freezing (0 °C, i.e., 273.15 K, matching the Kelvin units of the temperature data).
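As a quick aside, this works because summing a logical vector counts its TRUE values; a toy example (made-up numbers, not real data):

# Count how many of four daily temperature midpoints (in K) exceed freezing
sum(c(270.1, 274.2, 280.0, 272.9) > 273.15)  # returns 2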

growing_seasons

Let's visualize the growing season over time for each model and emission scenario:

growing_seasons %>%
  ggplot(aes(year, season_length, color = rcp, group = model)) + 
  geom_line(alpha = .3) + 
  facet_wrap(~rcp, ncol = 1) + 
  xlab("Year") + 
  ylab("Growing season length (days)") + 
  scale_color_manual(values = c("dodgerblue", "red")) + 
  theme(legend.position = "none")

Comparing climate in two time periods

Use the tibble object that is returned from cft_df() as an input to compare_periods() to compare climate between a reference period and a target period. You may specify the function with which to aggregate your chosen variable, as well as the months of the year to include in the calculation.

comps <- compare_periods(df,
                         var1 = "pr",
                         var2 = "tasmax",
                         agg_fun = "mean",
                         target_period = c(2025, 2030),
                         reference_period = c(2020, 2024),
                         months1 = 5:8,
                         months2 = 5:8,
                         scenarios = c("rcp45", "rcp85"))

This provides a data frame that can be used to compare the values in the target and reference period.

glimpse(comps)

One useful plot shows the difference in the two variables between reference and target periods:

title <- paste("Change from reference to target period:",
               comps$reference_period[1], "vs", comps$target_period[1])

comps %>%
  dplyr::select(parameter, rcp, model, reference_period, target_period, difference) %>%
  pivot_wider(names_from = parameter, values_from = difference) %>%
  ungroup %>%
  mutate(rcp = ifelse(rcp == "rcp45", "RCP 4.5", "RCP 8.5")) %>%
  ggplot(aes(pr, tasmax, color = rcp)) + 
  ggtitle(title) +
  geom_point() + 
  geom_hline(yintercept = 0, alpha = .2) + 
  geom_vline(xintercept = 0, alpha = .2) +
  geom_text_repel(aes(label = model), segment.size = .3, size = 3) + 
  xlab("Difference in mean daily precipitation (mm)") + 
  ylab("Difference in mean daily max. temperature (C)") + 
  scale_color_manual(values = c("dodgerblue", "red"), 
                     "Greenhouse gas\ntrajectory") 

So, nearly all model runs indicate warming, but the amount of warming varies by model and emissions scenario. Models disagree on the direction of precipitation change, with some projecting increases and others decreases.

Why write the cft package?

The amount of data generated by downscaled GCMs can be quite large (e.g., daily data at a few kilometers of spatial resolution). The Climate Futures Toolbox was developed to help users access and work with manageable subsets of these data.

Data are acquired from the Northwest Knowledge Network server at the University of Idaho.


