This vignette provides a walk-through of a common use case of the cft package: understanding climate futures for a region of interest. We'll use Yellowstone National Park, located in Wyoming, USA, as a case study.
This vignette will show you how to:

- Produce a data.frame containing climate data for a region of interest

To get the most out of this vignette, we assume you have:

- Familiarity with dplyr (e.g., filter(), group_by(), and summarize())

Global Circulation Models (GCMs) provide estimates of historical and future climate conditions. The complexity of the climate system has led to a large number of GCMs, and it is common practice to examine outputs from many different models, treating each as one plausible future.
Most GCMs are spatially coarse (often 1 degree), but downscaling provides finer-scale estimates. The cft package uses one downscaled climate dataset, MACA (Multivariate Adaptive Constructed Analogs) Version 2; see the MACA documentation for details.
This package was originally written with the National Park Service in mind, so it has the option to use the name of any park (or monument, preserve, etc.) within the NPS. Use the cftdata() function to specify a range of years, a set of models, a set of parameters, and a set of representative concentration pathways to return. Leaving these arguments empty will result in a download of all available data for that location.
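For orientation, such a call might look something like the sketch below. The argument names here are illustrative assumptions rather than the confirmed interface, so check ?cftdata in your installed version.

```r
# Illustrative sketch only: the argument names below are assumptions, not the
# verified cftdata() signature. See ?cftdata for the authoritative interface.
d <- cftdata(area_name = "Yellowstone National Park",
             years = c(2020, 2040),
             models = "CCSM4",
             parameters = c("tasmax", "tasmin", "pr"),
             scenarios = c("rcp45", "rcp85"))
```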
```r
# Install the development version from GitHub
library(devtools)
install_github("earthlab/cft")
```
```r
library(cft)
ls(pos = "package:cft")  # list the functions exported by cft
?available_data
```
```r
start_time <- Sys.time()
inputs <- cft::available_data()
end_time <- Sys.time()
end_time - start_time

levels(as.factor(inputs$variable_names$Variable))
levels(as.factor(inputs$variable_names$`Variable abbreviation`))
levels(as.factor(inputs$variable_names$Scenario))
levels(as.factor(inputs$variable_names$`Scenario abbreviation`))
levels(as.factor(inputs$variable_names$Model))
levels(as.factor(inputs$variable_names$`Model abbreviation`))
```
Downloads from the API are limited to 500 MB per request. A request for a large area of interest combined with a long time series will return a cryptic runtime error informing you that your request was too large. The error may look like this:

```
CURL Error: Transferred a partial file
Error in Rsx_nc4_get_vara_int: NetCDF: DAP failure
Var: pr_CCSM4_r6i1p1_rcp85  Ndims: 3  Start: 0,444,511  Count: 34333,2,3
Error in ncvar_get_inner(ncid2use, varid2use, nc$var[[li]]$missval, addOffset, :
  C function Rsx_nc4_get_var_int returned error
```

The solution is to subset your requests so they fit within the bounds of the API. You can achieve this by balancing the size of your requested spatial extent against the length of your requested time series. For a small national park, it's possible to pull the entire time series from the API, but larger parks will require you to request a shorter time window or to stitch multiple time windows together.
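As a minimal sketch of that strategy (assuming inputs comes from available_data(), as above), you can label each available time step with a chunk id and issue one request per chunk; the two-window version used later in this vignette is a special case of this idea.

```r
# Minimal sketch, assuming `inputs` comes from cft::available_data().
# Label each time step with a chunk id so a long series can be downloaded
# as several smaller API requests and stitched back together afterwards.
n_times  <- nrow(inputs$available_times)
n_chunks <- 4  # illustrative: choose enough chunks to keep each request small
chunk_id <- cut(seq_len(n_times), breaks = n_chunks, labels = FALSE)
table(chunk_id)  # roughly equal numbers of time steps per request
```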
```r
library(dplyr)

input_variables <- inputs$variable_names %>%
  filter(Variable %in% c("Maximum Relative Humidity",
                         "Maximum Temperature",
                         "Minimum Relative Humidity",
                         "Minimum Temperature",
                         "Northward Wind",
                         "Precipitation")) %>%
  filter(Scenario %in% c("RCP 4.5", "RCP 8.5")) %>%
  filter(Model %in% c(
    "Beijing Climate Center - Climate System Model 1.1",
    "Beijing Normal University - Earth System Model",
    "Canadian Earth System Model 2",
    "Centre National de Recherches Météorologiques - Climate Model 5",
    "Commonwealth Scientific and Industrial Research Organisation - Mk3.6.0",
    "Community Climate System Model 4",
    "Geophysical Fluid Dynamics Laboratory - Earth System Model 2 Generalized Ocean Layer Dynamics",
    "Geophysical Fluid Dynamics Laboratory - Earth System Model 2 Modular Ocean",
    "Hadley Global Environment Model 2 - Climate Chemistry 365 (day) ",
    "Hadley Global Environment Model 2 - Earth System 365 (day)",
    "Institut Pierre Simon Laplace (IPSL) - Climate Model 5A - Low Resolution",
    "Institut Pierre Simon Laplace (IPSL) - Climate Model 5A - Medium Resolution",
    "Institut Pierre Simon Laplace (IPSL) - Climate Model 5B - Low Resolution",
    "Institute of Numerical Mathematics Climate Model 4",
    "Meteorological Research Institute - Coupled Global Climate Model 3",
    "Model for Interdisciplinary Research On Climate - Earth System Model",
    "Model for Interdisciplinary Research On Climate - Earth System Model - Chemistry",
    "Model for Interdisciplinary Research On Climate 5",
    "Norwegian Earth System Model 1 - Medium Resolution")) %>%
  pull("Available variable")

input_variables
```
```r
library(osmdata)

bb <- getbb("hot springs")
my_boundary <- opq(bb) %>%
  add_osm_feature(key = "boundary", value = "national_park") %>%
  osmdata_sf()
my_boundary
```
```r
library(sf)

boundaries <- my_boundary$osm_multipolygons
pulled_bb <- st_bbox(boundaries)
pulled_bb
```
```r
library(ggplot2)

basemap <- ggplot(data = boundaries) +
  geom_sf(fill = "cornflowerblue") +
  geom_sf_text(aes(label = name))
basemap
```
```r
library(tidync)  # hyper_filter(), hyper_tibble()

center_point <- st_centroid(boundaries) %>% st_bbox()

start_time <- Sys.time()
Pulled_data <- inputs$src %>%
  hyper_filter(lat = lat <= c(center_point[4] + 0.05) & lat >= c(center_point[2] - 0.05)) %>%
  hyper_filter(lon = lon <= c(center_point[3] + 0.05) & lon >= c(center_point[1] - 0.05)) %>%
  hyper_tibble(select_var = input_variables) %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326, agr = "constant")
end_time <- Sys.time()
end_time - start_time

head(Pulled_data)
tail(Pulled_data)
```
```r
ggplot() +
  geom_sf(data = boundaries, fill = "cornflowerblue") +
  geom_sf(data = st_centroid(boundaries), color = "red", size = 0.5) +
  coord_sf(crs = 4326)
```
```r
library(tibble)  # add_column()

# Year 2034
time_min <- 38716
time_max <- 73048

input_times <- inputs$available_times %>%
  add_column(index = 0) %>%
  add_column(first_half = 0) %>%
  add_column(second_half = 0)

# flag the time steps that fall inside the requested window
input_times[which(inputs$available_times[, 1] >= time_min &
                  inputs$available_times[, 1] <= time_max), 3] <- 1

# median row index, used to split the table into two halves
med <- median(row_number(input_times[, 3]))
input_times[which(as.numeric(row.names(input_times)) <= med), 4] <- 1
input_times[which(as.numeric(row.names(input_times)) > med), 5] <- 1

head(input_times)
tail(input_times)
```
```r
start_time <- Sys.time()

Pulled_data_sub1 <- inputs$src %>%
  hyper_filter(lat = lat <= c(center_point[4] + 0.05) & lat >= c(center_point[2] - 0.05)) %>%
  hyper_filter(lon = lon <= c(center_point[3] + 0.05) & lon >= c(center_point[1] - 0.05)) %>%
  hyper_filter(time = input_times[, 4] == 1) %>%
  hyper_tibble(select_var = input_variables)

Pulled_data_sub2 <- inputs$src %>%
  hyper_filter(lat = lat <= c(center_point[4] + 0.05) & lat >= c(center_point[2] - 0.05)) %>%
  hyper_filter(lon = lon <= c(center_point[3] + 0.05) & lon >= c(center_point[1] - 0.05)) %>%
  hyper_filter(time = input_times[, 5] == 1) %>%
  hyper_tibble(select_var = input_variables)

end_time <- Sys.time()
end_time - start_time

tail(Pulled_data_sub1)
tail(Pulled_data_sub2)
```
```r
start_time <- Sys.time()
Pulled_data <- inputs$src %>%
  hyper_filter(lat = lat <= c(pulled_bb[4] + 0.05) & lat >= c(pulled_bb[2] - 0.05)) %>%
  hyper_filter(lon = lon <= c(pulled_bb[3] + 0.05) & lon >= c(pulled_bb[1] - 0.05)) %>%
  hyper_tibble(select_var = input_variables) %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326, agr = "constant")
end_time <- Sys.time()
end_time - start_time

head(Pulled_data)
tail(Pulled_data)
```
Check this plot to make sure you downloaded your entire time series.
```r
plot(Pulled_data$time, Pulled_data$`pr_HadGEM2-ES365_r1i1p1_rcp85`)
```
Check here to make sure you downloaded the proper spatial extent.
```r
check_filter <- Pulled_data %>% filter(time == min(Pulled_data$time))

ggplot() +
  geom_sf(data = boundaries, fill = "cornflowerblue") +
  geom_sf(data = check_filter, color = "red", size = 0.5) +
  coord_sf(crs = 4326)
```
bb <- getbb("yellowstone") bb_manual <- bb bb_manual[1,1] <- -111.15594815937659 bb_manual[1,2] <- -109.8305463801207 bb_manual[2,1] <- 44.12354048271325 bb_manual[2,2] <- 45.11911641599412 my_boundary <- opq(bb_manual) %>% add_osm_feature(key = "boundary", value = "national_park") %>% osmdata_sf() my_boundary
```r
boundaries <- my_boundary$osm_multipolygons
pulled_bb <- st_bbox(boundaries)
pulled_bb
```
```r
start_time <- Sys.time()

Pulled_data_sub1 <- inputs$src %>%
  hyper_filter(lat = lat <= c(pulled_bb[4] + 0.05) & lat >= c(pulled_bb[2] - 0.05)) %>%
  hyper_filter(lon = lon <= c(pulled_bb[3] + 0.05) & lon >= c(pulled_bb[1] - 0.05)) %>%
  hyper_filter(time = input_times[, 4] == 1) %>%
  hyper_tibble(select_var = input_variables)

Pulled_data_sub2 <- inputs$src %>%
  hyper_filter(lat = lat <= c(pulled_bb[4] + 0.05) & lat >= c(pulled_bb[2] - 0.05)) %>%
  hyper_filter(lon = lon <= c(pulled_bb[3] + 0.05) & lon >= c(pulled_bb[1] - 0.05)) %>%
  hyper_filter(time = input_times[, 5] == 1) %>%
  hyper_tibble(select_var = input_variables)

end_time <- Sys.time()
end_time - start_time

tail(Pulled_data_sub1)
tail(Pulled_data_sub2)
```
```r
start_time <- Sys.time()
# Convert to sf once, after stitching, so both halves keep their lon/lat
# columns while they are bound together
Pulled_data_stitch <- bind_rows(Pulled_data_sub1, Pulled_data_sub2) %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326, agr = "constant")
end_time <- Sys.time()
end_time - start_time
```
Check this plot to make sure you downloaded your entire time series.
```r
plot(Pulled_data_stitch$time, Pulled_data_stitch$`pr_HadGEM2-ES365_r1i1p1_rcp85`)
```
Check here to make sure you downloaded the proper spatial extent.
```r
check_filter <- Pulled_data_stitch %>% filter(time == min(Pulled_data_stitch$time))

ggplot() +
  geom_sf(data = boundaries, fill = "cornflowerblue") +
  geom_sf(data = check_filter, color = "red", size = 0.5) +
  coord_sf(crs = 4326)
```
```r
library(stars)  # st_rasterize(), st_extract()

rast <- st_rasterize(Pulled_data)
plot(rast)
```
Here, the param_meta table from the climateR package lists the GridMET parameters available for the historical comparison below:

```r
library(climateR)

param_meta$gridmet
```
```r
library(raster)  # brick()

subed_times <- input_times %>% filter(index == 1)

GM <- getGridMET(st_as_sfc(st_bbox(boundaries)), "tmax",
                 startDate = "1997-04-06", endDate = "1999-12-30")

SM_stars <- GM$gridmet_tmax %>%
  brick() %>%
  st_as_stars()

ggplot() +
  geom_sf(data = SM_stars, fill = "cornflowerblue") +
  geom_sf(data = check_filter, color = "red", size = 0.5) +
  coord_sf(crs = 4326)
```
```r
st_get_dimension_values(rast)

combo <- c(SM_stars, rast, along_crs = TRUE, along = c(1, 2))
class(combo)
class(SM_stars)
plot(combo$X2000.01.01)

ggplot(data = combo) +
  geom_sf(aes(fill = X2000.01.01)) +
  scale_fill_continuous(low = "thistle2", high = "darkred",
                        guide = "colorbar", na.value = "white") +
  coord_sf(crs = 4326)
```
```r
extracted <- st_extract(rast, boundaries$geometry) %>% st_as_sf()
names(extracted)[1] <- "nn"

ggplot(data = extracted) +
  geom_sf(aes(fill = nn)) +
  scale_fill_continuous(low = "thistle2", high = "darkred",
                        guide = "colorbar", na.value = "white") +
  coord_sf(crs = 4326)
```
```r
# Use the same tidync source opened by available_data()
cube <- inputs$src %>%
  hyper_tbl_cube(select_var = c("pr_HadGEM2-ES365_r1i1p1_rcp85"))
cube
```
```r
small_pulled <- Pulled_data %>%
  filter(time == 72049)
intersection <- st_intersection(x = small_pulled, y = boundaries$geometry)
names(intersection)[1:2] <- c("Precipitation", "b")
```
```r
library(ggthemes)

ggplot() +
  geom_sf(data = intersection, aes(color = Precipitation)) +
  scale_color_continuous(low = "thistle2", high = "darkred",
                         guide = "colorbar", na.value = "white") +
  geom_sf(data = boundaries, fill = NA, color = "white") +
  theme_tufte() +
  labs(title = "YELLOWSTONE NATIONAL PARK",
       subtitle = "Precipitation in 2050")
```
```r
river <- opq(bb_manual) %>%
  add_osm_feature(key = "waterway", value = "river") %>%
  osmdata_sf()
river
```
```r
river_sub <- st_buffer(river$osm_lines, 2200)
extracted_river <- st_extract(rast, river_sub$geometry) %>% st_as_sf()
head(extracted_river)
```
```r
ggplot(data = extracted_river) +
  geom_sf(aes(fill = pr_HadGEM2.ES365_r1i1p1_rcp85), size = 0) +
  coord_sf(crs = 4326,
           xlim = c(pulled_bb[1], pulled_bb[3]),
           ylim = c(pulled_bb[2], pulled_bb[4]),
           expand = FALSE) +
  scale_fill_continuous(low = "thistle2", high = "darkred",
                        guide = "colorbar", na.value = "white") +
  labs(title = "Rivers of Yellowstone",
       subtitle = "Projected precipitation in 2040",
       caption = "Data Source: Climate Futures...") +
  theme_tufte()
```
```r
roads <- opq(bb_manual) %>%
  add_osm_feature(key = "highway", value = "primary") %>%
  add_osm_feature(key = "highway", value = "secondary") %>%
  osmdata_sf()

roads_sub <- st_buffer(roads$osm_lines, 2200)
extracted_roads <- st_extract(rast, roads_sub$geometry) %>% st_as_sf()
extracted_roads
```
```r
ggplot(data = extracted_roads) +
  geom_sf(aes(fill = pr_HadGEM2.ES365_r1i1p1_rcp85), size = 0) +
  coord_sf(crs = 4326) +
  scale_fill_continuous(low = "thistle2", high = "darkred",
                        guide = "colorbar", na.value = "white") +
  labs(title = "Roads of Yellowstone",
       subtitle = "Projected precipitation in 2040",
       caption = "Data Source: Climate Futures...") +
  theme_tufte()
```
Now that we have all of the climate parameters for our study region, we can compute functions of those variables. For example, it is common to compute the midpoint of the maximum and minimum daily temperature, which we can do using the mutate() function:
df <- df %>% mutate(tasmid = (tasmax + tasmin) / 2)
Now we have a new column called tasmid that is the midpoint of the maximum and minimum daily temperature!
Wind speed provides another example of a derived parameter that can be computed for each day. By default, we have two wind-related parameters: the eastward wind component (called uas) and the northward wind component (called vas), both in units of meters per second (you can get this information from cft::argument_reference). Wind speed can be computed from vas and uas using the Pythagorean theorem:
$\text{Wind speed} = \sqrt{v_{as}^2 + u_{as}^2}.$
In code:
df <- df %>% mutate(wind_speed = sqrt(vas^2 + uas^2))
Sometimes, there are new climate variables that summarize daily data, such as the length of the growing season or the number of days above some temperature threshold. All of these quantities summarize daily data and require some aggregation time interval, which in many cases will be one year. As an example, we will compute the growing season length for our study region across all models and emissions scenarios. To do this, we first need to define a new column for year, which we will use as a grouping variable:
df <- df %>% mutate(year = year(date))
Now, we want to compute growing season length for each combination of year, model, and emissions scenario.
```r
growing_seasons <- df %>%
  group_by(rcp, model, year, ensemble) %>%
  summarize(season_length = sum(tasmid > 273.15)) %>%
  ungroup()
```
Notice that we used our derived temperature midpoint column tasmid and computed the total (sum()) number of days in each group where the temperature midpoint was greater than 0 C (273.15 Kelvin, the units of the temperature data).
```r
growing_seasons
```
Let's visualize the growing season over time for each model and emission scenario:
growing_seasons %>% ggplot(aes(year, season_length, color = rcp, group = model)) + geom_line(alpha = .3) + facet_wrap(~rcp, ncol = 1) + xlab("Year") + ylab("Growing season length (days)") + scale_color_manual(values = c("dodgerblue", "red")) + theme(legend.position = "none")
Use the tibble object that is returned from cft_df() as an input to compare_periods() to compare climate between a reference period and a target period. You may specify the function with which to aggregate your chosen variable, as well as the months of the year to include in the calculation.
```r
comps <- compare_periods(df,
                         var1 = "pr",
                         var2 = "tasmax",
                         agg_fun = "mean",
                         target_period = c(2025, 2030),
                         reference_period = c(2020, 2024),
                         months1 = 5:8,
                         months2 = 5:8,
                         scenarios = c("rcp45", "rcp85"))
```
This provides a data frame that can be used to compare the values in the target and reference period.
```r
glimpse(comps)
```
One useful plot shows the difference in the two variables between reference and target periods:
```r
library(tidyr)    # pivot_wider()
library(ggrepel)  # geom_text_repel()

title <- paste("Change from the historical vs. reference period:",
               comps$reference_period, comps$target_period, sep = " vs ")[1]

comps %>%
  dplyr::select(parameter, rcp, model, reference_period, target_period, difference) %>%
  pivot_wider(names_from = parameter, values_from = difference) %>%
  ungroup() %>%
  mutate(rcp = ifelse(rcp == "rcp45", "RCP 4.5", "RCP 8.5")) %>%
  ggplot(aes(pr, tasmax, color = rcp)) +
  ggtitle(title) +
  geom_point() +
  geom_hline(yintercept = 0, alpha = .2) +
  geom_vline(xintercept = 0, alpha = .2) +
  geom_text_repel(aes(label = model), segment.size = .3, size = 3) +
  xlab("Difference in mean daily precipitation (mm)") +
  ylab("Difference in mean daily max. temperature (C)") +
  scale_color_manual(values = c("dodgerblue", "red"), "Greenhouse gas\ntrajectory")
```
So, nearly all model runs indicate warming, but the amount of warming varies by model and emissions scenario. For precipitation, different models predict both increases and decreases.
The amount of data generated by downscaled GCMs can be quite large (e.g., daily data at a few kilometers of spatial resolution). The Climate Futures Toolbox was developed to help users access and use manageable subsets of these data.
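A rough back-of-envelope calculation shows why; all numbers here are illustrative assumptions:

```r
# Illustrative assumptions: one variable from one model, daily values for
# 100 years, over a 1000 km x 1000 km region gridded at 4 km resolution,
# stored as 8-byte doubles.
cells <- (1000 / 4)^2   # 62,500 grid cells
days  <- 365 * 100      # 36,500 daily time steps
bytes <- cells * days * 8
bytes / 1e9             # roughly 18 GB for a single variable-model pair
```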
Data is acquired from the Northwest Knowledge Server of the University of Idaho.