This vignette provides a walk-through of a common use case of the cft package: understanding climate futures for a region of interest. We'll use Wind Cave National Park, located in South Dakota, USA as a case study.
This vignette will show you how to:
- Access downscaled climate data for a spatial region of interest
- Produce a data.frame containing climate data
- Summarize and visualize the downloaded data

To get the most out of this vignette, we assume you have:

- Familiarity with dplyr (e.g., filter(), group_by(), and summarise())

Global Circulation Models (GCMs) provide estimates of historical and future climate conditions. The complexity of the climate system has led to a large number of GCMs, and it is common practice to examine outputs from many different models, treating each as one plausible future.
Most GCMs are spatially coarse (often 1 degree), but downscaling provides finer-scale estimates. The cft package uses one downscaled climate dataset: MACA (Multivariate Adaptive Constructed Analogs) version 2.
This package was originally written with the National Park Service in mind, so it has the option to use the name of any park (or monument, preserve, etc.) within the NPS. Use the cftdata() function to specify a range of years, a set of models, a set of parameters, and a set of representative concentration pathways to return. Leaving these arguments empty will result in a download of all available data for that location.
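For example, a call might look like the following. This is a minimal sketch: every argument name below (area_name, years, models, parameters, scenarios) is an assumption based on the description above, so check ?cftdata for the actual signature.

```r
# Hypothetical call -- the argument names here are assumptions
# drawn from the prose above, not a confirmed signature.
d <- cftdata(area_name = "Wind Cave National Park",
             years = c(2020, 2040),
             models = "CCSM4",
             parameters = c("pr", "tasmax"),
             scenarios = c("rcp45", "rcp85"))
```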
```r
# Install the development version of cft from GitHub
library(devtools)
install_github("earthlab/cft")

# Load the package and list its exported functions
library(cft)
ls(pos = "package:cft")

# Open the help pages for the core functions
?available_data
?single_point_firehose
```
```r
# Change to
# "https://cida.usgs.gov/thredds/catalog.html?dataset=cida.usgs.gov/macav2metdata_daily_historical"
# for historical data.
web_link <- "https://cida.usgs.gov/thredds/dodsC/macav2metdata_daily_future"

src <- tidync::tidync(web_link)
```
```r
inputs <- cft::available_data()
```
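Before building filters, it helps to inspect what available_data() returned. The code below assumes only what the rest of this vignette already relies on: a list with available_times, variable_names, and src elements.

```r
# Top-level structure of the returned object
str(inputs, max.level = 1)

# Peek at the tables used by the filters below
head(inputs$variable_names)
head(inputs$available_times)
```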
The following code creates a filter that selects data from the year 2034. It builds an index column that is 0 for times outside 2034 and 1 for times within 2034.
```r
library(tibble)

# Year 2034
time_min <- 72048
time_max <- 73048

input_times <- inputs$available_times %>%
  add_column(index = 0)
input_times[which(inputs$available_times[, 1] > time_min &
                  inputs$available_times[, 1] < time_max), 3] <- 1

tail(input_times)
```
The following code creates a filter to select precipitation and maximum temperature data from the emission scenarios RCP 4.5 and RCP 8.5 from the Norwegian Earth System Model, the Hadley Global Environment Model 2, and the Community Climate System Model 4. This is done by filtering by variable, by model, and by scenario, using a separate call to the filter() function for each filtering step.
```r
# Note: %in% (not ==) is needed to match against multiple values
input_variables <- inputs$variable_names %>%
  filter(Variable %in% c("Precipitation", "Maximum Temperature")) %>%
  filter(Model %in% c("Norwegian Earth System Model 1 - Medium Resolution",
                      "Hadley Global Environment Model 2 - Earth System 365 (day)",
                      "Community Climate System Model 4")) %>%
  filter(Scenario %in% c("RCP 4.5", "RCP 8.5")) %>%
  pull("Available variable")

input_variables
```
This code obtains a bounding box from OpenStreetMap (OSM) for the national parks adjacent to Yellowstone National Park and then manually adjusts its corners to match the official National Park Service bounding box. This adjustment is needed because the box returned by OSM is larger than the official one.
```r
bb <- getbb("yellowstone")
bb_manual <- bb
bb_manual[1, 1] <- -111.15594815937659
bb_manual[1, 2] <- -109.8305463801207
bb_manual[2, 1] <- 44.12354048271325
bb_manual[2, 2] <- 45.11911641599412

my_boundary <- opq(bb_manual) %>%
  add_osm_feature(key = "boundary", value = "national_park") %>%
  osmdata_sf()
my_boundary

my_boundary$osm_multipolygons
my_boundary$osm_multipolygons[1, ]
boundaries <- my_boundary$osm_multipolygons[1, ]
```
```r
pulled_bb <- st_bbox(boundaries)
pulled_bb
```
This code produces a map of the area of interest, which can be used to verify that the bounding box defined above matches the desired area.
```r
basemap <- ggplot(data = boundaries) +
  geom_sf(fill = "cornflowerblue") +
  geom_sf_text(aes(label = name), size = 10)

basemap
```
This code downloads the variables selected above for the year 2034 within the national parks that touch Yellowstone National Park. First, the hyper_filter() function restricts the latitude and longitude coordinates to within a small buffer of the minimum and maximum coordinates of the bounding box defined above. Then a second hyper_filter() call restricts the data to the year 2034 using the time filter defined above. Finally, the hyper_tibble() function pulls the selected variables into a tibble using the variable filter defined above.
```r
Pulled_data <- inputs$src %>%
  hyper_filter(lat = lat <= c(pulled_bb[4] + 0.05) & lat >= c(pulled_bb[2] - 0.05)) %>%
  hyper_filter(lon = lon <= c(pulled_bb[3] + 0.05) & lon >= c(pulled_bb[1] - 0.05)) %>%
  hyper_filter(time = input_times[, 3] == 1) %>%
  hyper_tibble(select_var = input_variables[1:50]) %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326, agr = "constant")

head(Pulled_data)
```
This plot shows the downloaded data as red dots for the earliest time step in 2034, over a map of the national parks that touch Yellowstone National Park. Because data were downloaded for many time steps, filtering to a single one shows the spatial distribution of the download without plotting every time step.
```r
check_filter <- Pulled_data %>%
  filter(time == min(Pulled_data$time))

ggplot() +
  geom_sf(data = boundaries, fill = "cornflowerblue") +
  geom_sf(data = check_filter, color = "red", size = 5) +
  coord_sf(crs = 4326)
```
```r
rast <- st_rasterize(Pulled_data)
plot(rast)
```
```r
extracted <- st_extract(rast, boundaries$geometry) %>%
  st_as_sf()
names(extracted)[1] <- "nn"

ggplot(data = extracted) +
  geom_sf(aes(fill = nn)) +
  scale_fill_continuous(low = "thistle2", high = "darkred",
                        guide = "colorbar", na.value = "white") +
  coord_sf(crs = 4326)
```
```r
small_pulled <- Pulled_data %>%
  filter(time == 72049)

intersection <- st_intersection(x = small_pulled, y = boundaries$geometry)
names(intersection)[1:2] <- c("Precipitation", "b")
```
```r
library(ggthemes)

ggplot() +
  geom_sf(data = intersection, aes(color = Precipitation), size = 3, shape = 15) +
  scale_color_continuous(low = "thistle2", high = "darkred",
                         guide = "colorbar", na.value = "white") +
  geom_sf(data = boundaries, fill = NA, color = "white") +
  theme_tufte() +
  labs(title = "YELLOWSTONE NATIONAL PARK",
       subtitle = "Precipitation in 2034")
```
```r
river <- opq(pulled_bb) %>%
  add_osm_feature(key = "waterway", value = "river") %>%
  osmdata_sf()

river
```
```r
river_sub <- st_buffer(river$osm_lines, 5000)
extracted_river <- st_extract(rast, river_sub$geometry) %>%
  st_as_sf()

head(extracted_river)
```
```r
ggplot(data = extracted_river) +
  geom_sf(aes(fill = pr_HadGEM2.ES365_r1i1p1_rcp85), size = 0) +
  coord_sf(crs = 4326,
           xlim = c(pulled_bb[1], pulled_bb[3]),
           ylim = c(pulled_bb[2], pulled_bb[4]),
           expand = FALSE) +
  scale_fill_continuous(low = "thistle2", high = "darkred",
                        guide = "colorbar", na.value = "white") +
  labs(title = "Rivers of Yellowstone",
       subtitle = "Projected precipitation in 2034",
       caption = "Data Source: Climate Futures...") +
  theme_tufte()
```
```r
roads <- opq(pulled_bb) %>%
  add_osm_feature(key = "highway", value = "primary") %>%
  add_osm_feature(key = "highway", value = "secondary") %>%
  osmdata_sf()

roads_sub <- st_buffer(roads$osm_points, 2200)
extracted_roads <- st_extract(rast, st_as_sfc(roads_sub)) %>%
  st_as_sf()

extracted_roads
```
```r
ggplot(data = extracted_roads) +
  geom_sf(aes(fill = pr_HadGEM2.ES365_r1i1p1_rcp85), size = 0) +
  coord_sf(crs = 4326) +
  scale_fill_continuous(low = "thistle2", high = "darkred",
                        guide = "colorbar", na.value = "white") +
  labs(title = "Roads of Yellowstone",
       subtitle = "Projected precipitation in 2034",
       caption = "Data Source: Climate Futures...") +
  theme_tufte()
```
Now that we have all of the climate parameters for our study region, we can compute functions of those variables. For example, it is common to compute the midpoint of the maximum and minimum daily temperature, which we can do using the mutate() function:
```r
# tasmax and tasmin are the daily maximum and minimum temperatures
# from the downloaded data
df <- data.frame(tasmax, tasmin)

df <- df %>%
  mutate(tasmid = (tasmax + tasmin) / 2)
```
Now we have a new column called tasmid that is the midpoint of the maximum and minimum daily temperature!
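A quick, illustrative sanity check confirms that the new column always falls between the daily minimum and maximum:

```r
# The midpoint should never fall outside [tasmin, tasmax]
stopifnot(all(df$tasmid >= df$tasmin & df$tasmid <= df$tasmax,
              na.rm = TRUE))
summary(df$tasmid)
```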
Wind speed provides another example of a derived parameter that can be computed for each day. By default, we have two wind-related parameters: the eastward wind component (called uas) and the northward wind component (called vas), both in units of meters per second (you can get this information from cft::argument_reference). Wind speed can be computed from vas and uas using the Pythagorean theorem:
```{=latex}
$$\text{Wind speed} = \sqrt{v_{as}^2 + u_{as}^2}.$$
```

In code:

```r
df <- df %>%
  mutate(wind_speed = sqrt(vas^2 + uas^2))
```
Sometimes you will want new climate variables that summarize the daily data, such as the length of the growing season or the number of days above a temperature threshold. All of these quantities summarize daily data and require some aggregation time interval, which in many cases will be one year. As an example, we will compute the growing season length for Wind Cave National Park across all models and emissions scenarios. To do this, we first need to define a new column for year, which we will use as a grouping variable:
```r
df <- df %>%
  mutate(year = year(date))
```
Now, we want to compute growing season length for each year, model, and emissions scenario combination.
```r
growing_seasons <- df %>%
  group_by(rcp, model, year, ensemble) %>%
  summarize(season_length = sum(tasmid > 273.15)) %>%
  ungroup()
```
Notice that we used our derived temperature midpoint column tasmid and computed the total (sum()) number of days in each group where the temperature midpoint was greater than 0 °C (273.15 K, the units of the temperature data).
```r
growing_seasons
```
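The same pattern extends to other annual summaries. As a hypothetical example (frost_free_days is an illustrative name, and we assume a tasmin column in Kelvin like the other temperature variables), you could count frost-free days per year:

```r
# Hypothetical extension: count days per year where the daily
# minimum temperature stays above freezing (273.15 K)
frost_free <- df %>%
  group_by(rcp, model, year, ensemble) %>%
  summarize(frost_free_days = sum(tasmin > 273.15)) %>%
  ungroup()
```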
Let's visualize the growing season over time for each model and emission scenario:
```r
growing_seasons %>%
  ggplot(aes(year, season_length, color = rcp, group = model)) +
  geom_line(alpha = .3) +
  facet_wrap(~rcp, ncol = 1) +
  xlab("Year") +
  ylab("Growing season length (days)") +
  scale_color_manual(values = c("dodgerblue", "red")) +
  theme(legend.position = "none")
```
Use the tibble object that is returned from cft_df() as an input to compare_periods() to compare climate between a reference period and a target period. You may specify the function used to aggregate your chosen variable, as well as the months of the year to include in the calculation.
```r
comps <- compare_periods(df,
                         var1 = "pr",
                         var2 = "tasmax",
                         agg_fun = "mean",
                         target_period = c(2025, 2030),
                         reference_period = c(2020, 2024),
                         months1 = 5:8,
                         months2 = 5:8,
                         scenarios = c("rcp45", "rcp85"))
```
This provides a data frame that can be used to compare the values in the target and reference period.
```r
glimpse(comps)
```
One useful plot shows the difference in the two variables between reference and target periods:
```r
title <- paste("Change from the historical vs. reference period:",
               comps$reference_period, comps$target_period,
               sep = " vs ")[1]

comps %>%
  dplyr::select(parameter, rcp, model, reference_period, target_period, difference) %>%
  pivot_wider(names_from = parameter, values_from = difference) %>%
  ungroup() %>%
  mutate(rcp = ifelse(rcp == "rcp45", "RCP 4.5", "RCP 8.5")) %>%
  ggplot(aes(pr, tasmax, color = rcp)) +
  ggtitle(title) +
  geom_point() +
  geom_hline(yintercept = 0, alpha = .2) +
  geom_vline(xintercept = 0, alpha = .2) +
  geom_text_repel(aes(label = model), segment.size = .3, size = 3) +
  xlab("Difference in mean daily precipitation (mm)") +
  ylab("Difference in mean daily max. temperature (C)") +
  scale_color_manual(values = c("dodgerblue", "red"), "Greenhouse gas\ntrajectory")
```
So, nearly all model runs indicate warming, but the amount of warming varies by model and emissions scenario. The models disagree on the direction of precipitation change: some project increases and others decreases.
The amount of data generated by downscaled GCMs can be quite large (e.g., daily data at a few km spatial resolution). The Climate Futures Toolbox was developed to help users access and use smaller subsets.
Data is acquired from the Northwest Knowledge Server of the University of Idaho.