This vignette provides a walk-through of a common use case of the available_data function and the cft package: understanding climate futures for a region of interest. We'll use Hot Springs National Park, located in Arkansas, USA, and Yellowstone National Park, located in Wyoming, USA, as case studies.
Note that the available_data function is best used for downloading MACA climate model data for several climate variables, from several climate models, and for any number of emission scenarios over a relatively small spatial region and a relatively short time period. If you would like to download MACA climate model data for several climate variables from several climate models and any number of emission scenarios in their entirety for a single lat/long location, you should use the single_point_firehose function in the cft package. A vignette on how to use the firehose function in the cft package is available at https://github.com/earthlab/cft/blob/main/vignettes/firehose.md.
This vignette will show you how to produce a data.frame containing climate data for a region of interest. To get the most out of this vignette, we assume you have some familiarity with dplyr functions such as filter(), group_by(), and summarise().

Global Circulation Models (GCMs) provide estimates of historical and future climate conditions. The complexity of the climate system has led to a large number of GCMs, and it is common practice to examine outputs from many different models, treating each as one plausible future.
Most GCMs are spatially coarse (often 1 degree), but downscaling provides finer-scale estimates. The cft package uses one downscaled climate dataset called MACA (Multivariate Adaptive Constructed Analogs) Version 2 (details here).
This package was originally written with the National Park Service in mind, so
it has the option to use the name of any park (or monument, preserve, etc.) within
the NPS. Use the cftdata()
function to specify a range of years, a set of models,
a set of parameters, and a set of representative concentration pathways to return.
Leaving these arguments empty will result in a download of all available data
for that location.
library(devtools)
install_github("earthlab/cft")
library(cft)
ls(pos = "package:cft")
inputs <- cft::available_data()
?available_data
?single_point_firehose
Look at the variables, emission scenarios, and models for which data are available
levels(as.factor(inputs$variable_names$Variable))
levels(as.factor(inputs$variable_names$`Variable abbreviation`))
levels(as.factor(inputs$variable_names$Scenario))
levels(as.factor(inputs$variable_names$`Scenario abbreviation`))
levels(as.factor(inputs$variable_names$Model))
levels(as.factor(inputs$variable_names$`Model abbreviation`))
This code downloads data for all models, all emission scenarios, and 5 climate variables. Notice that this is a large download request.
input_variables <- inputs$variable_names %>%
  filter(Variable %in% c("Maximum Relative Humidity",
                         "Maximum Temperature",
                         "Minimum Relative Humidity",
                         "Minimum Temperature",
                         "Precipitation")) %>%
  filter(Scenario %in% c("RCP 4.5", "RCP 8.5")) %>%
  filter(Model %in% c(
    "Beijing Climate Center - Climate System Model 1.1",
    "Beijing Normal University - Earth System Model",
    "Canadian Earth System Model 2",
    "Centre National de Recherches Météorologiques - Climate Model 5",
    "Commonwealth Scientific and Industrial Research Organisation - Mk3.6.0",
    "Community Climate System Model 4",
    "Geophysical Fluid Dynamics Laboratory - Earth System Model 2 Generalized Ocean Layer Dynamics",
    "Geophysical Fluid Dynamics Laboratory - Earth System Model 2 Modular Ocean",
    "Hadley Global Environment Model 2 - Climate Chemistry 365 (day) ",
    "Hadley Global Environment Model 2 - Earth System 365 (day)",
    "Institut Pierre Simon Laplace (IPSL) - Climate Model 5A - Low Resolution",
    "Institut Pierre Simon Laplace (IPSL) - Climate Model 5A - Medium Resolution",
    "Institut Pierre Simon Laplace (IPSL) - Climate Model 5B - Low Resolution",
    "Institute of Numerical Mathematics Climate Model 4",
    "Meteorological Research Institute - Coupled Global Climate Model 3",
    "Model for Interdisciplinary Research On Climate - Earth System Model",
    "Model for Interdisciplinary Research On Climate - Earth System Model - Chemistry",
    "Model for Interdisciplinary Research On Climate 5",
    "Norwegian Earth System Model 1 - Medium Resolution")) %>%
  pull("Available variable")

input_variables
You will need to specify an appropriate area of interest for downloading spatial data. We use the functionality of Open Street Map (OSM) to make it fast and easy to download shapefiles for any area of interest. Here I am using the OSM protocol to retrieve a shapefile for Hot Springs National Park. I chose this park because it is the smallest national park.
This chunk of code first uses the getbb() function to retrieve a bounding box matching a plain-language search request. That bounding box is then fed into the API call that retrieves the data and converts it into an R-specific spatial data format called sf.
OSM uses a 'key' and 'value' system for organizing requests. You may want to spend some time familiarizing yourself with the immense library of data you can access with this system. https://wiki.openstreetmap.org/wiki/Tags
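As a quick illustration of the key/value pattern before we request the park boundary below, here is a hedged sketch that uses a different tag (waterway = river, the same tag used again later in this vignette) for the same search area. The object name is arbitrary and this request is not part of the main workflow.

# Sketch: the same opq()/add_osm_feature() pattern with a different key/value
# pair returns river lines rather than park boundaries.
hot_springs_rivers <- opq(getbb("Hot Springs")) %>%
  add_osm_feature(key = "waterway", value = "river") %>%
  osmdata_sf()
hot_springs_rivers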
bb <- getbb("Hot Springs") my_boundary <- opq(bb) %>% add_osm_feature(key = "boundary", value = "national_park") %>% osmdata_sf() my_boundary
Here you recalibrate the bounding box relative to the shapefiles you actually downloaded. The bounding box above was pulled from the OSM database; this bounding box is for the polygons you actually used. These two boxes may or may not be the same, but the one derived from your downloaded shapefile is reproducible.
Notice that we specify osm_multipolygons instead of osm_polygons. This is a case-specific choice. When you download a shapefile from OSM, it will include a number of different spatial object types, and you can choose any of those options to move forward. We chose multipolygons because they best matched our area of interest. Use the quick plot below to visualize your area of interest before continuing.
boundaries <- my_boundary$osm_multipolygons
pulled_bb <- st_bbox(boundaries)
pulled_bb
This quick map allows you to check that the polygon you downloaded matches your expectations.
basemap <- ggplot(data = boundaries) +
  geom_sf(fill = "cornflowerblue") +
  geom_sf_text(aes(label = boundaries$name))

basemap
WARNING: TIME CONSUMING STEPS AHEAD
It's time to download. Here is a simple and fast example of data from a single point (the centroid of our polygon) for one year (2099). On our local machine this takes ~3 minutes.
start_time <- Sys.time()

center_point <- st_centroid(boundaries) %>% st_bbox()

times <- inputs$available_times

Pulled_data_single_space_single_timepoint <- inputs$src %>%
  hyper_filter(lat = lat <= c(center_point[4] + 0.05) & lat >= c(center_point[2] - 0.05)) %>%
  hyper_filter(lon = lon <= c(center_point[3] + 0.05) & lon >= c(center_point[1] - 0.05)) %>%
  hyper_filter(time = times$`Available times` == 44558) %>%
  hyper_tibble(select_var = input_variables[1:38]) %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326, agr = "constant")

end_time <- Sys.time()
print(end_time - start_time)

head(Pulled_data_single_space_single_timepoint)
If you try to download a large quantity of data, you will likely get an error like the one below. An error like this is usually a connection error; try again, and sometimes you will need to restart your R session to clear memory. "curl error details: Error in R_nc4_open: NetCDF: I/O failure Error in ncdf4::nc_open(x$source$source[1]) : Error in nc_open trying to open file https://cida.usgs.gov/thredds/dodsC/macav2metdata_daily_future"
This error occurs when you try to download too much data using the available_data() function. If you are downloading a large quantity of data, you should use the firehose() function in the cft package instead of the available_data() function. Running the code in the following code chunk will likely produce this error.
Here we remove the time filter to download all the requested input variables for the entire time period.
NOTE: When you ask for a big download that may exceed the API limit, the API asks you to confirm that you want to try. This will prompt a YES/NO question in your console that you need to answer if you want to continue. If you find this step taking a very long time, check that your console isn't waiting for you to confirm the request.
start_time <- Sys.time()

center_point <- st_centroid(boundaries) %>% st_bbox()

Pulled_data_single_space_all_timepoints <- inputs$src %>%
  hyper_filter(lat = lat <= c(center_point[4] + 0.05) & lat >= c(center_point[2] - 0.05)) %>%
  hyper_filter(lon = lon <= c(center_point[3] + 0.05) & lon >= c(center_point[1] - 0.05)) %>%
  hyper_tibble(select_var = input_variables) %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326, agr = "constant")

end_time <- Sys.time()
print(end_time - start_time)

head(Pulled_data_single_space_all_timepoints)
This is where you will start seeing errors returned from the API if you have requested too much data at one time.
Downloads from the API are limited to 500 MB per request. A request for a large area of interest combined with a long time series will return a cryptic runtime error informing you that your request was too large.
This error may look like this: "CURL Error: Transferred a partial file Error in Rsx_nc4_get_vara_int: NetCDF: DAP failure Var: pr_CCSM4_r6i1p1_rcp85 Ndims: 3 Start: 0,444,511 Count: 34333,2,3 Error in ncvar_get_inner(ncid2use, varid2use, nc$var[[li]]$missval, addOffset, : C function Rsx_nc4_get_var_int returned error"
The solution to this problem is to subset your requests so they fit within the limits of the API. You can achieve this by balancing the size of your requested spatial extent against the length of your requested time period. For a small national park, it's possible to pull the entire time series from the API, but larger parks will require you to request a shorter time window or stitch multiple time windows together.
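One way to gauge whether a request will fit is a rough back-of-the-envelope size estimate before you submit it. The sketch below is an assumption-laden illustration, not part of the package: it assumes a grid spacing of roughly 1/24 degree and roughly 8 bytes per value, and it reuses the Hot Springs bounding box and variable list created above; treat the result as an order of magnitude only.

# Back-of-envelope request-size estimate (a sketch; grid spacing and bytes
# per value are assumptions).
grid_res <- 1 / 24                                  # assumed grid spacing in degrees
n_lon  <- ceiling((pulled_bb[3] - pulled_bb[1]) / grid_res)
n_lat  <- ceiling((pulled_bb[4] - pulled_bb[2]) / grid_res)
n_time <- nrow(inputs$available_times)              # full daily time series
n_vars <- length(input_variables)

approx_mb <- as.numeric(n_lon * n_lat * n_time * n_vars * 8 / 1e6)
approx_mb                                           # compare against the ~500 MB API limit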
Time is one of the easiest dimensions to filter along because the time data are organized as a sequential number, so they are easy to cut up and reassemble without needing to reference outside tables. The request filter is sensitive to vector length, so we build our filter indexes into the time dataframe to ensure that they are all the same length.
Here we are adding a generic index filled with 0's that we can use in the future, a column with 1's for the first half and 0's for the second half, and a column with 0's for the first half and 1's for the second.
library(tibble)

# Year 2034
time_min <- 38716
time_max <- 73048

input_times <- inputs$available_times %>%
  add_column(index = 0) %>%
  add_column(first_half = 0) %>%
  add_column(second_half = 0)

input_times[which(inputs$available_times[, 1] >= time_min &
                    inputs$available_times[, 1] <= time_max), 3] <- 1

med <- median(row_number(input_times[, 3]))
input_times[which(as.numeric(row.names(input_times)) <= med), 4] <- 1
input_times[which(as.numeric(row.names(input_times)) > med), 5] <- 1

tail(input_times)
When your data request is too big for a single API request, it will need to be cut into smaller requests. You can use the new index columns we created in the time table above to cut our single request into two requests. This leaves you with two data tables instead of one. You can bind these two tables back together, but doing so is time consuming because the two tables have to take turns writing themselves to a new third table that they can occupy together. This takes a lot of short-term memory and errors out on some machines. It will save you time and effort if you can avoid binding them back together.
Pulled_data_sub1 <- inputs$src %>%
  hyper_filter(lat = lat <= c(center_point[4] + 0.05) & lat >= c(center_point[2] - 0.05)) %>%
  hyper_filter(lon = lon <= c(center_point[3] + 0.05) & lon >= c(center_point[1] - 0.05)) %>%
  hyper_filter(time = input_times[, 4] == 1) %>%
  hyper_tibble(select_var = input_variables)

Pulled_data_sub2 <- inputs$src %>%
  hyper_filter(lat = lat <= c(center_point[4] + 0.05) & lat >= c(center_point[2] - 0.05)) %>%
  hyper_filter(lon = lon <= c(center_point[3] + 0.05) & lon >= c(center_point[1] - 0.05)) %>%
  hyper_filter(time = input_times[, 5] == 1) %>%
  hyper_tibble(select_var = input_variables)

tail(Pulled_data_sub1)
tail(Pulled_data_sub2)
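If binding strains your machine's memory, one option is to write each half to disk and read back only the piece you need later, rather than holding both halves plus a combined table in memory at once. A minimal sketch (the file names here are arbitrary examples):

# Keep the two halves separate on disk instead of binding them.
saveRDS(Pulled_data_sub1, "pulled_data_first_half.rds")
saveRDS(Pulled_data_sub2, "pulled_data_second_half.rds")

# Later, read back just the half you need:
first_half <- readRDS("pulled_data_first_half.rds")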
In the previous examples, we pulled data from a single point in the center of a small national park. Now, we will show you how to download, analyze, and visualize these data over a large spatial extent.
First, we will redefine our area of interest to include all the national parks that touch Yellowstone National Park. OSM provides a bounding box just like the one we used above for Hot Springs, Arkansas. However, the bounding box provided by OSM for the national parks that touch Yellowstone National Park is larger than the official bounding box provided by the National Park Service. In the following code, we start by obtaining the bounding box from OSM and then manually adjust its corners to match the official bounding box from the National Park Service.
bb <- getbb("yellowstone") bb_manual <- bb bb_manual[1,1] <- -111.15594815937659 bb_manual[1,2] <- -109.8305463801207 bb_manual[2,1] <- 44.12354048271325 bb_manual[2,2] <- 45.11911641599412 my_boundary_yellow <- opq(bb_manual) %>% add_osm_feature(key = "boundary", value = "national_park") %>% osmdata_sf() my_boundary_yellow boundaries_large <- my_boundary_yellow$osm_multipolygons pulled_bb_large <- st_bbox(boundaries_large) pulled_bb_large
Let's do another quick plot to make sure our area of interest matches our expectations.
basemap <- ggplot(data = boundaries_large) +
  geom_sf(fill = "cornflowerblue") +
  geom_sf_text(aes(label = boundaries_large$name))

basemap
As your area of interest expands, you will need to cut your time and variable requests to accommodate more spatial points. Here we have cut our variables down to just "Precipitation", our scenarios down to just "RCP 8.5", and our models down to just "Model for Interdisciplinary Research On Climate 5".
input_variables <- inputs$variable_names %>%
  filter(Variable %in% c("Precipitation")) %>%
  filter(Scenario %in% c("RCP 8.5")) %>%
  filter(Model %in% c("Model for Interdisciplinary Research On Climate 5")) %>%
  pull("Available variable")

input_variables
Pulled_data_large_area_few_variables <- inputs$src %>%
  hyper_filter(lat = lat <= c(pulled_bb_large[4] + 0.05) & lat >= c(pulled_bb_large[2] - 0.05)) %>%
  hyper_filter(lon = lon <= c(pulled_bb_large[3] + 0.05) & lon >= c(pulled_bb_large[1] - 0.05)) %>%
  hyper_filter(time = input_times$`Available times` == 73048) %>%
  hyper_tibble(select_var = input_variables) %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326, agr = "constant")

head(Pulled_data_large_area_few_variables)
tail(Pulled_data_large_area_few_variables)
Check this plot to make sure you downloaded your entire time series.
plot(Pulled_data_large_area_few_variables$time, Pulled_data_large_area_few_variables$`pr_MIROC5_r1i1p1_rcp85`)
Check here to make sure you downloaded the proper spatial extent.
check_filter <- Pulled_data_large_area_few_variables %>%
  filter(time == min(Pulled_data_large_area_few_variables$time))

ggplot() +
  geom_sf(data = boundaries_large, fill = "cornflowerblue") +
  geom_sf(data = check_filter, color = "red", size = 0.5) +
  coord_sf(crs = 4326)
Pulled_data_sub1 <- inputs$src %>%
  hyper_filter(lat = lat <= c(pulled_bb[4] + 0.05) & lat >= c(pulled_bb[2] - 0.05)) %>%
  hyper_filter(lon = lon <= c(pulled_bb[3] + 0.05) & lon >= c(pulled_bb[1] - 0.05)) %>%
  hyper_filter(time = input_times[, 4] == 1) %>%
  hyper_tibble(select_var = input_variables) %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326, agr = "constant")

Pulled_data_sub2 <- inputs$src %>%
  hyper_filter(lat = lat <= c(pulled_bb[4] + 0.05) & lat >= c(pulled_bb[2] - 0.05)) %>%
  hyper_filter(lon = lon <= c(pulled_bb[3] + 0.05) & lon >= c(pulled_bb[1] - 0.05)) %>%
  hyper_filter(time = input_times[, 5] == 1) %>%
  hyper_tibble(select_var = input_variables) %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326, agr = "constant")

tail(Pulled_data_sub1)
tail(Pulled_data_sub2)
Pulled_data_stitch <- bind_rows(Pulled_data_sub1, Pulled_data_sub2)
Check this plot to make sure you downloaded your entire time series.
plot(Pulled_data_stitch$time, Pulled_data_stitch$`pr_MIROC5_r1i1p1_rcp85`)
rast <- st_rasterize(Pulled_data_large_area_few_variables)
plot(rast)
In order to aggregate our downloaded data into a polygon, we will use the terra and tidyterra packages to convert the raster of downloaded points into a Spatial Raster.
library(terra)
library(tidyterra)

data <- rast(rast)
We will now obtain the geometry of the boundary points for Yellowstone National Park and its adjacent national parks and convert them to a spatial object. We will then convert that spatial object into a Spatial Vector. Once we have a Spatial Vector of the boundary points for Yellowstone National Park and its adjacent national parks, we will extract data for those boundary points from the Spatial Raster and set xy=TRUE in order to obtain the lat/long coordinates of these points.
points <- vect(as_Spatial(boundaries_large$geometry))
extracted <- extract(data, points, xy = TRUE)
This code changes the name of the first column in the dataframe of extracted data at the boundary points for Yellowstone National Park and its adjacent national parks to 'nn'.
names(extracted)[1] <- "nn"
We will now convert the dataframe of extracted data at the boundary points for Yellowstone National Park and its adjacent national parks to a Spatial Vector using the lat/long coordinates of the points that we defined when we extracted the data from the Spatial Raster.
boundary_data <- vect(extracted, geom = c("x", "y"), crs = "")
boundary_data
We will now use the plot and points functions from the terra package to produce a spatial plot of the downloaded precipitation data for the bounding box of Yellowstone National Park and to plot the data that were extracted along the boundary points of Yellowstone National Park and its adjacent national parks.
plot(data$pr_MIROC5_r1i1p1_rcp85_lyr.1,
     main = "Projected Humidity in 2040 in Yellowstone National Park")
points(boundary_data, col = 'blue', alpha = 0.1)
The code in this section will walk through how to create time and variable filters in order to download data using the available_data function.
The following code is used to create a filter which will select data from the year 2034. This filter creates an index which is given the value of 0 for times that are not in year 2034 and a value of 1 for times that are in the year 2034.
library(tibble)

# Year 2034
time_min <- 72048
time_max <- 73048

input_times <- inputs$available_times %>%
  add_column(index = 0)

input_times[which(inputs$available_times[, 1] > time_min &
                    inputs$available_times[, 1] < time_max), 3] <- 1

tail(input_times)
The following code creates a filter to select precipitation data from the emission scenario RCP 8.5 from the Beijing Normal University Earth System Model and the Hadley Global Environmental Model 2. This is done by filtering by variable, by model, and by scenario using a separate call to the filter()
function for each filtering step.
input_variables <- inputs$variable_names %>%
  filter(Variable == "Precipitation") %>%
  filter(Model %in% c("Beijing Normal University - Earth System Model",
                      "Hadley Global Environment Model 2 - Earth System 365 (day)")) %>%
  filter(Scenario == "RCP 8.5") %>%
  pull("Available variable")

input_variables
This code obtains the bounding box from Open Street Map (OSM) for the national parks adjacent to Yellowstone National Park and then manually updates its corners to match the official bounding box from the National Park Service. This is done because the bounding box obtained from OSM is larger than the official bounding box from the National Park Service, so this adjustment makes sure that the bounding box matches our desired area of interest.
bb <- getbb("yellowstone") bb_manual <- bb bb_manual[1,1] <- -111.15594815937659 bb_manual[1,2] <- -109.8305463801207 bb_manual[2,1] <- 44.12354048271325 bb_manual[2,2] <- 45.11911641599412 my_boundary <- opq(bb_manual) %>% add_osm_feature(key = "boundary", value = "national_park") %>% osmdata_sf() my_boundary
boundaries <- my_boundary$osm_multipolygons
pulled_bb <- st_bbox(boundaries)
pulled_bb
This code produces a map of the area of interest which can be used to ensure that the bounding box that was previously defined for the data matches our desired area of interest.
basemap <- ggplot(data = boundaries) +
  geom_sf(fill = "cornflowerblue") +
  geom_sf_text(aes(label = boundaries$name))

basemap
This code downloads precipitation data from the Beijing Normal University Earth System Model and the Hadley Global Environmental Model 2 for emission scenario RCP 8.5 for the year 2034 in the national parks that touch Yellowstone National Park. First, the hyper_filter()
function is used to select data from the national parks that touch Yellowstone National Park by filtering both the latitude and longitude coordinates to coordinates within a small delta of the minimum and maximum coordinates of the bounding box that was previously defined. Then the hyper_filter()
function is used to filter the data to data from the year 2034 using the time filter that was defined above. Finally, the hyper_tibble()
function is used to select the precipitation data from the Beijing Normal University Earth System Model and the Hadley Global Environmental Model 2 for emission scenario RCP 8.5 using the filter that was defined above.
Pulled_data <- inputs$src %>%
  hyper_filter(lat = lat <= c(pulled_bb[4] + 0.05) & lat >= c(pulled_bb[2] - 0.05)) %>%
  hyper_filter(lon = lon <= c(pulled_bb[3] + 0.05) & lon >= c(pulled_bb[1] - 0.05)) %>%
  hyper_filter(time = input_times[, 3] == 1) %>%
  hyper_tibble(select_var = input_variables) %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326, agr = "constant")

head(Pulled_data)
This plot shows the downloaded data as red dots over a map of the national parks that touch Yellowstone National Park. Because data were downloaded for many time steps, we filter to the earliest time in 2034; a single point in time is enough to check the spatial distribution of the download without plotting every time step.
check_filter <- Pulled_data %>%
  filter(time == min(Pulled_data$time))

ggplot() +
  geom_sf(data = boundaries, fill = "cornflowerblue") +
  geom_sf(data = check_filter, color = "red", size = 0.5) +
  coord_sf(crs = 4326)
First, we will download data for the RCP 8.5 emission scenario from the Model for Interdisciplinary Research on Climate - Earth System Model. This download will select maximum temperature, minimum temperature, eastward wind, and northward wind data within Yellowstone National Park at a single time point.
vars <- inputs$variable_names %>%
  filter(Variable %in% c("Maximum Temperature",
                         "Minimum Temperature",
                         "Eastward Wind",
                         "Northward Wind")) %>%
  filter(Scenario %in% c("RCP 8.5")) %>%
  filter(Model %in% c("Model for Interdisciplinary Research On Climate - Earth System Model")) %>%
  pull("Available variable")

dat <- inputs$src %>%
  hyper_filter(lat = lat <= c(pulled_bb_large[4] + 0.05) & lat >= c(pulled_bb_large[2] - 0.05)) %>%
  hyper_filter(lon = lon <= c(pulled_bb_large[3] + 0.05) & lon >= c(pulled_bb_large[1] - 0.05)) %>%
  hyper_filter(time = input_times$`Available times` == 73048) %>%
  hyper_tibble(select_var = vars) %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326, agr = "constant")

head(dat)
The above code produces a list containing the data from the download. The following code converts the data from a list into a dataframe by first extracting each variable from the list and then producing a dataframe of those variables.
tasmax <- dat$`tasmax_MIROC-ESM_r1i1p1_rcp85`
tasmin <- dat$`tasmin_MIROC-ESM_r1i1p1_rcp85`
uas <- dat$`uas_MIROC-ESM_r1i1p1_rcp85`
vas <- dat$`vas_MIROC-ESM_r1i1p1_rcp85`
time <- dat$time

df <- data.frame(time, tasmax, tasmin, uas, vas)
head(df)
Now that we have all of the climate parameters for our study region, we can
compute functions of those variables.
For example, it is common to compute the midpoint of the maximum and minimum
daily temperature, which we can do using the mutate
function:
df <- df %>%
  mutate(tasmid = (tasmax + tasmin) / 2)

head(df)
Now we have a new column called tasmid
that is the midpoint of the maximum
and minimum daily temperature!
Wind speed provides another example of a derived parameter that can be computed
for each day.
By default, we have two wind-related parameters: the eastward wind component
(called uas
) and the northward wind component (called vas
), both in units of
meters per second (you can get this information from cft::argument_reference
).
Wind speed can be computed from vas
and uas
using the Pythagorean theorem:
$\text{Wind speed} = \sqrt{v_{as}^2 + u_{as}^2}.$
In code:
df <- df %>%
  mutate(wind_speed = sqrt(vas^2 + uas^2))

head(df)
For this section, we will start by downloading maximum and minimum temperature data from 2020 through the end of 2024 for both emission scenarios for the middle of Yellowstone National Park from the Model for Interdisciplinary Research On Climate - Earth System Model and the Norwegian Earth System Model 1. In order to do so, we will first create a filter for time, as shown in the code cells below.
time_min <- inputs$available_times[which(inputs$available_times[, 2] == '2019-12-31'), 1]
time_max <- inputs$available_times[which(inputs$available_times[, 2] == '2025-01-01'), 1]
input_times <- inputs$available_times
input_times$index <- rep(0, length(input_times$dates))
input_times[which(inputs$available_times[, 1] > time_min &
                    inputs$available_times[, 1] < time_max), 3] <- 1

head(input_times)
We will now create a filter for the variables, emission scenarios, and models that we want to download.
vars <- inputs$variable_names %>%
  filter(Variable %in% c("Minimum Temperature", "Maximum Temperature")) %>%
  filter(Scenario %in% c("RCP 8.5", "RCP 4.5")) %>%
  filter(Model %in% c("Model for Interdisciplinary Research On Climate - Earth System Model",
                      "Norwegian Earth System Model 1 - Medium Resolution")) %>%
  pull("Available variable")

vars
This code uses both the time filter and the variable/model/emission scenario filter to download the desired data from the MACA climate model.
growing_data <- inputs$src %>%
  hyper_filter(lat = lat <= c(44.5 + 0.05) & lat >= c(44.5 - 0.05)) %>%
  hyper_filter(lon = lon <= c(-110.5 + 0.05) & lon >= c(-110.5 - 0.05)) %>%
  hyper_filter(time = input_times[, 3] == 1) %>%
  hyper_tibble(select_var = vars) %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326, agr = "constant")

head(growing_data)
We want to convert the downloaded data into a dataframe with columns for maximum temperature (tasmax), minimum temperature (tasmin), year, emission scenario (rcp), and model. In order to do so, we will start by determining how many values need to be in each of these vectors.
length(growing_data$time)
Create a vector of year values for the data in the dataframe.
year <- rep(c(rep(2020, 4 * 366),
              rep(2021, 4 * 365),
              rep(2022, 4 * 365),
              rep(2023, 4 * 365),
              rep(2024, 4 * 366)), 4)

length(year)
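The hard-coded counts above assume a particular row ordering and will break if the download changes. As a hedged alternative sketch (not part of the original workflow), you can derive the year for each row by joining the numeric time values back to their calendar dates in inputs$available_times, which does not depend on row order. The final rep(..., 4) mirrors the stacking of the four model/scenario columns performed below.

# Alternative sketch: derive years by joining times back to their dates.
library(dplyr)

date_lookup <- inputs$available_times %>%
  select(time = `Available times`, dates)

year_per_row <- growing_data %>%
  st_drop_geometry() %>%
  left_join(date_lookup, by = "time") %>%
  mutate(year = as.numeric(format(as.Date(dates), "%Y"))) %>%
  pull(year)

# One copy per stacked model/scenario column, matching the hard-coded version.
year <- rep(year_per_row, 4)
length(year)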
This code produces a vector of emission scenarios for the data in the dataframe.
rcp <- c(rep('rcp45', 7308),
         rep('rcp85', 7308),
         rep('rcp45', 7308),
         rep('rcp85', 7308))

length(rcp)
The following code produces a vector of model abbreviations for the data in the dataframe.
model <- c(rep('MIROC-ESM', 2 * 7308),
           rep('NorESM1-M', 2 * 7308))

length(model)
We will now create a vector of maximum temperature data by combining the maximum temperature data from each emission scenario and model. We then repeat this process to produce a vector of minimum temperature data by combining the minimum temperature data from each emission scenario and model.
tasmax <- c(growing_data$`tasmax_MIROC-ESM_r1i1p1_rcp45`,
            growing_data$`tasmax_MIROC-ESM_r1i1p1_rcp85`,
            growing_data$`tasmax_NorESM1-M_r1i1p1_rcp45`,
            growing_data$`tasmax_NorESM1-M_r1i1p1_rcp85`)

tasmin <- c(growing_data$`tasmin_MIROC-ESM_r1i1p1_rcp45`,
            growing_data$`tasmin_MIROC-ESM_r1i1p1_rcp85`,
            growing_data$`tasmin_NorESM1-M_r1i1p1_rcp45`,
            growing_data$`tasmin_NorESM1-M_r1i1p1_rcp85`)
Now that we have created all of the vectors of data that we need, we will create a dataframe out of these vectors using the following code.
df <- data.frame(tasmin, tasmax, year, rcp, model)
head(df)
In order to determine the length of the growing season, we need to find the midpoint temperature, meaning the average of the minimum and maximum temperature. The following code computes the midpoint temperature for each value in the dataframe and adds a column to our existing dataframe containing these values.
df <- df %>%
  mutate(tasmid = (tasmax + tasmin) / 2)

head(df)
Sometimes you will want derived climate variables that summarize daily data, such as growing season length. Quantities like these require some aggregation time interval, which in many cases will be one year. As an example, we will compute the growing season length for the middle of Yellowstone National Park across all models and emission scenarios. To do this, we will use the year column we created above as our grouping variable:
# If your dataframe had a date column instead, you could derive year with:
# df <- df %>%
#   mutate(year = year(date))
Now, we want to compute growing season length for each year, model, emissions scenario combination.
growing_seasons <- df %>%
  group_by(year, rcp, model) %>%
  mutate(season_length = sum(tasmid > 273.15))
Notice that we used our derived temperature midpoint column tasmid
, and computed the total (sum()
) number of days for each group where the temperature midpoint was greater than 0 C (or, 273.15 Kelvin, which are the units of the temperature data).
growing_seasons
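Because mutate() keeps one row per day, each group's season length is repeated many times. If you would rather have one row per year/model/scenario combination, a summarise()-based sketch looks like this (the object name is arbitrary):

growing_season_summary <- growing_seasons %>%
  group_by(year, rcp, model) %>%
  summarise(season_length = sum(tasmid > 273.15), .groups = "drop")

head(growing_season_summary)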
Let's visualize the growing season over time for each model and emission scenario:
growing_seasons %>%
  ggplot(aes(x = year, y = season_length, color = rcp)) +
  geom_line(alpha = .3) +
  xlab("Year") +
  ylab("Growing season length (days)") +
  scale_color_manual(values = c("dodgerblue", "red")) +
  theme(legend.position = "none")
We now want to compare the climate in the middle of Yellowstone National Park between two time periods, 2020-2025 and 2025-2030. We will begin by creating a time filter using the following two code chunks. This time filter will select data between 2020 and 2030.
time_min <- inputs$available_times[which(inputs$available_times[, 2] == '2019-12-31'), 1]
time_max <- inputs$available_times[which(inputs$available_times[, 2] == '2031-01-01'), 1]
input_times <- inputs$available_times
input_times$index <- rep(0, length(input_times$dates))
input_times[which(inputs$available_times[, 1] > time_min &
                    inputs$available_times[, 1] < time_max), 3] <- 1

tail(input_times)
We will now create a filter for the variables, emission scenarios, and models for which we want data from the MACA climate model. Here we are selecting precipitation and maximum temperature data for emission scenarios RCP 4.5 and RCP 8.5 from the Model for Interdisciplinary Research on Climate - Earth System Model and the medium resolution Norwegian Earth System Model 1.
vars <- inputs$variable_names %>%
  filter(Variable %in% c("Precipitation", "Maximum Temperature")) %>%
  filter(Scenario %in% c("RCP 8.5", "RCP 4.5")) %>%
  filter(Model %in% c("Model for Interdisciplinary Research On Climate - Earth System Model",
                      "Norwegian Earth System Model 1 - Medium Resolution")) %>%
  pull("Available variable")

vars
This code uses the variable, emission scenario, and climate model filter and the time filter that we defined previously to select and download data from the middle of Yellowstone National Park from the MACA climate model data.
climate_data <- inputs$src %>%
  hyper_filter(lat = lat <= c(44.5 + 0.05) & lat >= c(44.5 - 0.05)) %>%
  hyper_filter(lon = lon <= c(-110.5 + 0.05) & lon >= c(-110.5 - 0.05)) %>%
  hyper_filter(time = input_times[, 3] == 1) %>%
  hyper_tibble(select_var = vars) %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326, agr = "constant")

head(climate_data)
We now want to convert the data into a dataframe that has columns for the year, emission scenario (rcp), and climate model (model).
length(climate_data$time)
The following code creates a vector of year values for the downloaded data.
year <- rep(c(rep(2020, 4 * 366),
              rep(2021, 4 * 365),
              rep(2022, 4 * 365),
              rep(2023, 4 * 365),
              rep(2024, 4 * 366),
              rep(2025, 4 * 365),
              rep(2026, 4 * 365),
              rep(2027, 4 * 365),
              rep(2028, 4 * 366),
              rep(2029, 4 * 365),
              rep(2030, 4 * 365)), 4)

length(year)
This code produces a vector of rcp emission scenarios for the downloaded data.
rcp <- c(rep('rcp45', 16072),
         rep('rcp85', 16072),
         rep('rcp45', 16072),
         rep('rcp85', 16072))

length(rcp)
The following code generates a vector of climate model abbreviations for the downloaded data.
model <- c(rep('MIROC-ESM', 2 * 16072),
           rep('NorESM1-M', 2 * 16072))

length(model)
This code produces a vector of precipitation data, grouping first by model and then by emission scenario.
pr <- c(climate_data$`pr_MIROC-ESM_r1i1p1_rcp45`,
        climate_data$`pr_MIROC-ESM_r1i1p1_rcp85`,
        climate_data$`pr_NorESM1-M_r1i1p1_rcp45`,
        climate_data$`pr_NorESM1-M_r1i1p1_rcp85`)

length(pr)
The following code produces a vector of maximum temperature data by first grouping by model and then grouping by emission scenario.
tasmax <- c(climate_data$`tasmax_MIROC-ESM_r1i1p1_rcp45`,
            climate_data$`tasmax_MIROC-ESM_r1i1p1_rcp85`,
            climate_data$`tasmax_NorESM1-M_r1i1p1_rcp45`,
            climate_data$`tasmax_NorESM1-M_r1i1p1_rcp85`)

length(tasmax)
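The commented compare_periods() call below expects a dataframe named df containing pr, tasmax, year, rcp, and model columns. A minimal sketch of assembling it from the vectors created above, mirroring the earlier data.frame() step (compare_periods() may expect additional columns, so check its documentation):

df <- data.frame(pr, tasmax, year, rcp, model)
head(df)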
Use the resulting dataframe (the tibble returned from cft_df() in earlier versions of the package) as an input to compare_periods() to compare climate between a reference period and a target period. You may specify the function with which to aggregate your chosen variable, as well as the months of the year to include in the calculation.
# comps <- compare_periods(df,
#                          var1 = "pr",
#                          var2 = "tasmax",
#                          agg_fun = "mean",
#                          target_period = c(2025, 2030),
#                          reference_period = c(2020, 2024),
#                          scenarios = c("rcp45", "rcp85"))
This provides a data frame that can be used to compare the values in the target and reference period.
#glimpse(comps)
One useful plot shows the difference in the two variables between reference and target periods:
# title <- paste("Change from the historical vs. reference period:",
#                comps$reference_period, comps$target_period, sep = " vs ")[1]
#
# comps %>%
#   dplyr::select(parameter, rcp, model, reference_period, target_period, difference) %>%
#   pivot_wider(names_from = parameter, values_from = difference) %>%
#   ungroup %>%
#   mutate(rcp = ifelse(rcp == "rcp45", "RCP 4.5", "RCP 8.5")) %>%
#   ggplot(aes(pr, tasmax, color = rcp)) +
#   ggtitle(title) +
#   geom_point() +
#   geom_hline(yintercept = 0, alpha = .2) +
#   geom_vline(xintercept = 0, alpha = .2) +
#   geom_text_repel(aes(label = model), segment.size = .3, size = 3) +
#   xlab("Difference in mean daily precipitation (mm)") +
#   ylab("Difference in mean daily max. temperature (C)") +
#   scale_color_manual(values = c("dodgerblue", "red"),
#                      "Greenhouse gas\ntrajectory")
So, nearly all model runs indicate warming, but the amount of warming varies by model and emissions scenario. Precipitation increases and decreases are predicted by different models.
This code obtains Open Street Map data for rivers in Yellowstone National Park and its adjacent parks and converts it to a spatial format.
river <- opq(bb_manual) %>%
  add_osm_feature(key = "waterway", value = "river") %>%
  osmdata_sf()
river
The following code obtains a subset of the Open Street Map river geometry for Yellowstone National Park and its adjacent national parks and then converts it into a Spatial Vector format. In the final line of this code chunk, we use the extract function from the terra package to extract data from our Spatial Raster along the river geometry and store the lat/long coordinates of these points in an xy format in the resulting dataframe.
river_sub <- st_buffer(river$osm_lines, 2200)
river_points <- vect(as_Spatial(river_sub))
extracted_river <- extract(data, river_points, xy = TRUE)
head(extracted_river)
colnames(extracted_river)[1] <- "pre"
We will now convert the extracted river geometry from a spatial object into a Spatial Vector where x and y define the spatial coordinates of the data points.
river_data <- vect(extracted_river, geom = c("x", "y"), crs = "")
river_data
We will now use the plot and points functions from the terra package to plot the Spatial Raster data and overlay the humidity data along the rivers in Yellowstone National Park and its adjacent parks. This produces a plot of the projected humidity in 2040 along the rivers in Yellowstone National Park and its adjacent national parks.
plot(data$pr_MIROC5_r1i1p1_rcp85_lyr.1,
     main = "Rivers of Yellowstone \n Projected humidity in 2040")
points(river_data, col = river_data$pre)
If we want to remove the background Spatial Raster data from the above plot, we will need to define an empty Spatial Raster object. The first step in defining an empty Spatial Raster object is to obtain the extent of our Spatial Raster data, which is done using the following code.
data_extent <- ext(data)
data_extent
Using the extent of the Spatial Raster data, we will obtain the minimum and maximum x and y values using the code below.
xmin <- data_extent$xmin[[1]]
xmax <- data_extent$xmax[[1]]
ymin <- data_extent$ymin[[1]]
ymax <- data_extent$ymax[[1]]
We will now select the x and y resolution of the Spatial Raster data.
resy <- yres(data)
resx <- xres(data)
Using the minimum and maximum x and y values and the x and y resolution of the points in our Spatial Raster of data, we will create an empty Spatial Raster with the same extent, resolution, and projection as our Spatial Raster of data.
template <- rast(crs = "WGS84",
                 xmin = xmin, xmax = xmax,
                 ymin = ymin, ymax = ymax,
                 resolution = resx)
template
We will now use our empty Spatial Raster to create a plot of the projected humidity in 2040 along the rivers in Yellowstone National Park and its adjacent national parks using the plot and points functions from the terra package.
plot(template, main="Rivers of Yellowstone \n Projected humidity in 2040") points(river_data, col = river_data$pre)
This code generates a plot of the projected humidity in 2040 along the rivers in Yellowstone National Park and its adjacent parks.
This code is aggregating to road segments in Yellowstone National Park and its adjacent parks in a similar manner to the aggregation to rivers that was performed above. First, we obtain the Open Street Map data of road segments in Yellowstone National Park and its adjacent parks and convert the open street map data into a spatial format. We then obtain a subset of the Open Street Map road segment data and convert it to a Spatial Vector. Then the humidity data is extracted from the Spatial Raster of data along the road segment geometry in Yellowstone National Park and its adjacent parks with the lat/long coordinates of the extracted road data being stored in xy format in the resulting dataframe.
roads <- opq(pulled_bb_large) %>%
  add_osm_feature(key = 'highway', value = 'primary') %>%
  add_osm_feature(key = 'highway', value = 'secondary') %>%
  osmdata_sf()

roads_sub <- st_buffer(roads$osm_lines, 2200)
road_points <- vect(as_Spatial(roads_sub))
extracted_roads <- extract(data, road_points, xy = TRUE)

colnames(extracted_roads)[1] <- "pre"
The extracted Spatial Raster data along the road segment geometry for Yellowstone National Park and its adjacent national parks is converted to a Spatial Vector using the x and y coordinates in the dataframe of extracted data.
road_data <- vect(extracted_roads, geom = c("x", "y"), crs = "")
road_data
We will now use the plot and points functions from the terra package to plot the Spatial Raster data and overlay the humidity data along the road segments in Yellowstone National Park and its adjacent parks. This produces a plot of the projected humidity in 2040 along the road segments in Yellowstone National Park and its adjacent national parks.
plot(data$pr_MIROC5_r1i1p1_rcp85_lyr.1,
     main = "Roads of Yellowstone \n Projected humidity in 2040")
points(road_data, col = road_data$pre)
We will now use the empty Spatial Raster defined previously to plot the projected humidity in 2040 along the road segments in Yellowstone National Park and its adjacent national parks.
plot(template, main="Roads of Yellowstone \n Projected humidity in 2040") points(road_data, col = road_data$pre)