This vignette provides a walk-through of a common use case of the cft package: understanding climate futures for a region of interest. We'll use Wind Cave National Park, located in South Dakota, USA as a case study.
This vignette will show you how to:
data.frame
containing climate dataTo get the most out of this vignette, we assume you have:
filter()
, group_by()
, and summarize()
)Global Circulation Models (GCMs) provide estimates of historical and future climate conditions. The complexity of the climate system has lead to a large number GCMs and it is common practice to examine outputs from many different models, treating each as one plausible future.
Most GCMs are spatially coarse (often 1 degree), but downscaling provides finer scale estimates. The cft package uses one downscaled climate model called MACA (Multivariate Adaptive Climate Analog) Version 2 (details here).
This package was originally written with the National Park Service in mind, so
it has the option to use the name of any park (or monument, preserve, etc.) within
the NPS. Use the cftdata()
function to specify a range of years, a set of models,
a set of parameters, and a set of representative concentration pathways to return.
Leaving these arguments empty will results in a download of all available data
for that location.
library(cft) library(rgdal) library(tibble) library(ggplot2) library(tidyr) library(lubridate) library(ggrepel) library(dplyr)
# Choose a project directory to store data # (Leaving this argument empty will download data to a temporary directory) proj_dir <- "~" # Create a Spatial* object for your area of interest # (e.g., you could read a local shapefile here with readOGR) aoi <- rgdal::readOGR(system.file("extdata", "windcave.geojson", package = "cft")) #> OGR data source with driver: GeoJSON #> Source: "/home/max/R/x86_64-pc-linux-gnu-library/4.0/cft/extdata/windcave.geojson", layer: "windcave" #> with 1 features #> It has 19 fields # download data file_refs <- cftdata(aoi = aoi, area_name = "windcave", years = c(2020, 2040), local_dir = proj_dir, parameters = c("tasmin", "tasmax", "uas", "vas", "pr"), ncores = parallel::detectCores() / 2) #> [1] "Building area of interest grid..." #> [1] "Retrieving climate data for windcave" #> [1] "Saving local files to /home/max/windcave"
The output of cftdata
is a data.frame, where each row corresponds to one file:
glimpse(file_refs) #> Rows: 200 #> Columns: 13 #> $ local_file <chr> "tasmin_windcave_bcc-csm1-1_r1i1p1_rcp45_macav2metda… #> $ local_path <chr> "/home/max/windcave/tasmin_windcave_bcc-csm1-1_r1i1p… #> $ model <chr> "bcc-csm1-1", "bcc-csm1-1", "bcc-csm1-1", "bcc-csm1-… #> $ parameter <chr> "tasmin", "tasmin", "tasmax", "tasmax", "uas", "uas"… #> $ rcp <chr> "rcp45", "rcp85", "rcp45", "rcp85", "rcp45", "rcp85"… #> $ ensemble <chr> "r1i1p1", "r1i1p1", "r1i1p1", "r1i1p1", "r1i1p1", "r… #> $ year1 <dbl> 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020… #> $ year2 <dbl> 2040, 2040, 2040, 2040, 2040, 2040, 2040, 2040, 2040… #> $ area_name <chr> "windcave", "windcave", "windcave", "windcave", "win… #> $ units <chr> "K", "K", "K", "K", "m s-1", "m s-1", "m s-1", "m s-… #> $ full_varname <chr> "Minimum Temperature", "Minimum Temperature", "Maxim… #> $ internal_varname <chr> "air_temperature", "air_temperature", "air_temperatu… #> $ parameter_long <chr> "air_temperature", "air_temperature", "air_temperatu…
We can generate a data.frame of daily climate values averaged over the study
region using the cft_df()
function:
df <- cft_df(file_refs, ncores = parallel::detectCores() / 2) #> Computing spatial averages... #> Generating climate data.frame...
This df
is a tibble where each row represents a day by model by scenario
combination, where the climate parameters of interest are represented as
columns. Note that, in the output of cft_df()
, average values represent all 2km by
2km grid cells that touch the Park boundary:
glimpse(df) #> Rows: 306,840 #> Columns: 10 #> $ rcp <chr> "rcp45", "rcp85", "rcp45", "rcp85", "rcp45", "rcp85", "rcp4… #> $ date <date> 2020-01-01, 2020-01-01, 2020-01-01, 2020-01-01, 2020-01-01… #> $ model <chr> "bcc-csm1-1", "bcc-csm1-1", "bcc-csm1-1-m", "bcc-csm1-1-m",… #> $ ensemble <chr> "r1i1p1", "r1i1p1", "r1i1p1", "r1i1p1", "r1i1p1", "r1i1p1",… #> $ area_name <chr> "windcave", "windcave", "windcave", "windcave", "windcave",… #> $ tasmin <dbl> 258.6535, 267.4921, 262.1381, 255.9822, 265.8079, 257.2419,… #> $ tasmax <dbl> 271.0576, 279.3801, 278.6900, 276.4950, 278.7834, 276.8796,… #> $ uas <dbl> 2.5145298, 4.0373213, -1.4234270, 2.9496087, 1.8698513, 3.2… #> $ vas <dbl> 0.41528416, 0.77429012, -1.48429512, 1.16956005, -4.5207128… #> $ pr <dbl> 0.00000000, 0.00000000, 1.73096346, 0.00000000, 0.02314971,…
We can work with this data frame to do some basic visualizations, like time
series for a particular parameter.
For example, we can use the filter()
function from dplyr to focus on specific
date ranges and specific emissions scenarios.
df %>% filter(date > Sys.Date(), date < Sys.Date() + 365 * 5, # next 5 years rcp == "rcp85") %>% ggplot(aes(date, tasmax, group = model)) + geom_line(alpha = .2) + xlab("Time") + ylab("Max. air temperature (K)")
Now that we have all of the climate parameters for our study region, we can
compute functions of those variables.
For example, it is common to compute the midpoint of the maximum and minimum
daily temperature, which we can do using the mutate
function:
df <- df %>% mutate(tasmid = (tasmax + tasmin) / 2)
Now we have a new column called tasmid
that is the midpoint of the maximum
and minumum daily temperature!
Wind speed provides another example of a derived parameter that can be computed
for each day.
By default, we have two wind-related parameters: the eastward wind component
(called uas
) and the northward wind component (called vas
), both in units of
meters per second (you can get this information from cft::argument_reference
).
Wind speed can be computed from vas
and uas
using the Pythagorean theorem:
$\text{Wind speed} = \sqrt{v_{as}^2 + u_{as}^2}.$
In code:
df <- df %>% mutate(wind_speed = sqrt(vas^2 + uas^2))
Sometimes, there are new climate variables that summarize daily data. For example, you may want to compute:
All of these quantities summarize daily data, and require some aggregation time interval which in many cases will be one year. As an example, we will compute the growing season length for Wind Cave National Park across all models and emissions scenarios. To do this, we first need to define a new column for year, which we will use as a grouping variable:
df <- df %>% mutate(year = year(date))
Now, we want to compute growing season length for each year, model, emissions scenario combination.
growing_seasons <- df %>% group_by(rcp, model, year, ensemble) %>% summarize(season_length = sum(tasmid > 273.15)) %>% ungroup #> `summarise()` regrouping output by 'rcp', 'model', 'year' (override with `.groups` argument)
Notice that we used our derived temperature midpoint column tasmid
, and computed the total (sum()
) number of days for each group where the temperature midpoint was greater than 0 C (or, 273.15 Kelvin, which are the units of the temperature data).
growing_seasons #> # A tibble: 840 x 5 #> rcp model year ensemble season_length #> <chr> <chr> <dbl> <chr> <int> #> 1 rcp45 bcc-csm1-1 2020 r1i1p1 318 #> 2 rcp45 bcc-csm1-1 2021 r1i1p1 292 #> 3 rcp45 bcc-csm1-1 2022 r1i1p1 311 #> 4 rcp45 bcc-csm1-1 2023 r1i1p1 319 #> 5 rcp45 bcc-csm1-1 2024 r1i1p1 312 #> 6 rcp45 bcc-csm1-1 2025 r1i1p1 289 #> 7 rcp45 bcc-csm1-1 2026 r1i1p1 313 #> 8 rcp45 bcc-csm1-1 2027 r1i1p1 291 #> 9 rcp45 bcc-csm1-1 2028 r1i1p1 306 #> 10 rcp45 bcc-csm1-1 2029 r1i1p1 292 #> # … with 830 more rows
Let's visualize the growing season over time for each model and emission scenario:
growing_seasons %>% ggplot(aes(year, season_length, color = rcp, group = model)) + geom_line(alpha = .3) + facet_wrap(~rcp, ncol = 1) + xlab("Year") + ylab("Growing season length (days)") + scale_color_manual(values = c("dodgerblue", "red")) + theme(legend.position = "none")
Use the tibble object that is returned from cft_df()
as an input to
compare_periods()
to compare climate between a reference and target period. You
may specify the function with which to aggregate your chosen variable as well
as the yearly time period months of the year to include in this calculation.
comps <- compare_periods(df, var1 = "pr", var2 = "tasmax", agg_fun = "mean", target_period = c(2025, 2030), reference_period = c(2020, 2024), months1 = 5:8, months2 = 5:8, scenarios = c("rcp45", "rcp85"))
This provides a data frame that can be used to compare the values in the target and reference period.
glimpse(comps) #> Rows: 80 #> Columns: 9 #> Groups: rcp [2] #> $ model <chr> "bcc-csm1-1", "bcc-csm1-1-m", "BNU-ESM", "CanESM2", … #> $ rcp <chr> "rcp45", "rcp45", "rcp45", "rcp45", "rcp45", "rcp45"… #> $ parameter <chr> "pr", "pr", "pr", "pr", "pr", "pr", "pr", "pr", "pr"… #> $ units <chr> "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm"… #> $ reference_period <chr> "2020 - 2024", "2020 - 2024", "2020 - 2024", "2020 -… #> $ target_period <chr> "2025 - 2030", "2025 - 2030", "2025 - 2030", "2025 -… #> $ reference_value <dbl> 2.400247, 1.900319, 2.677509, 2.223219, 2.477713, 2.… #> $ target_value <dbl> 2.365003, 2.386500, 2.439448, 2.354286, 2.618627, 1.… #> $ difference <dbl> -0.03524395, 0.48618144, -0.23806104, 0.13106647, 0.…
One useful plot shows the difference in the two variables between reference and target periods:
title <- paste("Change from the historical vs. reference period:", comps$reference_period, comps$target_period, sep= " vs " )[1] comps %>% dplyr::select(parameter, rcp, model, reference_period, target_period, difference) %>% pivot_wider(names_from = parameter, values_from = difference) %>% ungroup %>% mutate(rcp = ifelse(rcp == "rcp45", "RCP 4.5", "RCP 8.5")) %>% ggplot(aes(pr, tasmax, color = rcp)) + ggtitle(title) + geom_point() + geom_hline(yintercept = 0, alpha = .2) + geom_vline(xintercept = 0, alpha = .2) + geom_text_repel(aes(label = model), segment.size = .3, size = 3) + xlab("Difference in mean daily precipitation (mm)") + ylab("Difference in mean daily max. temperature (C)") + scale_color_manual(values = c("dodgerblue", "red"), "Greenhouse gas\ntrajectory")
So, nearly all model runs indicate warming, but the amount of warming varies by model and emissions scenario. Precipitation increases and decreases are predicted by different models.
The cft package includes a reference object that can tell you what models,
scenarios, and climate variables are available.
You can use the reference object within to see the
available argument options for the cftdata()
function: models
, parameters
,
and scenarios
(as representative concentration pathways).
The labels
field will spell out the specific meaning of each parameter, the
variables
field shows the internal variable name of parameters in the data
sets themselves, and the units
field shows the measurement unit of each
variable.
references <- cft::argument_reference
Here are the GCMs (the models):
references$models #> [1] "bcc-csm1-1" "bcc-csm1-1-m" "BNU-ESM" "CanESM2" #> [5] "CCSM4" "CNRM-CM5" "CSIRO-Mk3-6-0" "GFDL-ESM2M" #> [9] "GFDL-ESM2G" "HadGEM2-ES365" "HadGEM2-CC365" "inmcm4" #> [13] "IPSL-CM5A-LR" "IPSL-CM5A-MR" "IPSL-CM5B-LR" "MIROC5" #> [17] "MIROC-ESM" "MIROC-ESM-CHEM" "MRI-CGCM3" "NorESM1-M"
Here are the emissions scenarios:
references$scenarios #> [1] "rcp45" "rcp85"
And here are the climate parameters:
references$parameters #> [1] "tasmin" "tasmax" "rhsmin" "rhsmax" "pr" "rsds" "uas" "vas" #> [9] "huss" "vpd"
Labels for each acronym are also available, for example:
references$labels["vpd"] #> $vpd #> [1] "Vapor Pressure Deficit" references$labels["CCSM4"] #> $CCSM4 #> [1] "Community Climate System Model 4" # For hyphenated acronyms, use them either as objects or with backticks model <- "IPSL-CM5A-MR" references$labels[model] #> $`IPSL-CM5A-MR` #> [1] "Institut Pierre Simon Laplace (IPSL) - Climate Model 5A - Medium Resolution" references$labels$`IPSL-CM5A-MR` #> [1] "Institut Pierre Simon Laplace (IPSL) - Climate Model 5A - Medium Resolution"
Not every model has the same set of parameters available, and the
get_args
method lists model-specific information.
references$get_args("CCSM4") #> $parameters #> [1] "tasmin" "tasmax" "pr" "rsds" "uas" "vas" "huss" #> #> $scenarios #> [1] "rcp45" "rcp85" #> #> $ensemble #> [1] "r6i1p1"
The amount of data generated by downscaled GCMs can be quite large (e.g., daily data at a few km spatial resolution). The Climate Futures Toolbox was developed to help users access and use smaller subsets.
Data is acquired from the Northwest Knowledge Server of the University of Idaho.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.