knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "README-", cache = TRUE )
THREDDS data servers are webservers that provide access to gridded data—in the form of NetCDF files—via a handy RESTful API (among other protocols). THREDDS servers not only serve data, but provide tools for subsetting gridded data in space and time on the server, reducing the amount of data the end user must download. thredds is an R front end to THREDDS servers designed to make exploration, subsetting, and download of THREDDS data into R as simple as possible.
You can install threddr from github with:
# install.packages("devtools") devtools::install_github("bocinsky/thredds")
thredds enables exploration, subsetting, and downloading from THREDDS servers through a series of R commands. To enable easy tab-completion, all thredds functions have a "tds_
" prefix. Here, we first explore the structure of the CIDA THREDDS server hosted by USGS. We do so using the tds_list_datasets
function.
library(thredds) library(tidyverse) library(magrittr) # The base url of the CIDA THREDDS server cida <- "https://cida.usgs.gov/thredds/" datasets <- tds_list_datasets(thredds_url = cida) datasets
This returns a data frame with dataset names and their paths. It also will return nested catalogs, should there be any. We can then use the tds_list_services
function to list all of the THREDDS services available for a particular datasets. Lets list the services for "Future LOCA" dataset:
loca_url <- datasets[datasets$dataset == "Future LOCA",]$path loca_url loca_services <- tds_list_services(loca_url) loca_services
Similarly to tds_list_datasets
, tds_list_services
returns a data frame with service names and URLs.
The first service functions to be developed in thredds are for the NetCDF Subset service ("ncss"). Here, we first query the variables available in the dataset, which in the case of the LOCA dataset represent combinations of CMIP5 models, representative climate pathways (RCPs, or emissions scenarios), and climate variables. The other fields in the table will vary based on the dataset.
loca_ncss <- loca_services[loca_services$service == "NetcdfSubset",]$path loca_ncss loca_vars <- tds_ncss_list_vars(ncss_url = loca_ncss) loca_vars
Finally, we can download some data. Lets download the maximum daily temperature data from the Community Climate System Model (CCSM4), for both RCPs so we can compare them. We use the tds_ncss_download
function; we have to provide the URL of the NetCDF Subset service (the same we used for tds_ncss_list_vars
above), a character vector of variable names, and a file name and location for the output. We'll create a temporary location to put the file for now.
CCSM4 <- tds_ncss_download(ncss_url = loca_ncss, out_file = paste0(tempdir(),"/CCSM4.nc"), bbox = sf::st_bbox(c(xmin = -116.5, xmax = -115, ymin = 44.5, ymax = 45)), vars = c("tasmax_CCSM4_r6i1p1_rcp45","tasmax_CCSM4_r6i1p1_rcp85"), ncss_args = list(temporal = "all")) CCSM4
The tds_ncss_download
function outputs the path to the downloaded data. In R, we can load a NetCDF using the brick()
function from the raster package. Here, we do so, and then aggregate and plot the data in a couple of ways. Note that we have to import each variable separately when using raster; the forthcoming package stars will be able to load many variables simultaneously.
CCSM4_rcp45 <- raster::brick(CCSM4, varname = "tasmax_CCSM4_r6i1p1_rcp45") CCSM4_rcp85 <- raster::brick(CCSM4, varname = "tasmax_CCSM4_r6i1p1_rcp85") # Calculate the average precipitation across space, and plot the time series CCSM4_rcp45 %<>% raster::cellStats(mean) CCSM4_rcp85 %<>% raster::cellStats(mean) # Create a data frame CCSM4_timeseries <- tibble::tibble(Date = names(CCSM4_rcp45) %>% stringr::str_remove("X") %>% lubridate::as_date(), `RCP 4.5` = CCSM4_rcp45, `RCP 8.5` = CCSM4_rcp85) # Let's extract the average temperature for June for each year, and plot CCSM4_timeseries %>% filter(lubridate::month(Date) == 6) %>% mutate(Year = lubridate::year(Date) %>% as.integer()) %>% group_by(Year) %>% summarise_at(.vars = dplyr::vars(`RCP 4.5`, `RCP 8.5`), .funs = mean) %>% gather(`Emissions\nScenario`, Value, -Year) %>% ggplot(aes(x = Year, y = Value, color = `Emissions\nScenario`)) + geom_line() + geom_smooth() + ylab("Average June Daytime\nMaximum Temperature (C)")
I welcome contributors to the development of the thredds package. Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.