knitr::opts_chunk$set(echo = TRUE,
                      collapse = TRUE,
                      comment = "#>")
# This option allows the displayed title ("Downloading ERA5 data for use...") to
# be different from the \VignetteIndexEntry{} title:
options(rmarkdown.html_vignette.check_title = FALSE)

Setup

ERA5 climate data can be downloaded from the ECMWF climate data store (CDS). Note that in July 2024 the CDS migrated to a new platform, and the old platform was deprecated in Sept 2024. The following describes how to access the data using R:

1) Register for an ECMWF account here. Upon registering you will need to accept all of the Terms and Conditions listed at the bottom of the form.

2) Then, navigate to the CDS site here and login using the button in the top right. Once logged in, hover your mouse over your name in the top right, and click on the option "Your profile" that appears (this should bring you to this page. Here you will find your User ID (UID) and Personal Access Token, both which are required for you to remotely download data from the CDS. Make a note of these.

3) Each CDS dataset has its own unique Terms of Use. You will need to accept these Terms for ERA5-reanalysis at this page (scroll down to "Terms of use" and accept). This same set of terms also applies for other Copernicus products, including ERA5-land.

The following packages are required:

library(mcera5)
library(dplyr)
library(ecmwfr)
library(ncdf4)
library(curl)
library(keyring)
library(abind)
library(lubridate)
library(tidync)
library(NicheMapR) # remotes::install_github("mrke/NicheMapR")
library(microctools) # remotes::install_github("ilyamaclean/microctools")
#############
##FUNCTIONS##
#############

files_source <- list.files(here::here("R/"), full.names = T)
sapply(files_source,source)

Set user credentials for API access

# assign your credentials from the CDS User ID and Personal Access Token
uid <- "*****"
cds_access_token <- "********-****-****-****-************"

ecmwfr::wf_set_key(user = uid,
                   key = cds_access_token,
                   service = "cds")

Usage

Building a request

The first step is to decide on the spatial and temporal extents of the data retrieved from the CDS. There is can be a moderately-sized overhead when submitting a request (and consequently waiting in a queue); at times, when the CDS server is busy, this can entail several hours of waiting for a download to execute, even for small amounts of data (<10 MB). Such overhead time is independent of mcera5, yet can be reduced base upon the spatial/temporal dimensions of the query. It is therefore sensible to be strategic with the number and size of requests submitted. Generally, querying temporal durations greater than one year causes a delay, while query of wide spatial extents can occur rapidly. It is therefore most efficient to download time series data in temporal chunks (e.g. monthly basis) each specifying a region encompassing multiple points of interest. We recommend users track current usage of the CDS and follow ECMWF news (such as the migration of the MARS Data Centre between November 2021 and April 2022).

Requests are submitted as whole months, and they will also be split by year, so that they are not too big to handle and to expedite wait times. The splitting of requests which overlap year boundaries is dealt with automatically. Users can request files to be merged together in the request_era5() function (covered later on).

Once these parameters are decided, one can begin to build the request(s) as follows:

# bounding coordinates (in WGS84 / EPSG:4326)
xmn <- -4
xmx <- -2
ymn <- 49
ymx <- 51

# temporal extent
st_time <- as.POSIXlt("2010-02-26 00:00", tz = "UTC")
en_time <- as.POSIXlt("2010-03-01 23:00", tz = "UTC")


# Set a unique prefix for the filename (here based on spatial
# coordinates), and the file path for downloaded .nc files
file_prefix <- "era5_-4_-2_49_51"
file_path <- getwd()


# build a request (covering multiple years)
req <- build_era5_request(xmin = xmn, xmax = xmx, 
                          ymin = ymn, ymax = ymx,
                          start_time = st_time,
                          end_time = en_time,
                          outfile_name = file_prefix)

Requests are stored in list format. Each request (divided by year) are stored as list objects within the master list:

str(req)

Obtaining data with a request

The next step is to execute the requests, by sending them to the CDS. Executing request_era5 will download .nc files for each year to the location defined by out_path. The filenames will have a prefix defined by outfile_name in build_era5_request. The request_era5 function will deal with lists containing multiple requests (i.e. those created by build_era5_request with temporal extents spanning multiple years). If users specify a duration longer than one year, the query is downloaded as a separate netCDF file for each year; users can combine files into one by specifying the parameter combine = TRUE in request_era5. At this stage, the user then waits for the netCDF to be downloaded to their machine, and will receive a confirmation message ("ERA5 netCDF file successfully downloaded") from the R console upon completion:

request_era5(request = req, uid = uid, out_path = file_path)

Common errors when for CDS requests can be found on the corresponding ECMWF Confluence page.

Extracting climate data for a spatial point using extract_clim() and extract_precip()

Once all .nc files are downloaded, the next step is to extract climate variables conforming to desired spatial and temporal extents from them. This can be done for a single point in space with extract_clim. The function outputs a dataframe where each row is a single observation in time and space and each column is a climatic variable. Note that this function uses the same start and end times defined prior to building the request.

By default, extract_clim applies an inverse distance weighting calculation (controlled by the parameter d_weight). This means that if the user requests data for a point that does not match the regular grid found in the ERA5 dataset (i.e., the centre point of each ERA5 grid cell), the four nearest neighbouring data points to the requested location will be used to create a weighted average of each climate variable, thereby providing a more likely estimate of location conditions.

Furthermore, extract_clim by default applies a diurnal temperature range correction to the data (when dtr_cor = TRUE and weighted according to dtr_cor_fac). The diurnal temperature ranges of ERA5 are artificially lower in grid cells classed as sea as opposed to land. It may thus be useful to apply a correction if estimates are required for a terrestrial location in predominantly marine grid cells. If applied, an internal function is evoked that uses the land/sea value in the downloaded NetCDF file to adjust temperature values by the factor provided using the formula $DTR_C=DTR[(1-p_l ) C_f+1]$ where $DTR_C$ is the corrected diurnal temperature range, $DTR$ is the diurnal temperature range in the ERA5 dataset, $p_l$ is the proportion of the grid cell that is land and $C_f$ is the correction factor. The default function input value is a correction based on calibration against the UK Met Office 1-km2 gridded dataset of daily maximum and minimum temperatures, itself calibrated and validated against a network of (on average) 1,203 weather stations distributed across the UK.

Given that runauto in the microclima package, as well as functions from the NicheMapR and microclimc microclimate modelling packages, all only accept date ranges within single years, you will need to create a separate data frame for each yearly block. For example, if your period of interest spans multiple years (and you have used the previous code to download multiple .nc files, run this next block of code for each year):

# list the path of the .nc file for a given year
my_nc <- paste0(getwd(), "/era5_-4_-2_49_51_2010.nc")

# for a single point (make sure it's within the bounds of your .nc file)
x <- -3.654600
y <- 50.640369

# gather all hourly variables
clim_point <- extract_clim(nc = my_nc, long = x, lat = y,
                            start_time = st_time, end_time = en_time)
head(clim_point)

In the same fashion, use extract_precip to acquire precipitation from your downloaded netCDF file, which also applies an inverse distance weighting calculation. By default, extract_precip sums up hourly ERA5 precipitation to the daily level, which is required for the aforementioned microclimate models. However users can instead receive hourly values by setting convert_daily = FALSE.

# gather daily precipitation (we specify to convert precipitation from hourly
# to daily, which is already the default behavior)
precip_point <- extract_precip(nc = my_nc, long = x, lat = y,
                                   start_time = st_time,  
                                   end_time = en_time,
                                   convert_daily = TRUE)

The dataframe created by extract_clim and vector created by extract_precip are ready to be used as inputs to the runauto function from the microclima package:

# create a 200 x 200 30 m spatial resoltuion DEM for location
r <- microclima::get_dem(lat = y, long = x, resolution = 30)

# change date format to fit runauto requirements
temps <- microclima::runauto(r = r, dstart = "26/02/2010",dfinish = "01/03/2010", 
                             hgt = 0.1, l = NA, x = NA, 
                             habitat = "Barren or sparsely vegetated",
                             hourlydata = clim_point, 
                             dailyprecip = precip_point, 
                             plot.progress= FALSE)

For use of climate data with other microclimate packages, such as microclimc and microclimf, you may need to reformat the climate data using microctools::hourlyncep_convert:

climdata <- hourlyncep_convert(climdata = clim_point, lat = y, long = x)

Extracting climate data for a spatial grid/array using extract_clima() and extract_precipa()

mcera5 also can extract ERA5 climate data simultaneously across a spatial grid using extract_clima() and extract_precipa(), to be provided to functions such as microclimf::modelina(). extract_clima() and extract_precipa() follow the same format as extract_clim() and extract_precip(), except require as input the boundaries of a spatial grid rather than a single spatial point. Both functions will return spatRaster layers corresponding to the climate variables.

extract_clima also includes the argument reformat, which when set to TRUE (the default value) will reformat climate data to be used for modeling with microclimf. Similarly, extract_precipa has the argument convert_daily to sum hourly preicipitation to daily precipitation, which is accepted by microclimf functions.

# list the path of the .nc file for a given year
my_nc <- paste0(getwd(), "/era5_-4_-2_49_51_2010.nc")

# 4 corners of a spatial grid (make sure it's within the bounds of your .nc file)
long_min <- -3.7
long_max <- -2.9
lat_min <- 50.1
lat_max <- 50.8

# gather all hourly variables
clim_grid <- extract_clima(nc, long_min, long_max, lat_min, lat_max, 
                          start_time = st_time,
                          end_time = en_time,
                          dtr_cor = TRUE, dtr_cor_fac = 1.285,
                          reformat = TRUE)

str(clim_grid)


precip_grid <- extract_precipa(nc, long_min, long_max, lat_min, lat_max, start_time,
                             end_time, convert_daily = TRUE)

str(precip_grid)

Acknowledgements

Thank you to Koen Hufkens, creator of the ecwmfr package for making the aquisition of ECWMF data through R possible.



dklinges9/mcera5 documentation built on Sept. 14, 2024, 5:49 p.m.