isRNRFA <- requireNamespace("rnrfa", quietly = TRUE) isZOO <- requireNamespace("zoo", quietly = TRUE) isHTTR <- requireNamespace("httr", quietly = TRUE) isUTF8 <- requireNamespace("utf8", quietly = TRUE) is1 <- (isHTTR & isUTF8) is2 <- (is1 & isRNRFA & isZOO)
knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) #'arg' should be one of “default”, “cerulean”, “journal”, “flatly”, “darkly”, “readable”, “spacelab”, “united”, “cosmo”, “lumen”, “paper”, “sandstone”, “simplex”, “yeti”
winfapReader package contains functions to interact with the information on extremes of instantaneous river flow in the United Kingdom (UK) made available by the National River Flow Archive (NRFA).
These information underlie most flood risk estimation projects in the UK, which are typically carried out using the Flood Estimation Handbook (FEH) statistical method and their updates as implemented by the software WINFAP.
Consequently, the NRFA publishes and routinely updates files which are suitable to be read by WINFAP: the collection of these files is referred to as the peak flow dataset and can be found here. The winfapReader package allows the user to interact with three different file-types:
The .AM files which contain the Annual Maximum (AMAX) peaks: these correspond to the largest river flow event in any given water year (which runs from October 1st to September 30th)
The .CD3 files which contain the Catchment Descriptors: these correspond to a set of descriptors for the catchment upstream the gauging station and for the station itself.
The .PT files which contain the peaks over threshold: these correspond to all peaks which are larger than a given threshold. The threshold is fixed by the NRFA and it should be such that there is an average of 3 to 5 peaks over threshold (POT) events per water year. It has often been reported that the POT records in different stations have varying reliability: since most flood frequency estimation methods used in the UK rely on annual maxima the AMAX records go trough a higher scrutiny than then POT records. Users should treat the information about peaks over threshold with caution and thorough quality checks should be performed before analysing them.
winfapReader package allows you to read into your R session the
.PT files. Importantly it is aware of the typical structure of the files in which rejected annual maxima and missing period of records for the peaks over threshold are recorded, and merges this information with the flow records. This allows the user to have all useful information to decide which parts of the record to include in the analysis.
Recently the NRFA has developed an API which allows for a programmatic interaction with their datasets: the information about annual maxima, catchment descriptors and peaks over threshold can also be retrieved using this API.
Beside the information on extremes for flood frequency estimation the NRFA maintains and distributes daily river flow records and several other river flow related variables, such as catchment averaged rainfall: the
rnrfa package allows one to retrieve these information and more with its
rnrfa::get_ts functions (see more on this at the end of the vignette). The
winfapReader package focuses only on handling river flow extremes information and has two sets of functions:
read_pot functions read the information from the
.PT files once these have been downloaded into a local folder
get_pot functions get the information from the API: these functions therefore only work when an internet connection is available
It is difficult to showcase the use of the
read_* functions since these rely on the location of the WINFAP files within the users' working environment. Only the use of the
get_* function will be showcased below. For the annual maxima and peaks over threshold the two sets of functions give the same output.
library(winfapReader) ### the get_* functions only works once you are connected to the internet ### they also need one to have the library httr installed ### verify if you have the library with (!requireNamespace("httr", quietly = TRUE)) ### if FALSE install it with ### install.packages("httr")
get_amax function allows one to obtain information on annual maxima from the NRFA. The
read_amax function will produce the same output as the
get_amax function but is based on the locally saved files.
if(curl::has_internet()) amaxEx <- get_amax(c(42003,72014)) names(amaxEx); class(amaxEx) # let's look at only one of these a42003 <- amaxEx[["42003"]] ## what is the output head(a42003)
For each station the function outputs a
data.frame with information on the station number, the water year, the date in which the highest flow in the water year was recorded, the river flow value and the river stage value (when available) for all annual maxima recorded at a station.
Moreover it gives the information on whether the NRFA has deemed the maximum in a given year to be reliable or whether this has been rejected.
The function can query the API for more than one station at the time: in that case the output is a named list with each element corresponding to a station id.
get_pot function allows one to obtain information on peaks over threshold data from the NRFA. The
read_pot function will produce the same output as the
get_pot function but is based on the locally saved files.
if(curl::has_internet()) potEx <- get_pot(c(42003,72014)) names(potEx); class(potEx) # let's look at only one of these p42003 <- potEx[["42003"]] ## what is the output class(p42003); names(p42003)
For each station the function outputs a list with three elements:
data.framewith all the recorded exceedances above the threshold in the NRFA record. In particular information on the exceedance date, water year, peak flow and river stage are given.
head(p42003$tablePOT) ## notice: several events in the 1982 no events in 1983
data.framewith information on the percentage of valid record in each water year in the record. The potPercComplete column is derived by calculating the percentage of days which are not included in the POT Gaps or the POT rejected headers in the NRFA .PT files. The column potThreshold gives the information of the flow threshold used to extract the peaks for the station: this is a constant for each station.
dateRangegives the range of dates spanned by the POT record. This range might be wider than the range of the dates in the
tablePOTtable since it records the period in which the station was operational and no threshold exceedances occurred.
The function has an argument
getAmax which defaults to
getAmax = TRUE then information on the annual maxima is included in the
p42003withAmax <- get_pot(42003, getAmax = TRUE) head(p42003withAmax$WaterYearInfo, 10)
Notice that in the period when no POT records are available all POT related information are set to NA. On the other hand, the fact that the annual maximum in water year 1983 is below the threshold confirms that the fact that no POT record are present for that water year is related to low flows throughout the water year rather than a mistake in the POT record. Notice also that for several of the first years in the record the annual maxima values are rejected and the proportion of valid POT records (as shown by
potPercComplete) is null: the early part of the record for this station has been deemed by the NRFA to be unreliable and any analysis of this flow record should probably discard the information till water year 1995.
get_cd function allows the user to obtain information on the station (for example its location) and on the catchment upstream the station itself (for example the catchment area and the annual mean altitude for the catchment). More detail on several of the catchment descriptors can be found on the NRFA website and in the FEH.
The function gives a slightly different set of information than the
read_cd3 function, due to the difference in information made available by the NRFA API.
if(curl::has_internet()) cdEx <- get_cd(c(42003,72014)) names(cdEx); class(cdEx) # let's look at only one of these c42003 <- cdEx[["42003"]] ## what is the output class(c42003); names(c42003)
The function has an argument
fields which governs the amount of information obtained from the API. If
fields = "feh" (the default) only the basic information used in the FEH methods is output. If
fields="all" a data.frame with 104 columns is output. This contains several information about the station and the catchment, including data availability, land cover information and much more.
if(curl::has_internet()) cd42003all <- get_cd(42003, fields = "all") names(cd42003all)
rnrfa package provides a unique way to query several types of data from the NRFA. Information about extremes can also be retrieved using the
rnrfa package, although there are some differences in the output provided when the data of interest are the peaks over threshold records.
rnrfa::catalogue function allows one to pull the list of stations (and related metadata), falling within a given bounding box. The metadata retrieved by the function are similar to the ones derived from
winfapReader::get_cd. This function can be used to identify the stations in an area for which peak flow information can be obtained with
The code below for example identifies stations surrounding the city of Lancaster and then displays the annual maxima flow with red lines indicating Rejected flow values. Notice that you would need to have the
rnrfa package installed for the code below to work.
## Lancaster coordinates: 54.04, -2.8 ## let's look around the city rivLanc <- rnrfa::catalogue(bbox = list(lat_min = 54.04-0.2, lat_max = 54.04+0.2, lon_min = -2.8-0.2, lon_max = -2.8+0.2)) ### let's select stations which have been deemed to be suitable for pooling ### that's the highest quality flag for annual maxima table(rivLanc[,"feh-pooling"]) ### 5 stations are suitable for pooling rivLanc <- subset(rivLanc,subset = as.vector(rivLanc[,"feh-pooling",drop=TRUE])) rivLanc[,1:3] ### notice that rnrfa outputs a tibble and not a data.frame idLanc <- rivLanc[,"id",drop=TRUE] ## a vector of ids amaxLanc <- winfapReader::get_amax(idLanc) names(amaxLanc)
Now display the stations all together in a panel.
par(mfrow=c(2,3)) invisible( sapply(amaxLanc, function(x) with(x,plot(WaterYear,Flow, type="h",col=ifelse(Rejected,2,4), main = unique(Station)))))
The large events which have hit the area in 2015 can be seen in the flow series plots.
rnrfa package also allows to pull the annual maximum flow recorded at any station. To also obtain the information about the water year which the NRFA has deemed to be of poor quality and therefore rejected set the
full_info argument to
par(mfrow=c(1,1)) ### the annual maxima for 72014 from rnrfa maxflow72014 <- rnrfa::get_ts(72014, type = "amax-flow", full_info = TRUE) ### the annual maxima for 72014 from winfapReader xx <- amaxLanc[["72014"]][,c("Date","Flow","Rejected")] plot(xx[,"Flow"], maxflow72014[,"amax-flow"]); abline(0,1) ### same information which(xx$Rejected) ## but two years should be rejected which(maxflow72014$rejected == 1) ## same two years
To obtain the POT records in
type = "pot-flow": using the
full_info = TRUE option ensures that a rejected flag is given for the periods in which the POT records have been found to be unreliable or missing (see the NRFA website for more details on this). The
rejected flag is built using the same information used to build the
WaterYearInfo table in the
winfapReader::get_pot function. The additional information provided in
WaterYearInfo is useful to identify the years in which no POT record is found because the records are missing/unreliable and not because the threshold was never exceeded.
par(mfrow=c(1,1)) # the pot records for 75001 from rnrfa pot75001 <- rnrfa::get_ts(75001, type = "pot-flow", full_info = TRUE) pot75001[9:12,] # using winfapReader p75001 <- get_pot(75001) p75001$tablePOT[9:12,] # the same peaks are identified p75001$WaterYearInfo[1:5,] ### but notice that 1975 had a low proportion missing records # the lack of data in 1975 is due to all flow being low
The two packages can be used together to retrieve different type of information about river flow: in the example below daily gauged flow for the Conder at Galgate (station 72014) is displayed together with annual maxima (which are extracted from the instantaneous river flow). The latter are typically larger and can be seen to start further in the past than the daily flow data.
### get daily data from NRFA daily72014 <- rnrfa::get_ts(72014, type = "gdf") ## make daily data into data.frame daily72014 <- data.frame(Day = zoo::index(daily72014), DFlow = as.vector(daily72014)) plot(xx[,c("Date","Flow")], col = ifelse(xx$Rejected, 2, 4), pch = 4, ylim =c(0,1.05*max(xx$Flow))) title(main = "The Conder at Galgate") points(daily72014, type="l")
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.