In pkuhnert/LRE: Loads Regression Estimator

knitr::opts_chunk$set(echo = TRUE)
library(LRE)  
library(ggplot2)
library(gratia)
library(patchwork)

Overview

LRE is a tool for quantifying loads with uncertainties (Kuhnert et al. 2012). The tool uses a generalized additive modelling approach to relate concentration to flow and other terms in the model that mimic hydrological relationships of the complex river system being modelled.

The LRE methodology comprises four steps:

Estimation steps for flow that ensures flow values occur at regular intervals (e.g. hourly, daily or monthly);
Estimation steps for concentration, where a generalised additive model (GAM) is proposed for estimating concentration using covariates that account for seasonality and important hydrological processes of complex river systems. In some cases, a random effect term will need to be considered;
The estimation of the load that includes a bias correction, and
The quantification of the standard error, which accounts for transect (spatial) error in addition to measurement error.

Analysis of the Inkerman Bridge site in the Burdekin

The Inkerman Bridge site in the Burdekin catchment drains water into the Great Barrier Reef (GBR) lagoon from an area 300,000 square kilometres in area. Courtesy from the Queensland State Government (DSITI) we have flow and total suspended sediment (TSS) data. Flow data consists of daily measurements (cumecs) spanning December 1973 through to June 2015. Concentration data consists of daily measurements (mg/L) spanning January 2006 through to June 2016.

The LRE package is used to

Read in flow and concentration data;
Create a 'regularized' dataset;
Fit a GAM to predict concentration given flow; and
Calculate the load.

Below are a series of steps for calculating the load in R using the LRE package.

Step 1: Read in the data

There are a couple of different ways of getting your data into R for analysing with the LRE package.

The function ReadInData reads in a concentration and flow dataset as two separate .csv datasets. The name of the csv files should be common to both files with the extension _C and _Q referencing the concentration and flow files respectively. An example below reads in 'BurdRdaily' which is housed in the extdata directory within the LRE package.

# Version 1: Read In Burdekin data from an external file
burd <- ReadInData(dirnm = system.file("extdata", package = "LRE"), filenm = "/BurdRdaily", Cnames = "TSS",
                   format = "%Y-%m-%d")

An alternative version reads in the flow and concentration data using the ReadInDataFromR function and assumes that both flow and concentration objects are R bojects stored with the R package.

# Version 2: burdRC and burdRQ are already stored as part of the package
burd <- ReadInDataFromR(x.C = burdRC, x.Q = burdRQ)
plot(burd)
summary(burd)
ghist(burd)

Step 2: Regularization

Before fitting the Generalized Additive Model or GAM we need to create a 'regularised' dataset, whereby a dataset of flow is created for a regular time interval e.g. daily. In addition to flow, additional terms that mimic the hydrological relationships inherent in many flow systems are also created at the daily time step. The inputs to the CreateData function are

the flow and concentration data frames produced from Step 1 above;
the date range for modelling (date.rangeM) and prediction (date.rangeP) and whether the dates have an hour component;
the sampling unit (hour or day);
the year type for summaries (water year (WY) or financial year (FY));
the percentile used to calculate a flush (e.g. 0.9);
whether infilling or regularization is to be performed (none, smoothers (ss) or quantile random forests (qrforest))

Below is an example where regularization was not required as flow data was available at the daily time step.

date.rangeM <- c("1973-12-02", "2015-06-30")
date.rangeP <- c("1973-12-02", "2015-06-30")
loaddata <- CreateData(Q = burd$Q, Conc = burd$Conc,
                       date.range = list(model = date.rangeM, pred = date.rangeP, hour = FALSE),
                       samp.unit = "day", Ytype = "WY", Qflush = 0.9,
                       Reg = list(type = "none", rainfall = NULL, date = NULL))

We can then produce some exploratory plots to investigate the hydrological processes within the system.

regplots <- plot(loaddata, Type = "WY")
names(regplots) # terms we can plot

# Rising Falling Limb
regplots$p_RiseFallLimb

# Distributional Summary
regplots$p_DistSum

# Flow and Concentration summary
regplots$p_CQsum

# Smooth parameters
regplots$p_SmoothParms

Plots can also be saved using the ggsave function, which is part of the ggplot2 package. Note, all plots are produced using the ggplot2 package in R.

# Save output as a pdf file
ggsave("p_SmoothParms.pdf", regplots$p_SmoothParms)

We can also produce some summaries of the regularized data. These summaries include a table outlining the bias in concentration and flow samples, broken up by water type (either water year (WY) or financial year (FY)) and a distributional summary of flow sampling to indicate the nature of the flow sampling undertaken.

summary(loaddata)

Step 3: Fit the GAM

A generalized additive model (GAM) can be fit to a log-transformed regularized dataset. The FitModel function fits a GAM using the mgcv package. While the gam function within mgcv can be used to fit the model directly, the FitModel function makes the implementation easier as the user just needs to specify what terms can be fit in the model (e.g. flow, seasonal, rising-falling limb (RFlimb), moving average (MA) terms, a trend term and autocorrelation through an AR1 process. Note, care should be taken when fitting trend terms in models, particularly their interpretation. The correlation term is fit using a generalized additive mixed model using the gamm function in the mgcv package and may take a considerable time to fit if correlation = TRUE.

Tip 1: When attempting to fit a GAM using the FitModel function, start with setting all parameters (apart from the trend and correlation arguments) to TRUE. Investigate the significance of the terms through the p-value and omit any terms, one by one, that are not significant starting with the terms that have the highest p-value. This approach is otherwise known as backward elimination. The model below is a result of a backward elimination process.

mod1 <- FitModel(x = loaddata$CQ, parms = list(flow = "quadratic", seasonal = TRUE,
                                               RFlimb = FALSE,
                                               MA = c(MA1day = FALSE, MA2days = FALSE,
                                                      MAweek = TRUE,  MAmonth = TRUE,
                                                      MA6months = TRUE, MA12months = TRUE),
                                               trend = FALSE, correlation = FALSE))
summary(mod1)
anova(mod1)
print(class(mod1))

Step 4: Diagnostics

It is useful to check the fit and validity of your model through some diagnostic plots. The function diagnostic simplifies this process by producing two sets of figures. The first set (pD) produces 4 figures that allow the user to examine the fit of their model. The plots (from left to right) show a plot of

the residuals versus fitted;
histogram of residuals;
a quantile-quantile or Q-Q plot of the residuals; and
a plot of the predicted versus observed with a line of best fit.

The second figure examines the auto-correlation function of the residuals (pacf). In this figure we hope to see a dampening out of correlations over at past lags (i.e. spikes in correlations at lags where the spikes fall within the 95% confidence intervals). In the figure below we observe some correlation ~0.3 at a lag of 1 but this diminishes at lags 2 and beyond.

mod1D <- diagnostic(mod1)
names(mod1D)
# Diagnostic Plots
mod1D$pD
# ACF of residuals
mod1D$pacf

Step 5: Interpretation of the Model

It is important to gain an understanding of the terms fit in the model and how they relate to concentration. We can obtain a marginal representation of each term in the model by plotting the output from the fit of the model housed in the R object mod1I. As we only fit MA terms and a seasonal term, we can only plot these terms.

library(gratia)    # exploring smoothers
library(patchwork) # package for arranging plots
p1 <- draw(mod1, select = "s(month)") 
p2 <- draw(mod1, select = "s(MAweek)") 
p3 <- draw(mod1, select = "s(MAmonth)") 
p4 <- draw(mod1, select = "s(MA6months)") 
p5 <- draw(mod1, select = "s(MA12months)") 
p1 + p2 + p3 + plot_layout(ncol = 3)
p4 + p5 + plot_layout(ncol = 2)

We can also investigate the predicted concentration time series together with the regularized flow through the following plot.

#Exploring predictions from model

mod1_expl <- plot(mod1, Qreg = loaddata$Qreg, data = loaddata$CQ) 
names(mod1_expl)
mod1_expl$pConcInt

Sometimes it is also useful to plot the predicted concentrations interactively and explore the predictions at different parts of the time series by zooming in and zooming out using your mouse. Here is an example using the ggplotly function which is part of the plotly package in R. Note, the code below is not run for this vignette.

# Investigate predicted concentration using ggplotly
# 
library(plotly)
ggplotly(mod1I$pConc)

Step 6: Predicting the Load with uncertainties

We can form estimates of the load with uncertainties using the predictL function in R. Using this function we can output two types of predictions. The first is the loads (pL1), while the second set of estimates are the flow weighted concentrations (pFWC1). Each of these estimates are accompanyied by confidence intervals, where the width of the confidence interval is determined by the pvalue argument that the user sets. By default, the error in flow is assumed to be zero but can be set through the flow.error argument. Users can enter a coefficient of variation that reflects the measurement error (me) and/or the spatial error or positioning of the flow gauge (ce). Load estimates can be produced at the annual scale (type = annual) or daily scale (type = daily).

predLoad <- predictL(object = mod1, objfix = mod1$gam, x = loaddata, flow.error = list(me = 0, ce = 0),
                  samp.unit = "day", pvalue = 0.2)

mod1pL <- plot(predLoad$loadest, type = "annual", Conc = "TSS", scale = "Mt")
names(mod1pL)

# Annual loads
mod1pL$pL1

# Flow weighted concentrations
mod1pL$pFWC1

An interactive plot of the flow weighted concentrations can be called using the ggplotly function. Note, this is not run for the vignette.

# interactive flow weighted concentrations
ggplotly(mod1pL$pFWC1)

Step 7: Storing Results

Load estimates can be accessed from the predLoad object as follows. These can then be written to file using the write.csv function in R.

results <- predLoad$loadest
names(results)

Infilling Flow

When there are gaps in the flow record, we need a methodology for infilling flow to form a regularized dataset. In LRE, there are two approaches for achieving this. The first uses a smoothing spline approach using the gam function in the mgcv package. The second approach uses quantile Random Forests to fit a model between flow and rainfall at different lags to develop a predictive model that is used to predict flow at records with missing data.

To illustrate the two approaches, we artificially exclude 100 consecutive records spanning the date ranges: 2004-01-13 to 2004-04-22. This dataset is called burdRNA.

Below is an example of infilling using smoothing splines.

date.rangeM <- c("2011-01-01", "2015-06-30")
date.rangeP <- c("2011-01-01", "2015-06-30")
loaddata <- CreateData(Q = burdRNA$Q, Conc = burdRNA$Conc,
                       date.range = list(model = date.rangeM, pred = date.rangeP, hour = FALSE),
                       samp.unit = "day", Ytype = "WY", Qflush = 0.9,
                       Reg = list(type = "ss"))

Here is an example of infilling using quantile Random Forests.

date.rangeM <- c("2011-01-01", "2015-06-30")
date.rangeP <- c("2011-01-01", "2015-06-30")
loaddata <- CreateData(Q = burdRNA$Q, Conc = burdRNA$Conc,
                       date.range = list(model = date.rangeM, pred = date.rangeP, hour = FALSE),
                       samp.unit = "day", Ytype = "WY", Qflush = 0.9,
                       Reg = list(type = "qrforest", rainfall = burd_rain$Rainfall,
                                  date = burd_rain$Date))

Reference

Kuhnert, P.M., Henderson, B.L., Lewis, S.E., Bainbridge, Z.T., Wilkinson, S.N. and Brodie, J.E. (2012) Quantifying total suspended sediment export from the Burdekin River catchment using the loads regression estimator tool, Water Resources Research, 48, W04533,doi:10.1029/2011WR011080.

pkuhnert/LRE documentation built on June 9, 2025, 9:49 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com