Introduction
In bdrc: Bayesian Discharge Rating Curves

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.align = 'center',
  prompt = TRUE
)

A discharge rating curve is a model that describes the relationship between water elevation and discharge in a river. The rating curve is estimated from paired observations of water elevation and discharge, and it is then used to predict discharge for a given water elevation. This is the main practical use of rating curves, since water elevation is much easier to observe directly than discharge. This R package implements four discharge rating curve models, all formulated as Bayesian hierarchical models:

plm0() - Power-law model with a constant error variance (hence the 0). This is a Bayesian hierarchical implementation of the most commonly used discharge rating curve model in hydrological practice.

plm() - Power-law model with error variance that varies with water elevation.

gplm0() - Generalized power-law model with a constant error variance (hence the 0). The generalized power law is introduced in Hrafnkelsson et al. (2022).

gplm() - Generalized power-law model with error variance that varies with water elevation.
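To make the difference between the two model families concrete, here is a minimal base-R sketch of their mean relationships, assuming the forms $Q=a(h-c)^b$ for the power law and $Q=a(h-c)^{f(h)}$ for the generalized power law, with $f(h)=b+\beta(h)$ as described later in this vignette. The helper names power_law() and generalized_power_law() and all parameter values are hypothetical and are not part of bdrc.

```r
# Hypothetical helpers for illustration only (not part of bdrc).
# Power law (plm0/plm): Q = a * (h - c)^b, defined for h > c
power_law <- function(h, a, b, c) {
  stopifnot(all(h > c))
  a * (h - c)^b
}

# Generalized power law (gplm0/gplm): Q = a * (h - c)^f(h),
# where the exponent f(h) = b + beta(h) is supplied as a function of h
generalized_power_law <- function(h, a, f, c) {
  stopifnot(all(h > c))
  a * (h - c)^f(h)
}

# Made-up parameter values, chosen only to show the shape of the curves
h <- seq(8, 9, by = 0.25)
q_plm <- power_law(h, a = 20, b = 1.8, c = 7.5)

# With beta(h) = 0 the exponent is constant and the generalized power law
# reduces to the ordinary power law
q_gplm <- generalized_power_law(h, a = 20, f = function(h) rep(1.8, length(h)), c = 7.5)
all.equal(q_plm, q_gplm)
```

The check at the end illustrates the nesting of the models: setting $\beta(h)=0$ in the generalized power law recovers the ordinary power law.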

For further details about the different models, see Hrafnkelsson et al. (2022). The models differ in complexity, with gplm being the most flexible. This vignette focuses on gplm and explores the different ways to fit it and visualize its output. Since the functions for the other three models share the same interface, everything shown here applies to them as well.

library(bdrc)
set.seed(1) #set seed for reproducibility

We will use a dataset from a stream gauging station in Sweden, called Krokfors, that comes with the package:

data(krokfors)
krokfors

Fitting a discharge rating curve

Fitting a discharge rating curve with the bdrc package is simple. Only two input arguments are mandatory: formula and data. The formula has the form y~x, where y is the discharge in cubic meters per second (m$^3/$s) and x is the water elevation in meters (m). It is essential that the data are in these units. The data argument must be a data.frame containing columns whose names match the variables in the formula. In the Krokfors data, discharge and water elevation are stored in the columns Q and W, respectively, so we can fit a discharge rating curve with the gplm function:
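Field data are not always recorded in meters and cubic meters per second. The following sketch shows one way to convert measurements to the required units before fitting. The column names stage_cm and flow_ls and all values are made up for illustration.

```r
# Hypothetical field data recorded in other units (made-up values):
# stage in centimeters and discharge in liters per second
field <- data.frame(
  stage_cm = c(812, 835, 861, 890),
  flow_ls  = c(5200, 9800, 16500, 27100)
)

# Convert to the units bdrc expects: meters and cubic meters per second
my_data <- data.frame(
  W = field$stage_cm / 100,   # cm -> m
  Q = field$flow_ls / 1000    # l/s -> m^3/s
)
my_data

# The formula then refers to these column names, e.g. gplm(Q ~ W, data = my_data)
```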

gplm.fit <- gplm(Q~W,data=krokfors,parallel=TRUE,num_cores=2) # parallel=TRUE is the default; by default, num_cores is set to the number of cores detected on the machine

The gplm function returns an object of class "gplm" which we can summarize and visualize using familiar functions such as

summary(gplm.fit)

and

plot(gplm.fit)

In the next section, we will dive deeper into visualizing the "gplm" object.

Visualizing posterior distributions of different parameters

The bdrc package provides several tools to visualize the results from model objects which can give insight into the physical properties of the river at hand. For instance, the hyperparameter $c$ corresponds to the water elevation of zero discharge. To visualize the posterior of $c$, we can write

plot(gplm.fit,type='histogram',param='c')

Technically, $c$ is not inferred directly; instead, $h_{min}-c$ is inferred, where $h_{min}$ is the lowest water elevation in the data. Since $h_{min}-c$ is strictly positive, the Bayesian inference is performed on the transformed parameter $\zeta=\log(h_{min}-c)$, which has support on the whole real line. To plot the posterior of the transformed parameter, we write
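The transformation and its inverse can be sketched in a few lines of base R. The values of h_min and c_val below are hypothetical; the point is that every real value of $\zeta$ maps back to some $c<h_{min}$, so the constraint never has to be enforced explicitly during sampling.

```r
# Sketch of the reparametrization zeta = log(h_min - c) (hypothetical values)
h_min <- 8.0                    # lowest observed water elevation (m)
c_val <- 7.65                   # an elevation of zero discharge below h_min (m)

zeta   <- log(h_min - c_val)    # unconstrained parameter on the real line
c_back <- h_min - exp(zeta)     # inverse transform recovers c

# Any real zeta maps to some c < h_min, so the constraint always holds
zeta_grid <- c(-5, 0, 2)
h_min - exp(zeta_grid)
```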

plot(gplm.fit,type='histogram',param='c',transformed=TRUE)

The param argument can also be a vector containing multiple parameter names. For example, to visualize the posterior distributions of the parameters $a$ and $c$, we can write

plot(gplm.fit,type='histogram',param=c('a','c'))

There is a shorthand to visualize all the hyperparameters at once:

plot(gplm.fit,type='histogram',param='hyperparameters')

Similarly, setting param='latent_parameters' plots all the latent parameters together. To plot the hyperparameters on the transformed scale used in the Bayesian inference, we write

plot(gplm.fit,type='histogram',param='hyperparameters',transformed=TRUE)

Finally, we can visualize the model components that are allowed (depending on the model) to vary with water elevation, namely the power-law exponent, $f(h)$, and the standard deviation of the error terms at the response level, $\sigma_{\varepsilon}(h)$. Both gplm0 and gplm generalize the power-law exponent by modeling it as the sum of a constant term, $b$, and a Gaussian process, $\beta(h)$, that is, $f(h)=b+\beta(h)$, where $\beta(h)$ is assumed to have mean zero and to be twice differentiable. In contrast, plm and plm0 both model the power-law exponent as a constant by setting $\beta(h)=0$, which gives $f(h)=b$. We can plot the inferred power-law exponent with
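To illustrate what a varying exponent looks like, the sketch below draws one realization of $\beta(h)$ from a zero-mean Gaussian process and forms $f(h)=b+\beta(h)$. The squared-exponential covariance and all parameter values are assumptions made purely for illustration; the covariance function used inside bdrc may differ.

```r
# Illustration only: a squared-exponential kernel is used as a simple stand-in;
# bdrc's actual covariance function for beta(h) may differ.
set.seed(42)
h <- seq(8, 9.5, length.out = 50)   # water elevation grid (m)
b <- 1.8                            # constant part of the exponent (made up)
sigma_beta <- 0.2                   # GP marginal standard deviation (made up)
ell <- 0.4                          # GP length scale in meters (made up)

# Squared-exponential covariance matrix; a small jitter keeps it positive definite
K <- sigma_beta^2 * exp(-outer(h, h, "-")^2 / (2 * ell^2)) + diag(1e-6, length(h))

# One draw of beta(h) ~ GP(0, K): if K = R'R (Cholesky), then R'z has covariance K
beta_h <- as.vector(t(chol(K)) %*% rnorm(length(h)))

f_gplm <- b + beta_h          # generalized power law: f(h) = b + beta(h)
f_plm  <- rep(b, length(h))   # power law: beta(h) = 0, so f(h) = b
```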

plot(gplm.fit,type='f')

Both plm and gplm model the standard deviation of the error terms at the response level, $\sigma_{\varepsilon}(h)$, as a function of water elevation using B-spline basis functions, while plm0 and gplm0 model it as a constant. We can plot the inferred standard deviation by writing
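A B-spline representation of a varying error standard deviation can be sketched with the splines package that ships with R. This sketch assumes that the log of the error variance is a linear combination of cubic B-spline basis functions; the number of basis functions and the coefficients eta are made up, and bdrc's internal basis may be set up differently.

```r
library(splines)  # ships with base R

# Hypothetical sketch: log(sigma_eps(h)^2) as a linear combination of cubic B-splines.
# The number of basis functions and the coefficients eta are made up.
h <- seq(8, 9.5, length.out = 100)
B <- bs(h, df = 6, degree = 3, intercept = TRUE)   # 100 x 6 basis matrix
eta <- c(-4, -3.5, -3.8, -4.2, -3.0, -2.5)         # made-up spline coefficients

# Exponentiating guarantees a positive variance at every water elevation
sigma_eps_h <- sqrt(exp(as.vector(B %*% eta)))     # varies with h (plm, gplm)
sigma_eps_const <- rep(exp(-2), length(h))         # constant (plm0, gplm0)
```

Modeling the log-variance rather than the standard deviation itself is one common way to keep $\sigma_{\varepsilon}(h)$ positive without constraining the spline coefficients.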

plot(gplm.fit,type='sigma_eps')

To get a visual summary of your model, the 'panel' option in the plot type is useful:

plot(gplm.fit,type='panel',transformed=TRUE)

Assessing model fit and convergence

The package provides several tools for assessing model fit and the convergence of the MCMC chains, most notably residual plots, trace plots, autocorrelation plots, and Gelman-Rubin diagnostic plots. The log-residuals can be plotted with

plot(gplm.fit,type='residuals')

The log-residuals are calculated by subtracting the posterior estimate (median) of the log-discharge, $\log(\hat{Q})$, from the observed log-discharge, $\log(Q)$. The plot also shows the 95% predictive intervals for $\log(Q)$ (dashed lines) and the 95% credible intervals for the expected value of $\log(Q)$ (solid lines), the latter reflecting the uncertainty in the rating curve itself.
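The log-residual computation itself is simple. The sketch below uses made-up observed discharges and made-up posterior medians; since the logarithm is monotone, the posterior median of $\log(Q)$ equals the logarithm of the posterior median of $Q$.

```r
# Made-up observed discharges and posterior median predictions (m^3/s)
Q_obs <- c(5.2, 9.8, 16.5, 27.1)
Q_hat <- c(5.0, 10.1, 16.2, 27.9)

# Log-residual: observed log-discharge minus posterior median of log-discharge
log_resid <- log(Q_obs) - log(Q_hat)
round(log_resid, 3)
```

A positive log-residual means the observation lies above the estimated rating curve; on the log scale the residuals behave like relative errors in discharge.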

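Trace plots show the sampled values of each MCMC chain across iterations and help to assess how well the chains mix. For example, the trace plot of the transformed $c$ parameter is drawn with

plot(gplm.fit,type='trace',param='c',transformed=TRUE)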

To plot a trace plot for all the transformed hyperparameters, we write

plot(gplm.fit,type='trace',param='hyperparameters',transformed=TRUE)

To assess the mixing and convergence of the MCMC chains for each parameter, you can visualize the Gelman-Rubin statistic, $\hat{R}$, as presented by Gelman and Rubin (1992) with:

plot(gplm.fit,type='r_hat')
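The idea behind $\hat{R}$ is to compare the variance between chains with the variance within chains. The base-R sketch below implements a simplified potential scale reduction factor in the spirit of Gelman and Rubin (1992), omitting the paper's degrees-of-freedom correction; the function name gelman_rubin() is hypothetical, and bdrc may compute a different variant.

```r
# Simplified Gelman-Rubin statistic for one parameter (hypothetical helper).
# chains: a matrix with one column per chain (n draws x m chains)
gelman_rubin <- function(chains) {
  n <- nrow(chains)
  m <- ncol(chains)
  W <- mean(apply(chains, 2, var))              # mean within-chain variance
  B <- n * var(colMeans(chains))                # between-chain variance
  var_hat <- (n - 1) / n * W + (1 + 1 / m) * B / n  # pooled variance estimate
  sqrt(var_hat / W)
}

set.seed(1)
mixed <- matrix(rnorm(4000), ncol = 4)          # 4 chains, same target
# Two chains shifted by 3, mimicking chains stuck in different regions
stuck <- mixed + matrix(rep(c(0, 0, 3, 3), each = 1000), ncol = 4)

gelman_rubin(mixed)   # close to 1
gelman_rubin(stuck)   # well above 1
```

Values close to 1 indicate that the chains agree; a common rule of thumb flags values above about 1.1.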

Finally, the autocorrelation of the parameter chains can be assessed with

plot(gplm.fit,type='autocorrelation')
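Autocorrelation matters because strongly correlated draws carry less information than independent ones. The sketch below compares the lag-1 autocorrelation of a hypothetical sticky chain, simulated as an AR(1) process, with that of independent draws using stats::acf(). The effective sample size formula at the end is exact only for an AR(1) process and is shown as a rough guide.

```r
set.seed(1)
n <- 5000

# Hypothetical MCMC-like chain: AR(1) process with coefficient 0.9
sticky <- as.numeric(stats::filter(rnorm(n), filter = 0.9, method = "recursive"))
# Independent draws for comparison
independent <- rnorm(n)

# acf()$acf[1] is lag 0 (always 1), so [2] is the lag-1 autocorrelation
rho_sticky <- acf(sticky, lag.max = 20, plot = FALSE)$acf[2]
rho_indep  <- acf(independent, lag.max = 20, plot = FALSE)$acf[2]
c(sticky = rho_sticky, independent = rho_indep)

# Effective sample size for an AR(1) chain: n * (1 - rho) / (1 + rho)
n * (1 - rho_sticky) / (1 + rho_sticky)
```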

Customization of models

The gplm function can be customized further. In some instances, the water elevation of zero discharge, $c$, is known, and you might want to fix the parameter to that value. You might also want to extrapolate the rating curve to higher water elevations by setting the maximum water elevation, h_max. Assuming that 7.65 m is the known value of $c$ and that you want the rating curve for water elevations up to 10 m, the function call looks like this:

gplm.fit.known_c <- gplm(Q~W,krokfors,c_param=7.65,h_max=10,parallel=FALSE)

Prediction for an equally spaced grid of water elevations

To get rating curve predictions for an equally spaced grid of water elevations, use the predict function. Note that only values between $c$ and h_max are accepted, since that is the range in which the Bayesian inference was performed:

h_grid <- seq(8,9,by=0.01)
rating_curve_h_grid <- predict(gplm.fit,newdata=h_grid)
print(rating_curve_h_grid)
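If a grid may extend beyond the range where the rating curve was inferred, it can be filtered before calling predict. The helper in_prediction_range() below is hypothetical and not part of bdrc; it simply keeps the grid values between $c$ and h_max, using the values from the customization example above.

```r
# Hypothetical helper (not part of bdrc): keep only grid values inside the
# range where the rating curve was inferred, i.e. between c and h_max
in_prediction_range <- function(h, c, h_max) {
  h[h >= c & h <= h_max]
}

h_grid_wide <- seq(7, 11, by = 0.5)
in_prediction_range(h_grid_wide, c = 7.65, h_max = 10)
```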

References

Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457–472.

Hrafnkelsson, B., Sigurdarson, H., & Gardarsson, S. M. (2022). Generalization of the power-law rating curve using hydrodynamic theory and Bayesian hierarchical modeling. Environmetrics, 33(2), e2711.