Bayesian Markov Chain Monte Carlo algorithm for flood frequency analysis with historical and other information. The user can choose between a local and a regional analysis.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | ```
BayesianMCMC (xcont, xhist=NA, infhist=NA, suphist=NA,
nbans=NA, seuil=NA, nbpas=1000, nbchaines=3,
confint=c(0.05, 0.95), dist="GEV",
apriori=function(...){1},
parameters0=NA, varparameters0=NA)
BayesianMCMCcont (x, nbpas=NA)
BayesianMCMCreg (xcont, scont, xhist=NA, infhist=NA, suphist=NA, shist=NA,
nbans=NA, seuil=NA, nbpas=1000, nbchaines=3,
confint=c(0.05, 0.95), dist="GEV",
apriori=function(...){1},
parameters0=NA, varparameters0=NA)
BayesianMCMCregcont (x, nbpas=NA)
plotBayesianMCMCreg_surf (x, surf, ask=FALSE, ...)
## S3 method for class 'BayesianMCMC'
plot(x, which=1, ask=FALSE, ...)
## S3 method for class 'BayesianMCMC'
print(x, ...)
## S3 method for class 'BayesianMCMCreg'
plot(x, which=1, ask=FALSE, ...)
## S3 method for class 'BayesianMCMCreg'
print(x, ...)
``` |

`x` |
object of class |

`xcont` |
vector of systematic data |

`scont` |
vector of upstream catchment surfaces of systematic data |

`xhist` |
vector of historical data and/or extreme discharges at ungauged sites |

`infhist` |
vector of inferior limit for historical data and/or extreme discharges at ungauged sites |

`suphist` |
vector of superior limit for historical data and/or extreme discharges at ungauged sites |

`shist` |
vector of upstream catchment surfaces of extreme discharges at ungauged sites |

`nbans` |
period (in years) over which every threshold has not been exceeded except for the historical data and/or extreme discharges at ungauged sites. If several values of xhist for a same threshold, put the number of years associated to the threshold on the first row, then put 0 (see examples) |

`seuil` |
threshold not exceeded in the historical period except for the historical data and/or extreme discharges at ungauged sites (several thresholds allowed). |

`nbpas` |
number of iterations for the MCMC algorithm |

`nbchaines` |
number of chains for the MCMC algorithm |

`confint` |
confidence limits for the flood quantiles |

`dist` |
distribution: normal |

`apriori` |
function of the parameters of the model ‘proportional to’ their a-priori guessed distribution. The default fuction returns always 1, i.e. there is no a-priori information |

`parameters0` |
initial values of the parameters for the MCMC algorithm |

`varparameters0` |
initial values of the parameter variances for the MCMC algorithm |

`which` |
a number of a vector of numbers that defines the graph to plot (see details) |

`ask` |
if TRUE, the interactive mode is run |

`surf` |
a particular surface (number or vector), not necessarily being a surface included in the scont or shist vectors |

`...` |
other arguments |

**Supported cases**

These functions are taking 4 cases into account, depending on the type of data provided: - Using only the systematic data: xcont provided, xhist=NA, infhist=NA and suphist=NA - Using censored information, historical flood known: xcont and xhist provided, infhist=NA and suphist=NA - Using censored information, historical flood unknown precisely but its lower limit known: xcont and infhist provided, xhist=NA and suphist=NA - Taking into account flood estimation intervals: infhist and suphist (respectively lower and upper limits) provided, xcont provided, xhist=NA - Please note that every other case is NOT supported. For example, you can't have some historical flood values perfectly known as well as some other for which you only know a lower limit or an interval.

Regarding the perception thresholds: - By definition, the number of exceedances of each perception threshold within its application period has to be known precisely, and all the floods exceeding the threshold have to be included in xhist (or infhist or suphist). - Several thresholds are allowed. - It is possible to include in xhist (or infhist or suphist) historical values that do not exceed the associated perception threshold. - If for one or several thresholds you only know that this or these threshold have never been exceeded and no more information is available on floods that did not exceed the threshold(s), this case is also supported. In this case, put for the historical flood corresponding to the threshold xhist=-1 (or infhist=-1 or infhist=suphist=-1).

**Bayesian inference**

Bayesian inference uses a numerical estimate of the degree of belief in a hypothesis before evidence has been observed and calculates a numerical estimate of the degree of belief in the hypothesis after evidence has been observed.
The name ‘Bayesian’ comes from the frequent use of Bayes' theorem in the inference process.
In our case the problem is: which is the probability that a frequency function *P* (of type defined in `dist`

) has parameters *T*, given that we have observed the realizations *D* (defined in `xcont`

, `xhist`

, `infhist`

, `suphist`

, `nbans`

, `seuil`

).
The Bayes' theorem writes

*P(T|D) = P(D|T)*P(T)/P(D)*

where
*P(T|D)* is the conditional probability of *T*, given *D* (it is also called the posterior probability because it is derived from or depends upon the specified value of *D*) and is the result we are interested in;
*P(T)* is the prior probability or marginal probability of *T* (‘prior’ in the sense that it does not take into account any information about *D*), and can be given using the input `apriori`

(it can be used to account for causal information);
*P(D|T)* is the conditional probability of *D* given *T* and it is defined choosing `dist`

and depending on the availability of historical data;
*P(D)* is the prior or marginal probability of *D*, and acts as a normalizing constant.
Intuitively, Bayes' theorem in this form describes the way in which one's beliefs about observing *T* are updated by having observed *D*.

Since complex models cannot be processed in closed form by a Bayesian analysis, namely because of the extreme difficulty in computing the normalization factor *P(D)*, simulation-based Monte Carlo techniques as the MCMC approaches are used.

**MCMC Metropolis algorithm**

Markov chain Monte Carlo (MCMC) methods (which include random walk Monte Carlo methods), are a class of algorithms for sampling from probability distributions based on constructing a Markov chain that has the desired distribution as its equilibrium distribution. The state of the chain after a large number of steps is then used as a sample from the desired distribution. The quality of the sample improves as a function of the number of steps.

The MCMC is performed here through a simple Metropolis algorithm, i.e. a Metropolis-Hastings algorithm with symmetric proposal density.
The Metropolis-Hastings algorithm can draw samples from any probability distribution *P(x)*, requiring only that a function proportional to the density can be calculated at *x*.
In Bayesian applications, the normalization factor is often extremely difficult to compute, so the ability to generate a sample without knowing this constant of proportionality is a major virtue of the algorithm.
The algorithm generates a Markov chain in which each state *xt+1* depends only on the previous state *xt*.
The algorithm uses a Gaussian proposal density *N(xt, Sx)*, which depends on the current state *xt*, to generate a new proposed sample *x'*.
This proposal is accepted as the next value *xt + 1 = x'* if *a* drawn from *U(0,1)* satisfies

*a < P(x')/P(xt)*

If the proposal is not accepted, then the current value of *x* is retained (*xt+1=xt*).

The Markov chain is started from a random initial value *x0* and the algorithm is run for many iterations until this initial state is forgotten.
These samples, which are discarded, are known as burn-in.
The remaining set of accepted values of *x* represent a sample from the distribution *P(x)*.
As a Gaussian proposal density (or a lognormal one for definite-positive parameters) is used, the variance parameter *Sx^2* has to be tuned during the burn-in period.
This is done by calculating the acceptance rate, which is the fraction of proposed samples that is accepted in a window of the last *N* samples.
The desired acceptance rate depends on the target distribution, however it has been shown theoretically that the ideal acceptance rate for a one dimensional Gaussian distribution is approx 50%, decreasing to approx 23% for an N-dimensional Gaussian target distribution.
If *Sx^2* is too small the chain will mix slowly (i.e., the acceptance rate will be too high, so the sampling will move around the space slowly and converge slowly to *P(x)*).
If *Sx^2* is too large the acceptance rate will be very low because the proposals are likely to land in regions of much lower probability density.
The desired acceptance rate is fixed here to 34%.

The MCMC algorithm is based on a code developed by Eric Gaume on Scilab. It is still unstable and not all the distributions have been tested.

`BayesianMCMC`

and `BayesianMCMCcont`

(which just continues the simulations of `BayesianMCMC`

for local analyses and `BayesianMCMCreg`

and `BayesianMCMCregcont`

for regional analyses return the following values:

`BayesianMCMCreg`

and `BayesianMCMCregcont`

(which just continues the simulations of `BayesianMCMC`

starting from its output) return the following values:

`parameters`

matrix (nbpas)x(nbchaines) with the simulated sets of parameters with the MCMC algorithm;

`parametersML`

set of parameters correspondent to the maximum likelihood;

`returnperiods`

return periods for which `quantilesML`

and `intervals`

are calculated;

`quantilesML`

quantiles correspondent to `returnperiods`

for the distribution whose parameters are `parametersML`

;

`logML`

maximum log-likelihood;

`intervals`

confidence intervals for the quantiles `quantilesML`

for limits `confint`

;

`varparameters`

matrix (nbpas)x(nbchaines)x(number of parameters) with the simulated variances for the MCMC algorithm;

`vraisdist`

likelihoods for the sets `parameters`

;

`propsaut`

vector showing the evolution of the acceptance rate during the Bayesian MCMC fitting;

`plot.BayesianMCMC`

and `plot.BayesianMCMCreg`

(for a normalized surface of 1 km2) plot the following figures:

`1`

data as plotting position (the Cunanne plotting position *a = 0.4* is used), fitted distribution (maximum likelihood) and confidence intervals;

`2`

diagnostic plot of the MCMC simulation (parameters);

`3`

diagnostic plot of the MCMC simulation (likelihood and MCMC acceptance rate);

`4`

posterior distribution of parameters obtained with the MCMC simulation (cloud plots);

`5`

a-priori distribution of parameters (contour plots);

`plotBayesianMCMCreg_surf`

plots the same plot as the first one given by `plot.BayesianMCMCreg`

but for each surface in argument, as well as its mean as a function of the surfaces;

For information on the package and the Author, and for all the references, see `nsRFA`

.

Eric Gaume, Alberto Viglione, Jose Luis Salinas, Olivier Payrastre, Chi Cong N'guyen, Karine Halbert

.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 | ```
set.seed(2988)
serie <- rand.GEV(120, xi=40, alfa=20, k=-0.4)
serie100 <- serie[1:100]
serie100[serie100 < 250] <- NA
serie20 <- serie[101:120]
serie <- c(serie100, serie20)
plot(serie, type="h", ylim=c(0, 600), xlab="",
ylab="Annual flood peaks [m3/s]", lwd=3)
abline(h=0)
points(serie100, col=2)
## Not run:
# Using only sistematic data
only_sist <- BayesianMCMC (xcont=serie20, nbpas=5000, nbchaines=3, varparameters0=c(70, 20, 0.5),
confint=c(0.05, 0.95), dist="GEV")
plot(only_sist, which=c(1:3), ask=TRUE, ylim=c(1,600))
only_sist <- BayesianMCMCcont(only_sist)
plot(only_sist, which=c(1:3), ask=TRUE, ylim=c(1,600))
only_sist <- BayesianMCMCcont(only_sist)
plot(only_sist, which=c(1:3), ask=TRUE, ylim=c(1,600))
# Adding the information that the threshold 250 m3/s was exceeded
# 3 times in the past 100 years
with_hist_thresh <- BayesianMCMC (xcont=serie20, infhist=rep(250,3),
nbans=100, seuil=250,
nbpas=5000, nbchaines=3,
confint=c(0.05, 0.95), dist="GEV")
plot(with_hist_thresh, which=c(1:3), ask=TRUE, ylim=c(1,600))
# Assuming that the 3 historical events are known with high uncertainty
with_hist_limits <- BayesianMCMC (xcont=serie20,
infhist=c(320,320,250),
suphist=c(360,400,270),
nbans=100, seuil=250,
nbpas=5000, nbchaines=3,
confint=c(0.05, 0.95), dist="GEV")
plot(with_hist_limits, which=c(1:3), ask=TRUE, ylim=c(1,600))
# Assuming that the 3 historical events are perfectly known
with_hist_known <- BayesianMCMC (xcont=serie20, xhist=serie100[!is.na(serie100)],
nbans=100, seuil=250,
nbpas=5000, nbchaines=3,
confint=c(0.05, 0.95), dist="GEV")
plot(with_hist_known, which=c(1:3), ask=TRUE, ylim=c(1,600))
# Perception threshold without available information on floods
without_info <- BayesianMCMC (xcont=serie20, xhist=-1,
nbans=100, seuil=2400,
nbpas=5000, nbchaines=3,
confint=c(0.05, 0.95), dist="GEV")
plot(without_info, which=c(1:3), ask=TRUE, ylim=c(1,600))
# Using one reasonable a-priori distribution
fNORM3 <- function (x) {
# x = vector of values
# mu = vector of means
mu = c(44, 26, -0.40)
# CM = covariance matrix
CM = matrix(c(13, 7.8, -0.055,
7.8, 15, -0.42,
-0.055, -0.42, 0.056), nrow=3, ncol=3)
CMm1 <- solve(CM)
term2 <- exp(-((x - mu) %*% CMm1 %*% (x - mu))/2)
term1 <- 1/(2*pi)^(3/2)/sqrt(det(CM))
term1*term2
}
with_hist_known2 <- BayesianMCMC (xcont=serie20, xhist=serie100[!is.na(serie100)],
nbans=100, seuil=250,
nbpas=5000, nbchaines=3, apriori=fNORM3,
confint=c(0.05, 0.95), dist="GEV")
plot(with_hist_known2, 5)
plot(with_hist_known2, 4)
plot(with_hist_known, 4)
plot(with_hist_known)
plot(with_hist_known2)
# Using one non-reasonable a-priori distribution
fNORM3 <- function (x) {
# x = vector of values
# mu = vector of means
mu = c(30, 50, -0.10)
# CM = covariance matrix
CM = matrix(c(13, 7.8, -0.055,
7.8, 15, -0.42,
-0.055, -0.42, 0.056), nrow=3, ncol=3)
CMm1 <- solve(CM)
term2 <- exp(-((x - mu) %*% CMm1 %*% (x - mu))/2)
term2
}
with_hist_known3 <- BayesianMCMC (xcont=serie20, xhist=serie100[!is.na(serie100)],
nbans=100, seuil=250,
nbpas=5000, nbchaines=3, apriori=fNORM3,
confint=c(0.05, 0.95), dist="GEV")
plot(with_hist_known3, 5)
plot(with_hist_known3, 4)
plot(with_hist_known, 4)
plot(with_hist_known)
plot(with_hist_known3)
## End(Not run)
## Not run:
# Assuming that the historical events are perfectly known and there are 4 different thresholds
# The data file is presenting this way:
# xhist nbans seuil
# 6000 55 6000
# 7400 28 7250
# 6350 8 3050
# 4000 0 3050
# 4550 0 3050
# 3950 0 3050
# 7550 58 2400
# 4650 0 2400
# 3950 0 2400
## Warning: nbans and seuil should have the same length as xhist.
# So when a threshold is exceeded several times, replicate it as many times it is exceeded
# and part the number of years of exceedance into the number of times of exceedance
# (the way you part the nbans vector is not important, what is important is that you have
# length(nbans)=length(xhist) and the total of years for one same threshold equals the number
# of years covered by the perception threshold)
xhist_thres <- c(6000, 7400, 6350, 4000, 4550, 3950, 7550, 4650, 3950)
seuil_thres <- c(6000, 7250, rep(3050, 4), rep(2400, 3))
nbans_thres <- c(55, 28, 8, 0, 0, 0, 58, 0, 0)
# The threshold at 6000 has been exceeded for 55 years, the one at 7250 for 28 years,
# the one at 3050 for 8 years and the one at 2400 for 58 years
with_hist_known_several_thresholds <- BayesianMCMC (xcont=serie20,
xhist=xhist_thres,
nbans=nbans_thres, seuil=seuil_thres,
nbpas=5000, nbchaines=3,
confint=c(0.05, 0.95), dist="GEV",
varparameters0=c(NA, NA, 0.5))
plot(with_hist_known_several_thresholds, which=c(1:3), ask=TRUE)
## REGIONAL:
# Regional analysis, assuming that the 3 historical events are perfectly known and
# there are 2 perception thresholds
regional_with_hist_known <- BayesianMCMCreg (xcont=serie20,
scont=c(rep(507,9),rep(2240,11)),
xhist=serie100[!is.na(serie100)],
shist=c(495, 495, 87),
nbans=c(100, 0, 50), seuil=c(312, 312, 221),
nbpas=5000, nbchaines=3,
confint=c(0.05, 0.95), dist="GEV",
varparameters0=c(NA, NA, NA, 0.5))
plot(regional_with_hist_known, which=1:3, ask=TRUE, ylim=c(1,600))
surf=c(571, 2240)
plotBayesianMCMCreg_surf(regional_with_hist_known, surf)
## End(Not run)
``` |

Questions? Problems? Suggestions? Tweet to @rdrrHQ or email at ian@mutexlabs.com.

Please suggest features or report bugs with the GitHub issue tracker.

All documentation is copyright its authors; we didn't write any of that.