knitr::opts_chunk$set( eval = TRUE, echo = TRUE, fig.height = 3*0.65, fig.width = 4, warning = FALSE, message = FALSE, collapse = TRUE, comment = "#>" ) library(LMoFit)
Citation to this package:
"Zaghloul M, Papalexiou S, Elshorbagy A (2020). LMoFit: Advanced L-Moment Fitting of Distributions. R package version 0.1.6."
In the practice of frequency analysis, some probability distributions are abandoned due to the absence of a convenient way of handling them. Burr Type-III (BrIII
), Burr Type-XII (BrXII
), and Generalized Gamma (GG
) distributions are examples of such abandoned distributions. Most past studies involving frequency analysis ignored the unconventional distributions regardless of their remarkable advantages. For example, Zaghloul et al. (2020) reveals the importance of using BrIII
and BrXII
distributions instead of the commonly used Generalized Extreme Value (gev
) distribution for flood frequency analysis across Canada. Also, Papalexiou and Koutsoyiannis (2016) recommended the use of BrXII
and GG
distributions for describing daily precipitation across the globe.
Various conventional and unconventional distribution are covered by LMoFit
, which is the first package that facilitated the use of some unconventional distributions that are:
(1) Burr Type-III (BrIII
)
(2) Burr Type-XII (BrXII
)
(3) Generalized Gamma (GG
)
While the commonly used conventional distributions are masked from the 'lmom
' package by Hosking, J.R.M. (2019). These conventional distributions are:
(4) Normal (nor
)
(5) Log-Normal (ln3
)
(6) Pearson Type-3 (pe3
)
(7) Generalized Normal (gno
)
(8) Generalized Pareto (gpa
)
(9) Generalized Logistic (glo
)
(10) Gamma (gam
)
(11) Generalized Extreme Value (gev
) with different parameterization
For each of the above-mentioned distributions, LMoFit
provides functions that:
LMoFit
follows the method of Hosking, J.R.M. (1990) in estimating sample L-moments. The function is referred to as get_sample_lmom()
and can be implemented as follows. For an embedded sample FLOW_AMAX
of annual maximum streamflow in cfs observed at a gauge in BC, Vancouver, Canada @ Lat: 51°14'36.8¨N, Long:116°54'46.6¨W.
FLOW_AMAX
the sample L-moments and L-moment ratios are estimated as
sample_lmoments <- get_sample_lmom(x = FLOW_AMAX) knitr::kable(sample_lmoments, digits = 2, caption = "Sample L-moments and L-moment ratios")
where
Fitting functions are called by adding a prefix fit_
before a distribution name. e.x. fit_BrIII
, fit_BrXII
, fit_GG
, fit_gev
, etc. An example of estimating the fitted parameters of some randomly generated sample can be implemented as
Assume that the sample L-moments are:
# Fitting of BrIII distribution parameters <- fit_BrIII(sl1 = 436, st2 = 0.144, st3 = 0.103) scale <- parameters$scale shape1 <- parameters$shape1 shape2 <- parameters$shape2
This function results in the fitted parameters that are scale, shape1, and shape2 parameters for BrIII distribution.
knitr::kable(parameters[1:3], digits = 2, caption = "Estimated parameters of BrIII distribution")
A cumulative probability distribution function (CDF) estimates the probability of non-exceedance. CDF functions are called in LMoFit
by adding a prefix 'p' before a distribution name. e.x. pBrIII
, pBrXII
, pGG
, pgev
, etc. An example of implementing such functions is provided for the BrXII
distribution as follows.
Assume that the fitting of BrXII
to a sample results in the following parameters:
The probability of non-exceedance of 500 is estimated as
scale <- 322; shape1 <- 6.22; shape2 <- 0.12 probability <- pBrXII(x = 500, para = c(scale, shape1, shape2)) probability
This function can also be implemented to a vector of quantities of interest such as
probability <- pBrXII(x = c(400, 450, 500, 550, 600, 1000), para = c(scale, shape1, shape2)) probability
The probability density functions (PDF) are called in LMoFit
by adding a prefix 'd' before a distribution name. e.x. dBrIII
, dBrXII
, dGG
, dgev
, etc. An example of implementing such functions is provided for the gev
distribution as follows.
Assume that the fitting of gev
to a sample results in the following parameters:
The probability density of a value of 200 is estimated as
location <- 388; scale <- 99; shape <- -0.11 density <- dgev(x = 200, para = c(location, scale, shape)) density
This function can also be implemented to a vector of quantities of interest such as
density <- dgev(x = c(100, 150, 200, 250, 300), para = c(location, scale, shape)) density
The quantile functions in LMoFit
are called by adding a prefix 'q' before a distribution name. e.x. qBrIII
, qBrXII
, qGG
, qgev
, etc. This is one of the most useful and handy functions in LMoFit
because it enables the user to estimate quantiles corresponding to specific probabilities or directly corresponding to specific return periods.
Assume the estimated BrIII parameters are
scale <- 565.29; shape1 <- 5.75; shape2 <- 0.13
then the quantile that corresponds to a non-exceedance probability of 0.99 is
qua <- qBrIII(u = 0.99, para = c(scale, shape1, shape2)) qua
and the quantile that corresponds to a return period of 100 years is
qua <- qBrIII(RP = 100, para = c(scale, shape1, shape2)) qua
The quantile functions can also be implemented for multiple probabilities or return periods as
qua <- qBrIII(RP = c(5, 10, 25, 50, 100, 200), para = c(scale, shape1, shape2)) qua
A return period function is the same as the CDF but after transforming probabilities to their corresponding return periods. This transformation is a simple process that was previously neglected and left for the user. However, LMoFit
is developed to be handy and user friendly. So, the return period functions are included in LMoFit
and are called by adding a prefix 't' before a distribution name. e.x. tBrIII
, tBrXII
, tGG
, tgev
, etc. An example of implementing such functions is provided for the BrXII
distribution as follows.
Assume that the fitting of BrXII
to a sample results in the following parameters:
The return period of a quantity equal to 800 is estimated as
scale <- 322; shape1 <- 6.22; shape2 <- 0.12 return_period <- tBrXII(x = 800, para = c(scale, shape1, shape2)) return_period
This function can also be implemented to a vector of quantities of interest such as
return_period <- tBrXII(x = c(500, 600, 700, 800), para = c(scale, shape1, shape2)) return_period
Distributions with two shape parameters such as the BrIII distribution are presented on the L-moment ratio diagram as a theoretical L-space. LMoFit
developed the theoretical L-space of the BrIII distribution and embedded its results as a ggplot in the package data. Users can call the L-space plot of the BrIII as lspace_BrIII
and apply any desired adjustments to it as for regular ggplots.
lspace_BrIII
The BrXII Distributions also has two shape parameters and can be presented on the L-moment ratio diagram as a theoretical L-space. LMoFit
developed the theoretical L-space of the BrXII distribution and embedded its results as a ggplot in the package data. Users can call the L-space plot of the BrXII as lspace_BrXII
and apply any desired adjustments to it as for regular ggplots.
lspace_BrXII
The GG Distributions is another case of distributions having two shape parameters so it can also be presented on the L-moment ratio diagram with a theoretical L-space. LMoFit
developed the theoretical L-space of the GG distribution and embedded its results as a ggplot in the package data. Users can call the L-space plot of the GG as lspace_GG
and apply any desired adjustments to it as for regular ggplots.
lspace_GG
There are numerous criteria for comparing the goodness of fitting of probability distributions. A very important preliminary test should be a visual inspection of the L-moment ratio diagrams. The sample, that the distributions are fitted to, is presented on the L-moment ratio diagram as an L-point. If this L-point lies outside the L-space of any two-shape parametric distribution, then the distribution is not valid to describe the sample.
For a single sample such as the sample provided as FLOW_AMAX
, the test can be implemented by calling the function com_sam_lspace()
accounting for 'compare a sample with an L-space'. This function is valid for the two-shape parametric distributions that are BrIII
, BrXII
, and GG
. The L-point of the sample is presented by a red point mark on the diagrams.
com_sam_lspace(sample = FLOW_AMAX, type = "s", Dist = "BrIII")
com_sam_lspace(sample = FLOW_AMAX, type = "s", Dist = "BrXII")
com_sam_lspace(sample = FLOW_AMAX, type = "s", Dist = "GG")
For multiple samples, such as streamflow observed at various sites, the sample of each site is presented by one L-point on the L-moment ratio diagram. Therefore, by implementing the visual test to multiple samples we can identify distributions that have their L-spaces overlaying the greatest portion of L-points and therefore we can select the best regionally valid distribution. com_sam_lspace
is developed to be flexible and user friendly. The user can use the same function with multiple sites by changing the type
condition from type = "s"
to type = "m"
. Ten hypothetical samples are developed at 10 hypothetical sites and embedded in the package data as FLOW_AMAX_MULT
. FLOW_AMAX_MULT
is a dataframe with 10 columns where each column describes the sample at one site.
colnames(FLOW_AMAX_MULT) <- paste0("site.", 1:10) knitr::kable(head(FLOW_AMAX_MULT), caption = "The first few observations of streamflow at 10 sites")
An example of a regional visual test is implemented as
com_sam_lspace(sample = FLOW_AMAX_MULT, type = "m", Dist = "BrIII", shape = 20)
com_sam_lspace(sample = FLOW_AMAX_MULT, type = "m", Dist = "BrXII", shape = 20)
com_sam_lspace(sample = FLOW_AMAX_MULT, type = "m", Dist = "GG", shape = 20)
These diagrams are still in ggplot format and can be further adjusted by the user.
In the last example, the visual test with multiple samples FLOW_AMAX_MULT
reveals that the L-space of BrIII
combines the greatest amount of L-points and therefore BrIII
should be a better candidate compared to BrXII
and GG
distributions.
As the number of multiple points increases, the corresponding number of L-points increases, and it gets harder to make a visual judgment. For that reason, LMoFit
provides an extra function to test the condition of each L-point of the multiple samples and identify it as inside or outside the L-space of interest. All L-points that are overlaid by the L-space are assigned a flag 'lpoint_inside_lspace
', or 'lpoint_outside_lspace
' otherwise.
flags_BrIII <- con_sam_lspace(sample = FLOW_AMAX_MULT, type = "m", Dist = "BrIII") knitr::kable(head(flags_BrIII), caption = "Flags obtained for BrIII's L-space") flags_GG <- con_sam_lspace(sample = FLOW_AMAX_MULT, type = "m", Dist = "GG") knitr::kable(head(flags_GG), caption = "Flags obtained for GG's L-space")
By counting the number of times the flag was lpoint_inside_lspace
we can exactly identify the number of L-points inside each L-space as
counter_BrIII <- nrow(flags_BrIII[flags_BrIII$condition == "lpoint_inside_lspace",]) paste0("the number of L-points inside the L-space of BrIII = ", counter_BrIII) counter_GG <- nrow(flags_GG[flags_GG$condition == "lpoint_inside_lspace",]) paste0("the number of L-points inside the L-space of GG = ", counter_GG)
Accordingly, we should use BrIII
rather than GG
as a regional distribution in the latter example.
Note: con_samlmom_lspace()
is similar to con_sam_lspace()
, but it needs the sample L-moments rather than the sample itself and is only applicable to single samples. The sample L-moments are estimated by using get_sample_lmom()
.
Frequency analysis can be implemented by LMoFit
in three steps:
get_sample_lmom()
'.BrIII
, to the estimated sample L-moments, using 'fit_BrIII()
'.pBrIII()
", quantiles "qBrIII()
", densities "dBrIII()
", or even return periods "tBrIII()
".The three steps are further explained by an example of fitting BrIII
to the embedded sample FLOW_AMAX
. Since BrIII
distribution is to be used, we will check its validity to describe the sample.
con_sam_lspace(sample = FLOW_AMAX, type = "s", Dist = "BrIII")
BrIII
passes the visual test since the L-point of the sample is located inside the L-space of BrIII
distribution. com_sam_lspace()
can also be used for the same purpose.
Next, the sample L-moments can be estimated as follows.
# Step 1 sample <- FLOW_AMAX samlmoms <- get_sample_lmom(x = sample)
After that, the desired distribution can be fitted to the sample L-moments to determine its fitted parameters.
# Step 2 parameters <- fit_BrIII(sl1 = samlmoms$sl1, st2 = samlmoms$st2, st3 = samlmoms$st3)
Once the fitted parameters are obtained, frequency analysis can be conducted in different forms to estimate probabilities, quantiles, densities, and return periods.
# Step 3 quantile <- qBrIII(RP = c(5, 10, 25, 50, 100), para = c(parameters$scale, parameters$shape1, parameters$shape2)) prob <- pBrIII(x = quantile, para = c(parameters$scale, parameters$shape1, parameters$shape2)) dens <- dBrIII(x = quantile, para = c(parameters$scale, parameters$shape1, parameters$shape2)) T_yrs <- tBrIII(x = quantile, para = c(parameters$scale, parameters$shape1, parameters$shape2))
The results of this example are concluded in the table below.
output <- cbind(Q = round(quantile, digits = 0), CDF = round(prob, digits = 4), PDF = round(dens, digits = 5), T_yrs) knitr::kable(output, caption = "Example of fitting BrIII distribution to FLOW_AMAX")
This research is financially supported by the Integrated Modeling Program for Canada (IMPC) project under the umbrella of the Global Water Futures (GWF) program at the University of Saskatchewan, Canada. S.M.P. acknowledges the support of the Natural Sciences and Engineering Research Council of Canada (NSERC Discovery Grant: RGPIN ‐2019 ‐06894 ). In addition, M.Z. acknowledges the Department of Civil, Geological, and Environmental Engineering for the financial support of research through the Departmental Devolved Scholarship.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.