disfitqua: Fit a Distribution using Minimization of Available Quantiles

disfitquaR Documentation

Fit a Distribution using Minimization of Available Quantiles

Description

This function fits a distribution to available quantiles (or irregular quantiles) through n-dimensional minimization using the optim function. Objective function forms are either root mean-square error (RMSE) or mean absolute deviation (MAD), and the objective functions are expected to result in slightly different estimates of distribution parameters. The RMSE form (\sigma_{\mathrm{RMSE}}) is defined as

\sigma_{\mathrm{RMSE}} = \biggl[ \frac{1}{m}\,\sum_{i=1}^m \bigl[x_o(f_i) - \hat{x}(f_i)\bigr]^2\biggr]^{1/2}\mbox{,}

where m is the length of the vector of observed quantiles x_o(f_i) for nonexceedance probability f_i for i \in 1, 2, \cdots, m, and \hat{x}(f_i) for i \in 1, 2, \cdots, m are quantile estimates based on the “current” iteration of the parameters for the selected distribution having n parameters for n \le m. Similarly, the MAD form (\sigma_{\mathrm{MAD}}) is defined as

\sigma_{\mathrm{MAD}} = \frac{1}{m}\,\sum_{i=1}^m \mid x_o(f_i) - \hat{x}(f_i) \mid \mbox{.}

The disfitqua function is not intended to be an implementation of the method of percentiles but rather is intended for circumstances in which the available quantiles are restricted to either the left or right tails of the distribution. It is evident that a form of the method of percentiles however could be pursued by disfitqua when the length of x(f) is equal to the number of distribution parameters (n = m). The situation of n < m however is thought to be the most common application.

The right-tail restriction is the general case in flood-peak hydrology in which the median and select quantiles greater than the median can be available from empirical studies (e.g. Asquith and Roussel, 2009) or rainfall-runoff models. The available quantiles suit engineering needs and thus left-tail quantiles simply are not available. This circumstance might appear quite unusual to users from most statistical disciplines but quantile estimates can exist from regional study of observed data. The Examples section provides further motivation and discussion.

Usage

disfitqua(x, f, objfun=c("rmse", "mad"),
                init.lmr=NULL, init.para=NULL, type=NA,
                ptransf=  function(t) return(t),
                pretransf=function(t) return(t), verbose=FALSE, ... )

Arguments

x

The quantiles x_o(f) for the nonexceedance probabilities in f.

f

The nonexceedance probabilities f of the quantiles x_o(f) in x.

objfun

The form of the objective function as previously described.

init.lmr

Optional initial values for the L-moments from which the initial starting parameters for the optimization will be determined. The optimizations by this function are not performed on the L-moments during the optimization. The form of init.lmr is that of an L-moment object from the lmomco package (e.g. lmoms).

init.para

Optional initial values for the parameters used for starting values for the optim function. If this argument is not set nor is init.lmr, then unrigorous estimates of the mean \lambda_1 and L-scale \lambda_2 are made from the available quantiles, higher L-moment ratios \tau_r for r \ge 3 are set to zero, and the L-moments converted to the initial parameters.

type

The distribution type specified by the abbreviations listed under dist.list.

ptransf

An optional parameter transformation function (see Examples) that is useful to guide the optimization run. For example, suppose the first parameter of a three parameter distribution resides in the positive domain, then
ptransf(t) = function(t) c(log(t[1]), t[2], t[3]).

pretransf

An optional parameter retransformation function (see Examples) that is useful to guide the optimization run. For example, suppose the first parameter of a three parameter distribution resides in the positive domain, then
pretransf(t) = function(t) c(exp(t[1]), t[2], t[3]).

verbose

A logical switch on the verbosity of output.

...

Additional arguments to pass to the optim function.

Value

An R list is returned, and this list contains at least the following items:

type

The type of distribution in character format (see dist.list).

para

The parameters of the distribution.

source

Attribute specifying source of the parameters—“disfitqua”.

init.para

A vector of the initial parameters actually passed to the optim function to serve only as a reminder.

disfitqua

The returned list from the optim function. This list contains a repeat of the parameters, the value of the objective function (\sigma_{\mathrm{RMSE}} or \sigma_{\mathrm{MAD}}), the interation count, and convergence status.

Note

The disfitqua function is likely more difficult to apply for n > 3 (high parameter) distributions because of the inherent complexity of the mathematics of such distributions and their applicable parameter (and thus valid L-moment ranges). The complex interplay between parameters and L-moments can make identification of suitable initial parameters init.para or initial L-moments init.lmr more difficult than is the case for n \le 3 distributions. The default initial parameters are computed from an assumed condition that the L-moments ratios \tau_r = 0 for r \ge 3. This is not ideal, however, and the Examples show how to move into high parameter distributions using the results from a previous fit.

Author(s)

W.H. Asquith

References

Asquith, W.H., and Roussel, M.C., 2009, Regression equations for estimation of annual peak-streamflow frequency for undeveloped watersheds in Texas using an L-moment-based, PRESS-minimized, residual-adjusted approach: U.S. Geological Survey Scientific Investigations Report 2009–5087, 48 p., \Sexpr[results=rd]{tools:::Rd_expr_doi("10.3133/sir20095087")}.

See Also

dist.list, lmoms, lmom2vec, par2lmom, par2qua, vec2lmom, vec2par

Examples

# Suppose the following quantiles are estimated using eight equations provided by
# Asquith and Roussel (2009) for some watershed in Texas:
Q <- c(1480, 3230, 4670, 6750, 8700, 11000, 13600, 17500)
# These are real estimates from a suite of watershed properties; the watershed
# itself and location are not germane to demonstrate this function.
LQ <- log10(Q) # transform to logarithms of cubic feet per second
# Convert the averge annual return periods for the quantiles into probability
P <- T2prob(c(2, 5, 10, 25, 50, 100, 200, 500)); qP <- qnorm(P) # std norm variates
# The log-Pearson type III (LPIII) is immensely popular for flood-risk computations.
# Let us compute LPIII parameters to the available quantiles and probabilities for
# the watershed. The log-Pearson type III is "pe3" in the lmomco with logarithms.
par1 <- disfitqua(LQ, P, type="pe3", objfun="rmse") # root mean square error
par2 <- disfitqua(LQ, P, type="pe3", objfun="mad" ) # mean absolute deviation
# Now express the fitted distributions in forms of an LPIII.
LQfit1 <- qlmomco(P, par1); LQfit2 <- qlmomco(P, par2)

plot( qP, LQ, pch=5, xlab="STANDARD NORMAL VARIATES",
                     ylab="FLOOD QUANTILES, CUBIC FEET PER SECOND")
lines(qP, LQfit1, col=2); lines(qP, LQfit2, col=4) # red and blue lines

## Not run: 
# Now demonstrate how a Wakeby distribution can be fit. This is an example of how a
# three parameter distribution might be fit, and then the general L-moments secured for
# an alternative fit using a far more complicated distribution. The Wakeby for the
# above situation does not fit out of the box.
lmr1 <- theoLmoms(par1) # We need five L-moments but lmompe3() only gives four,
# therefore must compute the L-moment by numerical integration provided by theoLmoms().
par3 <- disfitqua(LQ, P, type="wak", objfun="rmse", init.lmr=lmr1)
lines(qP, par2qua(P, par3), col=6, lty=2) # dashed line, par2qua alternative to qlmomco

# Finally, the initial L-moment equivalents and then the L-moments of the fitted
# distribution can be computed and compared.
par2lmom(vec2par(par3$init.para, type="wak"))$ratios # initial L-moments
par2lmom(vec2par(par3$para,      type="wak"))$ratios # final   L-moments
## End(Not run)

wasquith/lmomco documentation built on Nov. 13, 2024, 4:53 p.m.