regtst: Test statistics for regional frequency analysis
In lmomRFA: Regional Frequency Analysis using L-Moments

regtst

R Documentation

Description

Computes discordancy, heterogeneity and goodness-of-fit measures for regional frequency analysis. These are the statistics D_i, H, and Z^{\rm DIST} defined respectively in sections 3.2.3, 4.3.3, and 5.2.3 of Hosking and Wallis (1997).

Usage

regtst(regdata, nsim=1000)

regtst.s(regdata, nsim=1000)

Arguments

regdata

Object of class regdata containing the input data. It should be a data frame in which each row contains the data for one site. The first seven columns should contain, in order: the site name, the record length, and the L-moments and L-moment ratios \ell_1 (mean), t (L-CV), t_3 (L-skewness), t_4 (L-kurtosis), and t_5.

Note that the fourth column should contain values of the L-CV t, not the L-scale \ell_2!

Function regsamlmu, with default settings of its arguments, returns an object of class "regdata".
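
As a minimal sketch of the expected column layout, the data frame below is assembled by hand in base R. The site names, record lengths, and L-moment values are invented for illustration, and the column names are an assumption rather than a requirement stated here. It shows the conversion from L-scale \ell_2 to L-CV, t = \ell_2 / \ell_1, that the fourth column requires.

```r
# Hypothetical sample L-moments for three sites (all values invented).
site <- c("A", "B", "C")
n    <- c(30, 45, 52)           # record lengths
l_1  <- c(20.0, 25.0, 40.0)     # mean
l_2  <- c(4.0, 5.5, 8.0)        # L-scale: NOT what belongs in column 4
t_3  <- c(0.15, 0.20, 0.18)     # L-skewness
t_4  <- c(0.12, 0.16, 0.14)     # L-kurtosis
t_5  <- c(0.03, 0.05, 0.04)

# Column 4 must hold the L-CV, t = l_2 / l_1.
dat <- data.frame(name = site, n = n, mean = l_1, t = l_2 / l_1,
                  t_3 = t_3, t_4 = t_4, t_5 = t_5)
print(dat)
```

In practice it is usually simpler to let regsamlmu compute these columns from the raw data, as noted above.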

nsim

Number of simulations to use in the calculation of the heterogeneity and goodness-of-fit measures.

If less than 2, only the discordancy measure will be calculated.

Details

The discordancy measure D_i indicates, for site i, the discordancy between the site's L-moment ratios and the (unweighted) regional average L-moment ratios. Large values might be used as a flag to indicate potential errors in the data at the site. “Large” might be 3 for regions with 15 or more sites, but less (exact values in list element Dcrit) for smaller regions.
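
The following base-R sketch illustrates how a discordancy measure of this kind can be computed, assuming the definition D_i = (N/3) (u_i - \bar{u})^T A^{-1} (u_i - \bar{u}) from Hosking and Wallis (1997), where u_i is the vector (t, t_3, t_4) for site i and A is the sum over sites of the outer products of the deviations. The input values are invented, and this is an illustration, not the code used inside regtst.

```r
# Hypothetical L-moment ratios (t, t_3, t_4) for six sites; values are invented.
u <- cbind(t   = c(0.20, 0.22, 0.19, 0.21, 0.30, 0.20),
           t_3 = c(0.15, 0.18, 0.14, 0.16, 0.05, 0.17),
           t_4 = c(0.12, 0.15, 0.10, 0.13, 0.20, 0.11))
N <- nrow(u)

dev <- sweep(u, 2, colMeans(u))    # deviations from the unweighted regional average
A   <- crossprod(dev)              # sum over sites of outer products of deviations
D   <- (N / 3) * rowSums((dev %*% solve(A)) * dev)   # quadratic form, one per site

round(D, 2)
sum(D)    # equals N by construction, so the average D_i is 1
```

With the output of regtst, the analogous check would compare element D against the critical values in element Dcrit.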

Three heterogeneity measures are calculated, each based on a different measure of between-site dispersion of L-moment ratios: [1] weighted standard deviation of L-CVs; [2] average of L-CV/L-skew distances; [3] average of L-skew/L-kurtosis distances. These dispersion measures are the quantities V, V_2, and V_3 defined respectively in equations (4.4), (4.6), and (4.7) of Hosking and Wallis (1997). The heterogeneity measures are calculated from them as in equation (4.5) of Hosking and Wallis (1997). In practice H[1] is probably sufficient. An H value greater than (say) 1.0 suggests that further subdivision of the region should be considered, as it might improve the accuracy of quantile estimates.
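
The relationship between a dispersion measure and its heterogeneity measure can be sketched in base R as follows. The sketch assumes that V is the record-length-weighted standard deviation of the site L-CVs and that H standardizes the observed dispersion by the mean and standard deviation of the simulated values. The helper het_measure, the site data, and the "simulated" values are all hypothetical.

```r
# Hypothetical site data: record lengths and sample L-CVs (values invented).
n   <- c(30, 45, 52, 28)
lcv <- c(0.20, 0.22, 0.19, 0.25)

lcv_R <- weighted.mean(lcv, n)                   # regional average L-CV
V     <- sqrt(sum(n * (lcv - lcv_R)^2) / sum(n)) # weighted SD of the L-CVs

# Standardize an observed dispersion against values simulated
# from a homogeneous region (hypothetical helper).
het_measure <- function(v_obs, v_sim) (v_obs - mean(v_sim)) / sd(v_sim)

# Placeholder for the simulated dispersions; in regtst these would come
# from nsim Monte Carlo replications of the region.
v_sim <- c(0.010, 0.014, 0.012, 0.016, 0.013)
H1 <- het_measure(V, v_sim)
H1
```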

Goodness of fit is evaluated for five candidate distributions: generalized logistic, generalized extreme value, generalized normal (lognormal), Pearson type III (3-parameter gamma), and generalized Pareto. In the output the distributions are referred to by 3-letter abbreviations, respectively glo, gev, gno, pe3, and gpa. If the region is homogeneous and data at different sites are statistically independent, then if one of the distributions is the true distribution for the region its goodness-of-fit measure should have approximately a standard normal distribution. Provided that the region is acceptably close to homogeneous, the fit may be judged acceptable at the 10 per cent significance level if the Z value is less than 1.645 (i.e., qnorm(0.95)) in absolute value.
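
The acceptance rule can be applied mechanically, as in the sketch below. The Z values are invented for illustration and are not output from any real data set; only the abbreviations and the 1.645 threshold come from the text above.

```r
# Hypothetical Z values, named with the 3-letter abbreviations (numbers invented).
z <- c(glo = 3.10, gev = 1.20, gno = 0.40, pe3 = -1.00, gpa = -4.20)

z_crit <- qnorm(0.95)                      # about 1.645
acceptable <- names(z)[abs(z) < z_crit]    # distributions with acceptable fit
acceptable

# Among the candidates, the one whose Z value is closest to zero.
best <- names(z)[which.min(abs(z))]
best
```

With these made-up values the rule retains gev, gno, and pe3. The rule is only meaningful when the region is acceptably close to homogeneous, as stated above.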

Calculation of heterogeneity and goodness-of-fit measures involves the sampling variability of L-moment ratios in a homogeneous region whose record lengths and average L-moment ratios match those of the data. The sampling variability is estimated by Monte Carlo simulation using nsim replications of the region. Results will vary between invocations of regtst with different seeds for the random-number generator.
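
A possible way to make results repeatable is sketched below. It assumes that regtst draws its random numbers from R's own generator, so that set.seed controls the simulation; that assumption is suggested by the sentence above but should be checked against the installed version of the package.

```r
library(lmomRFA)

# Assumption: regtst uses R's random-number generator, so set.seed() applies.
set.seed(42)
fit_a <- regtst(Cascades, nsim = 200)
set.seed(42)
fit_b <- regtst(Cascades, nsim = 200)   # same seed as fit_a
fit_c <- regtst(Cascades, nsim = 200)   # RNG state has advanced since fit_b

all.equal(fit_a$H, fit_b$H)   # compare the two runs with the same seed
fit_a$H - fit_c$H             # run-to-run Monte Carlo variation
```

Increasing nsim reduces the run-to-run variation at the cost of longer computation.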

In the homogeneous region used in the simulations, the sites have a kappa distribution, fitted to the regional average L-moment ratios of the data in regdata. The kappa fit may fail if the regional average L-kurtosis is high relative to the regional average L-skewness. In this case a kappa distribution is fitted with shape parameter h constrained to be -1 (i.e., a generalized logistic distribution); this gives the largest possible L-kurtosis value for a kappa distribution with given L-skewness.

regtst and regtst.s are functionally identical. regtst calls a Fortran routine internally and is faster, typically by a factor of 3 or 4. regtst.s is written almost entirely in the S language; it is provided so that users can see how the calculations are done, and can conveniently alter the code for their own purposes if necessary.

Value

An object of class "regtst", which is a list with elements as follows.

`data`	The input data, i.e. data frame `regdata` after coercion to class `"regdata"` if necessary.
`nsim`	Number of simulations, i.e. the argument `nsim`.
`D`	Vector containing the discordancy measures for each site.
`Dcrit`	Vector of length 2 containing critical values of the discordancy measure corresponding to significance levels of 10 and 5 per cent — except that the values never exceed 3 and 4 respectively. See Hosking and Wallis (1997), section 3.2.4.
`rmom`	Vector of length 5 containing the regional weighted average L-moment ratios (weights proportional to record lengths).
`rpara`	Vector of length 4 containing the parameters of a kappa distribution fitted to the regional weighted average L-moment ratios.
`vobs`	Vector of length 3 containing the observed values of the three measures of between-site dispersion of L-moment ratios.
`vbar`	Vector of length 3 containing the mean of the simulated values of the three dispersion measures.
`vsd`	Vector of length 3 containing the standard deviation of the simulated values of the three dispersion measures.
`H`	Vector of length 3 containing the three measures of regional heterogeneity.
`para`	List of length 6 containing the parameters of the five candidate distributions and the Wakeby distribution (3-letter abbreviation `"wak"`) fitted to the regional weighted average L-moment ratios.
`t4fit`	Vector of length 5 containing the L-kurtosis of the five candidate distributions fitted to the regional weighted average L-moment ratios.
`Z`	Vector of length 5 containing the goodness-of-fit measures for each of the five candidate distributions.
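
The elements listed above can be extracted from the returned list in the usual way. The sketch below uses the Cascades data set from the Examples section; the final line recomputes the standardization from vobs, vbar, and vsd for comparison with H, on the assumption that H is formed as described in the Details section.

```r
library(lmomRFA)

fit <- regtst(Cascades, nsim = 500)

fit$D           # discordancy measure, one value per site
fit$Dcrit       # the two critical values for D
fit$H           # three heterogeneity measures
fit$Z           # goodness-of-fit measures for the five candidates
fit$para$wak    # Wakeby parameters fitted to the regional averages

# Standardized dispersion rebuilt from its stored ingredients,
# for comparison with fit$H.
(fit$vobs - fit$vbar) / fit$vsd
```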

Note

Data frame regdata may have only six columns, i.e. the fifth L-moment ratio t_5 may be omitted. In this case the return value will contain missing values for rmom[5] and the elements of para$wak.
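
A short sketch of the six-column case, using the Cascades data set from the Examples section with its seventh column dropped. The behavior described in the comments is what the note above states, not something verified independently here.

```r
library(lmomRFA)

six_col <- Cascades[, 1:6]             # drop the seventh column, t_5
fit6 <- regtst(six_col, nsim = 200)

fit6$rmom[5]     # missing, per the note above
fit6$para$wak    # Wakeby parameters are missing as well
fit6$H           # heterogeneity measures are still returned
```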

Author(s)

J. R. M. Hosking jrmhosking@gmail.com

References

Hosking, J. R. M. (1996). Fortran routines for use with the method of L-moments, Version 3. Research Report RC20525, IBM Research Division, Yorktown Heights, N.Y.

Hosking, J. R. M., and Wallis, J. R. (1997). Regional frequency analysis: an approach based on L-moments. Cambridge University Press.

Examples

# An example from Hosking (1996).  Compare the output with
# the file 'cascades.out' in the LMOMENTS Fortran package at
# https://lib.stat.cmu.edu/general/lmoments (results will not
# be identical, because random-number generators are different).
summary(regtst(Cascades, nsim=500))

# Output from 'regsamlmu' can be fed straight into 'regtst'
regtst(regsamlmu(Maxwind))

lmomRFA documentation built on Oct. 1, 2024, 1:08 a.m.