# CastilloAndHadi1994: Abstract: Castillo and Hadi (1994) In EnvStats: Package for Environmental Statistics, Including US EPA Guidance

## Description

Detailed abstract of the manuscript:

Castillo, E., and A. Hadi. (1994). Parameter and Quantile Estimation for the Generalized Extreme-Value Distribution. Environmetrics 5, 417–432.

## Details

Abstract
Castillo and Hadi (1994) introduce a new way to estimate the parameters and quantiles of the generalized extreme value distribution (GEVD) with parameters location=η, scale=θ, and shape=κ. The estimator is based on a two-stage procedure using order statistics, denoted here by “TSOE”, which stands for two-stage order-statistics estimator. Castillo and Hadi (1994) compare the TSOE to the maximum likelihood estimator (MLE; Jenkinson, 1969; Prescott and Walden, 1983) and probability-weighted moments estimator (PWME; Hosking et al., 1985).

Castillo and Hadi (1994) note that for some samples the likelihood may not have a local maximum, and that when κ > 1 the likelihood can be made infinite, so the MLE does not exist. They also note, as do Hosking et al. (1985), that when κ ≤ -1, the moments and probability-weighted moments of the GEVD do not exist, hence neither does the PWME. (Hosking et al., however, claim that in practice the shape parameter usually lies between -1/2 and 1/2.) On the other hand, the TSOE exists for all values of κ.

Based on computer simulations, Castillo and Hadi (1994) found that the performance (bias and root mean squared error) of the TSOE is comparable to the PWME for values of κ in the range -1/2 ≤ κ ≤ 1/2. They also found that the TSOE is superior to the PWME for large values of κ. Their results, however, are based on using the PWME computed using the approximation given in equation (14) of Hosking et al. (1985, p.253). The true PWME is computed using equation (12) of Hosking et al. (1985, p.253). Hosking et al. (1985) introduced the approximation as a matter of computational convenience, and noted that it is valid in the range -1/2 ≤ κ ≤ 1/2. If Castillo and Hadi (1994) had used the true PWME for values of κ larger than 1/2, they probably would have gotten very different results for the PWME. (Note: the function egevd with method="pwme" uses the exact equation (12) of Hosking et al. (1985), not the approximation (14)).

Castillo and Hadi (1994) suggest using the bootstrap or jackknife to obtain variance estimates and confidence intervals for the distribution parameters based on the TSOE.
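As a rough illustration of the bootstrap suggestion, the sketch below computes a percentile bootstrap interval for a scalar estimator. The function name `percentile_bootstrap_ci`, its arguments, and the choice of the percentile variant are assumptions made for this example; the source does not say which bootstrap variant Castillo and Hadi (1994) had in mind. For the TSOE, `estimator` would be a function returning one of the three parameter estimates.

```python
import math
import random
import statistics


def percentile_bootstrap_ci(sample, estimator, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a scalar-valued estimator (illustrative)."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and re-apply the estimator each time.
    reps = sorted(estimator(rng.choices(sample, k=n)) for _ in range(n_boot))
    lo_idx = int(math.floor((alpha / 2.0) * n_boot))
    hi_idx = int(math.ceil((1.0 - alpha / 2.0) * n_boot)) - 1
    # Clamp the indices so rounding can never step outside the replicate list.
    lo_idx = max(0, min(n_boot - 1, lo_idx))
    hi_idx = max(0, min(n_boot - 1, hi_idx))
    return reps[lo_idx], reps[hi_idx]


# Usage with a stand-in estimator (the sample mean) on made-up data.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9]
lower, upper = percentile_bootstrap_ci(data, statistics.mean)
print(lower, upper)
```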

More Details
Let \underline{x} = (x_1, x_2, …, x_n) be a vector of n observations from a generalized extreme value distribution with parameters location=η, scale=θ, and shape=κ with cumulative distribution function F. Also, let x(1), x(2), …, x(n) denote the ordered values of \underline{x}.

First Stage
Castillo and Hadi (1994) propose as initial estimates of the distribution parameters the solutions to the following set of simultaneous equations based on just three observations from the total sample of size n:

F[x(1); η, θ, κ] = p_{1,n}

F[x(j); η, θ, κ] = p_{j,n}

F[x(n); η, θ, κ] = p_{n,n} \;\;\;\; (1)

where 2 ≤ j ≤ n-1, and

p_{i,n} = \hat{F}[x(i); η, θ, κ]

denotes the i'th plotting position for a sample of size n; that is, a nonparametric estimate of the value of F at x(i). Typically, plotting positions have the form:

p_{i,n} = \frac{i-a}{n+b} \;\;\;\; (2)

where b > -a > -1. In their simulation studies, Castillo and Hadi (1994) used a=0.35, b=0.
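Equation (2) can be sketched in a few lines. The function name `plotting_positions` is hypothetical; the defaults a=0.35 and b=0 are the values the text reports for the simulation studies.

```python
def plotting_positions(n, a=0.35, b=0.0):
    """Return [p_{1,n}, ..., p_{n,n}] with p_{i,n} = (i - a) / (n + b)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return [(i - a) / (n + b) for i in range(1, n + 1)]


# Usage: plotting positions for a sample of size 5.
p = plotting_positions(5)
print(p)
```

With these defaults every plotting position lies strictly between 0 and 1, so the quantity -log(p_{i,n}) that appears in the later equations is always positive and finite.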

Since j is arbitrary in the above set of equations (1), denote the solutions to these equations by:

\hat{η}_j, \hat{θ}_j, \hat{κ}_j

There are thus n-2 sets of estimates.

Castillo and Hadi (1994) show that the estimate of the shape parameter, κ, is the solution to the equation:

\frac{x(j) - x(n)}{x(1) - x(n)} = \frac{1 - A_{jn}^κ}{1 - A_{1n}^κ} \;\;\;\; (3)

where

A_{ik} = C_i / C_k \;\;\;\; (4)

C_i = -log(p_{i,n}) \;\;\;\; (5)
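The step from the simultaneous equations (1) to equation (3) can be reconstructed as follows. This sketch assumes the GEVD cdf has the form F(x; η, θ, κ) = exp{-[1 - κ(x - η)/θ]^{1/κ}} for κ ≠ 0, a parameterization that is not stated on this page but is consistent with equations (6) and (7). Setting F[x(i)] = p_{i,n} and taking -log of both sides gives C_i^κ = 1 - κ[x(i) - η]/θ, so that:

```latex
x_{(i)} = \eta + \frac{\theta}{\kappa}\left(1 - C_i^{\kappa}\right), \qquad i \in \{1, j, n\}

x_{(j)} - x_{(n)} = \frac{\theta}{\kappa}\left(C_n^{\kappa} - C_j^{\kappa}\right), \qquad
x_{(1)} - x_{(n)} = \frac{\theta}{\kappa}\left(C_n^{\kappa} - C_1^{\kappa}\right)

\frac{x_{(j)} - x_{(n)}}{x_{(1)} - x_{(n)}}
  = \frac{C_n^{\kappa} - C_j^{\kappa}}{C_n^{\kappa} - C_1^{\kappa}}
  = \frac{1 - A_{jn}^{\kappa}}{1 - A_{1n}^{\kappa}}
```

Taking differences eliminates η, and taking the ratio eliminates θ, which leaves a single equation in κ.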

Castillo and Hadi (1994) show how to easily solve equation (3) using the method of bisection.
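A minimal bisection sketch for equation (3) follows. The function names, the doubling strategy used to find a bracket, and the overflow-safe rewriting of the right-hand side are choices made for this illustration and are not necessarily the procedure given in the paper. It relies on the right-hand side of (3) decreasing monotonically from 1 to 0 as κ increases, which holds when x(1) < x(j) < x(n).

```python
import math


def shape_ratio(kappa, a, b):
    """Right-hand side of equation (3), with a = log(A_jn) and b = log(A_1n)."""
    if kappa == 0.0:
        return a / b  # limiting value as kappa -> 0
    if kappa > 0.0:
        # Factor out the dominant exponentials so large kappa cannot overflow.
        return (math.exp((a - b) * kappa)
                * math.expm1(-a * kappa) / math.expm1(-b * kappa))
    return math.expm1(a * kappa) / math.expm1(b * kappa)


def solve_shape(x1, xj, xn, p1, pj, pn, tol=1e-10, max_iter=200):
    """Solve equation (3) for kappa by bisection (illustrative sketch)."""
    c1, cj, cn = (-math.log(p) for p in (p1, pj, pn))
    a = math.log(cj / cn)
    b = math.log(c1 / cn)
    target = (xj - xn) / (x1 - xn)  # left-hand side of equation (3)
    if not 0.0 < target < 1.0:
        raise ValueError("requires x1 < xj < xn (no ties)")

    def g(kappa):
        return shape_ratio(kappa, a, b) - target

    # g is decreasing in kappa: widen the bracket until g(lo) >= 0 >= g(hi).
    lo, hi = -1.0, 1.0
    while g(lo) < 0.0:
        lo *= 2.0
    while g(hi) > 0.0:
        hi *= 2.0

    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_mid == 0.0 or hi - lo < tol:
            return mid
        if g_mid > 0.0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies to the left of mid
    return 0.5 * (lo + hi)


# Usage: feed in exact quantiles for a known kappa and recover it.
n, j = 10, 5
kappa0, eta0, theta0 = 0.2, 1.0, 2.0
probs = [(i - 0.35) / n for i in (1, j, n)]
xs = [eta0 + theta0 * (1.0 - (-math.log(pr)) ** kappa0) / kappa0 for pr in probs]
kappa_hat = solve_shape(xs[0], xs[1], xs[2], probs[0], probs[1], probs[2])
print(kappa_hat)  # expected to be close to kappa0
```

The usage example builds the three order statistics from the quantile function implied by equations (6) and (7), so the recovered value should agree with the κ used to generate them up to the tolerance.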

Once the estimate of the shape parameter is obtained, the other estimates are given by:

\hat{θ}_j = \frac{\hat{κ}_j [x(1) - x(n)]}{(C_n)^{\hat{κ}_j} - (C_1)^{\hat{κ}_j}} \;\;\;\; (6)

\hat{η}_j = x(1) - \frac{\hat{θ}_j [1 - (C_1)^{\hat{κ}_j}]}{\hat{κ}_j} \;\;\;\; (7)
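Equations (6) and (7) translate directly into code. In the sketch below the function name `scale_location` is hypothetical, and the κ = 0 branch is an added limiting case (obtained by letting κ → 0 in (6) and (7)) that does not appear in the text above.

```python
import math


def scale_location(x1, xn, p1, pn, kappa):
    """Return (theta_hat, eta_hat) from equations (6) and (7)."""
    c1 = -math.log(p1)
    cn = -math.log(pn)
    if kappa == 0.0:
        # Assumed limiting case as kappa -> 0; not part of the source text.
        theta = (x1 - xn) / (math.log(cn) - math.log(c1))
        eta = x1 + theta * math.log(c1)
    else:
        theta = kappa * (x1 - xn) / (cn ** kappa - c1 ** kappa)  # equation (6)
        eta = x1 - theta * (1.0 - c1 ** kappa) / kappa           # equation (7)
    return theta, eta


# Usage: exact quantiles for eta=1, theta=2, kappa=0.2 should be recovered.
kappa0, eta0, theta0 = 0.2, 1.0, 2.0
p1, pn = 0.065, 0.965
x1 = eta0 + theta0 * (1.0 - (-math.log(p1)) ** kappa0) / kappa0
xn = eta0 + theta0 * (1.0 - (-math.log(pn)) ** kappa0) / kappa0
theta_hat, eta_hat = scale_location(x1, xn, p1, pn, kappa0)
print(theta_hat, eta_hat)
```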

Second Stage
Apply a robust function to the n-2 sets of estimates obtained in the first stage. Castillo and Hadi (1994) suggest using either the median or the least median of squares (using a column of 1's as the predictor variable; see the help file for lmsreg in the package MASS). Using the median, for example, the final distribution parameter estimates are given by:

\hat{η} = Median(\hat{η}_2, \hat{η}_3, …, \hat{η}_{n-1})

\hat{θ} = Median(\hat{θ}_2, \hat{θ}_3, …, \hat{θ}_{n-1})

\hat{κ} = Median(\hat{κ}_2, \hat{κ}_3, …, \hat{κ}_{n-1})
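The median version of the second stage can be sketched as below. The function name `second_stage_median` and the tuple layout are assumptions for this example, and the input values are made up purely to show the mechanics.

```python
from statistics import median


def second_stage_median(first_stage):
    """Componentwise median of (eta_j, theta_j, kappa_j) tuples, j = 2..n-1."""
    etas, thetas, kappas = zip(*first_stage)
    return median(etas), median(thetas), median(kappas)


# Usage with three made-up first-stage estimates.
estimates = [(1.0, 2.0, 0.1), (1.2, 1.9, 0.3), (0.9, 2.2, 0.2)]
eta_hat, theta_hat, kappa_hat = second_stage_median(estimates)
print(eta_hat, theta_hat, kappa_hat)
```

Because each median is taken separately, the three final estimates need not all come from the same value of j.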

## Author(s)

Steven P. Millard

## References

Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.

Hosking, J.R.M. (1985). Algorithm AS 215: Maximum-Likelihood Estimation of the Parameters of the Generalized Extreme-Value Distribution. Applied Statistics 34(3), 301–310.

Jenkinson, A.F. (1969). Statistics of Extremes. Technical Note 98, World Meteorological Organization, Geneva.

Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York.

Prescott, P., and A.T. Walden. (1983). Maximum Likelihood Estimation of the Three-Parameter Generalized Extreme-Value Distribution from Censored Samples. Journal of Statistical Computation and Simulation 16, 241–250.

## See Also

Generalized Extreme Value Distribution, egevd, Hosking et al. (1985).