Renouv    R Documentation

Description

Fit a 'renouvellement' POT model using Over the Threshold (OT) data and possibly historical data of two kinds.
Usage

Renouv(x,
threshold = NULL,
effDuration = NULL,
distname.y = "exponential",
MAX.data = NULL,
MAX.effDuration = NULL,
OTS.data = NULL,
OTS.effDuration = NULL,
OTS.threshold = NULL,
fixed.par.y = NULL,
start.par.y = NULL,
force.start.H = FALSE,
numDeriv = TRUE,
trans.y = NULL,
jitter.KS = TRUE,
pct.conf = c(95, 70),
rl.prob = NULL,
prob.max = 1.0 - 1e-04,
pred.period = NULL,
suspend.warnings = TRUE,
control = list(maxit = 300, fnscale = -1),
control.H = list(maxit = 300, fnscale = -1),
trace = 0,
plot = TRUE,
label = "",
...)
Arguments

x
  Can be a numeric vector, an object of the class "Rendata", or NULL.
threshold
  Value of the threshold for the OT data.
effDuration
  Effective duration, i.e. duration of the OT period.
distname.y
  Name of the distribution for the excesses over the threshold. See Details below.
MAX.data
  Either a numeric vector or a list of numeric vectors representing historical MAX data. When a list is given, each element contains the largest observations for one block.
MAX.effDuration
  Vector of (effective) durations, one by block of MAX data.
OTS.data
  A numeric vector or a list of numeric vectors representing supplementary Over Threshold data in blocks. When a vector is given, there is only one block, and the data contain all the 'historical' levels over the corresponding threshold given in OTS.threshold.
OTS.effDuration
  A numeric vector giving the (effective) durations for the OTS blocks.
OTS.threshold
  A vector giving the thresholds for the different OTS blocks. The given values must be greater than or equal to the value of threshold.
fixed.par.y
  Named list of known (or fixed) parameter values for the distribution of the excesses (distname.y).
start.par.y
  Named list of parameter initial values for the distribution of the excesses (distname.y). See Details below.
force.start.H
  Logical. When TRUE, the values given in start.par.y are used as starting values for the maximisation of the likelihood including historical data, rather than the estimates obtained in a first stage from the OT data alone.
numDeriv
  Logical: should the hessian be computed using the numDeriv package rather than taken from the optimisation results?
trans.y
  Transformation of the levels before thresholding (if not NULL), e.g. "square" as mentioned in the note on rescaling below.
jitter.KS
  Logical. When set to TRUE, the data are slightly jittered before the Kolmogorov-Smirnov test is applied, in order to remove ties.
pct.conf
  Character or numeric vector specifying the percentages for the (bilateral) confidence limits on quantiles.
rl.prob
  Vector of probabilities for the computation of return levels. These are used in plots (hence must be dense enough) and appear on output in the data frame ret.lev.
prob.max
  Max value of probability for return level and confidence limit evaluations. This argument 'shortens' the default rl.prob vector.
pred.period
  A vector of "pretty" periods at which return levels and probabilities will be evaluated and returned in the pred data frame.
suspend.warnings
  Logical. If TRUE, the warnings are suspended during the optimisation process.
control
  A named list used as the control argument of optim when maximising the likelihood without historical data.
control.H
  A named list used as the control argument of optim when maximising the likelihood including historical data.
trace
  Level of verbosity. Value 0 prints nothing.
plot
  Draw a return level plot?
label
  Label to be used in the legend when plot is TRUE.
...
  Arguments passed to the plot method (e.g. main as in the examples below).
Details

The model is fitted using Maximum Likelihood (ML).

Some of the distributions listed below, here called "special", are handled in a special manner: for these distributions it is not necessary to give starting values, and the parameter names are unambiguous. A minimal call is sketched after the table.
distribution | parameters
exponential  | rate
weibull      | shape, scale
GPD          | scale, shape
gpd          | scale, shape
lomax        | scale, shape
maxlo        | scale, shape
log-normal   | meanlog, sdlog
gamma        | shape, scale
mixexp2      | prob1, rate1, delta
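As a minimal sketch (assuming the Garonne data set shipped with the package is attached, as in the examples below), a fit with one of these special distributions needs neither starting values nor parameter names:

## 'weibull' is a "special" distribution: no 'start.par.y' is required
fit0 <- Renouv(x = Garonne, distname.y = "weibull", plot = FALSE)
coef(fit0)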
Other distributions can be used. Because the probability functions are then used in a "black-box" fashion, these distributions must meet the following formal requirements:

- The names of the density, distribution and quantile functions must obey the classical "prefixing convention": the prefixes must be "d", "p" and "q" respectively. This rule applies to the distributions of the stats package and to those of many other packages such as evd.

- The first (main) argument must be vectorised in all three functions, i.e. a vector of x, q or p values must be accepted by the density, the distribution and the quantile functions.

- The density must have a log formal argument. When log is TRUE, the log-density is returned instead of the density.
For such a distribution, the parameter names must be given in start.par.y. The list must contain exactly the number of parameters required by the family (e.g. 2 for gamma). Some parameters can be fixed (known); the parameter set is then the union of the parameters appearing in start.par.y and of those in fixed.par.y. In the present version, at least one parameter must remain unknown for the y part of the model.
Mathematical requirements exist for a correct use of ML. They are referred to as "regularity conditions" in ML theory. Note that the support of the distribution must be the set of non-negative real numbers.
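As an illustrative sketch of the "black-box" interface (the family name myweib and the starting values are hypothetical; the functions simply wrap the Weibull distribution of the stats package):

## hypothetical 'myweib' family: d/p/q wrappers around the Weibull distribution,
## vectorised in their first argument and with a 'log' argument for the density
dmyweib <- function(x, shape, scale, log = FALSE) {
    dweibull(x, shape = shape, scale = scale, log = log)
}
pmyweib <- function(q, shape, scale) pweibull(q, shape = shape, scale = scale)
qmyweib <- function(p, shape, scale) qweibull(p, shape = shape, scale = scale)
## parameter names and starting values must then be supplied in 'start.par.y'
## (the values below are rough guesses for the Garonne excesses)
fit.bb <- Renouv(x = Garonne, distname.y = "myweib",
                 start.par.y = list(shape = 1.5, scale = 1000),
                 plot = FALSE)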
The estimation procedure differs according to the existence of the different types of data: main sample, MAX and OTS.
When no historical data are given, the whole set of parameters splits into two orthogonal subsets: a "point process" part concerning the process of events, and an "observation" part concerning the excesses over the threshold. The two parts can in this case be estimated separately. The rate of the Poisson process is estimated by the empirical rate, i.e. the number of events divided by the total duration given in effDuration. The "Over the Threshold" parameters are estimated from the excesses computed as x.OT minus the threshold.
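For instance, with the numeric-vector usage of the examples below (the threshold of 2500 and the 100-year duration are illustrative values taken from those examples, not properties of the data set):

## empirical Poisson rate: number of OT events per unit duration
x.OT <- Garonne$OTdata$Flow
u <- 2500                          # threshold, as in the examples below
w <- 100                           # effective duration in years (illustrative)
lambda.hat <- sum(x.OT > u) / w    # events per year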
When historical data are given, the two parameter vectors must be handled together in the maximisation of the global likelihood. In this case, the estimation first ignores the historical data, and the resulting estimates are then used as starting values for the maximisation of the global likelihood. In some circumstances, the estimates obtained in the first stage cannot be used with historical data because some of these data fall outside the support of the fitted distribution. This can happen e.g. with distname.y = "gpd" when historical data exceed threshold - scale/shape for the values of shape and scale computed in the first stage.
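A small check of this support condition, with purely illustrative first-stage values for the scale and shape:

## for a GPD with negative shape, the marks are bounded above by
## threshold - scale / shape; historical values beyond this bound make
## the first-stage estimates unusable as starting values
u <- 2500; scale.hat <- 1000; shape.hat <- -0.2    # illustrative values only
upper <- u - scale.hat / shape.hat                 # here 2500 + 5000 = 7500
any(c(6800, 7200) > upper)                         # FALSE: these 'historical' values lie inside the support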
From version 2.1-1 on, it is possible to use OTS and/or MAX data with no OT data by specifying x = NULL. At the time of writing this is only possible when distname.y takes one of the two values "exp" or "gpd". The initial values for the parameters are then obtained by using the parIni.OTS and parIni.MAX functions, and possibly by combining the two resulting initial parameter vectors. This possibility can be used to fit a model from classical block maxima or r-largest data, but with more flexibility since the durations of the blocks need not be constant here.
The returned Renouv object contains a MAX element concerning the distribution of block maxima in the two following cases.

When distname.y is "exponential" or "exp", the distribution of the maximum is Gumbel. The estimated parameters can be used with the gumbel functions of the evd package.

When distname.y is "gpd", "lomax", "maxlo" or "GPD", the distribution of the maximum is a Generalised Extreme Value (GEV) distribution. The estimated parameters can be used with the gev functions of the evd package.
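As a hedged sketch only (the element names inside MAX are assumptions, not documented here; inspect str(fit.gpd$MAX) first, where fit.gpd is the object fitted on the venice data in the examples below):

## assumed layout: 'estimate' holds named GEV parameters c(loc, scale, shape)
require(evd)
p.max <- fit.gpd$MAX$estimate
qgev(0.99, loc = p.max["loc"], scale = p.max["scale"], shape = p.max["shape"])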
Value

An object with class "Renouv". This is mainly a list with the various results.
est.N
  Estimate(s) for the count ('N') part of the model.
est.y
  Estimate(s) for the excess ('y') part of the model.
cov.N, cov.y
  The (co-)variances for the estimates above.
estimate
  Estimate(s) for the whole set of parameters, based on OT data and on historical data if available.
ks.test
  Kolmogorov-Smirnov goodness-of-fit test.
ret.lev
  A data frame containing return levels and confidence limits. The corresponding probabilities are either provided by the user or taken as default values.
pred
  A data frame similar to ret.lev, but restricted to the (pretty) periods given in pred.period.
MAX
  A list providing the estimated distribution of the maximum of the marks over a block of unit duration. This list element only exists when this distribution can be deduced from the fit, which is the case when distname.y is "exponential", "exp", "gpd", "GPD", "lomax" or "maxlo" (see Details).
Other results are available. Use names(result)
to see their
list.
Except in the special case where distname.y is "exponential" and where no historical data are used, the inference on quantiles is obtained with the delta method, using numerical derivatives. Confidence limits are unreliable for return levels corresponding to periods much greater than the duration covered by the observations and the historical data.

Due to the presence of estimated parameters, the Kolmogorov-Smirnov test is unreliable when fewer than 30 observations are available.

With some distributions, or in the presence of historical data, the estimation can fail because of a problem during the optimisation. Even when the optimisation converges, the determination of the (numerical) hessian can be impossible: this can happen when one or more parameters are too small for a finite-difference approximation of the gradient to be computed. For instance, the 'rate' parameter of the exponential distribution (the inverse mean) will be small when the mean of the excesses is large.
A possible solution is then to rescale the data, e.g. by dividing them by 10 or 100. As a rule of thumb, an acceptable scaling leads to data (excesses) ranging from a few units to a few hundreds; an order of magnitude of thousands or more should be avoided and reduced by scaling. Rescaling is especially recommended for the square-exponential distribution (obtained with trans.y = "square") since the observations are squared.
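A possible rescaling sketch, reusing the numeric-vector call of the examples below with the flows divided by 100 (so that the excesses range from a few units to a few tens):

## same kind of call as 'fit3' below, but on rescaled flows
fit.s <- Renouv(x = Garonne$OTdata$Flow / 100,
                threshold = 25, effDuration = 100,
                distname.y = "exponential", plot = FALSE)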
Another possible way to solve the problem is to change the numDeriv value.
The model only concerns the "Over the Threshold" part of the distribution of the observations. When historical data are used, the observations should all be larger than the threshold.
The names of the elements in the returned list are indicative and are likely to change in future versions. For now, the effect of the historical data on the estimation (when such data exist) can be evaluated by comparing c(res$est.N, res$est.y) and res$estimate, where res is the result list; see the short sketch below.
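For instance with the fit3 object created in the examples below (GPD fit with artificial OTS data):

res <- fit3
c(res$est.N, res$est.y)    # estimates ignoring the OTS (historical) information
res$estimate               # estimates using the OTS blocks as well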
Some warnings may indicate that missing values were met during the optimisation process; this is due to the evaluation of the density at tail values. For now, the ML estimates are computed with an unconstrained optimisation, so invalid parameter values can be met during the maximisation or even be returned as (invalid) estimates.
Author(s)

Yves Deville
References

Miquel J. (1984) Guide pratique d'estimation des probabilités de crues, Eyrolles (coll. EDF DER).

Coles S. (2001) An Introduction to Statistical Modeling of Extreme Values, Springer.

Embrechts P., Klüppelberg C. and Mikosch T. (1997) Modelling Extremal Events for Insurance and Finance, Springer.
See Also

RLplot for the return level plot. See optim for the tuning of the optimisation. The RenouvNoEst function can be used to create an object with S3 class "Renouv" from known parameters.
Examples

## Garonne data. Use a "Rendata" object as 'x'. Historical data are used!
fit <- Renouv(x = Garonne, distname = "weibull", trace = 1,
main = "'Garonne' data")
summary(fit)
## generates a warning because of the ties
fit2 <- Renouv(x = Garonne, distname = "GPD",
jitter.KS = FALSE,
threshold = 2800, trace = 1,
main = "'Garonne' data with threshold = 2800 and GPD fit")
## use a numeric vector as 'x'
fit3 <-
Renouv(x = Garonne$OTdata$Flow,
threshold = 2500,
effDuration = 100,
distname = "GPD",
OTS.data = list(numeric(0), c(6800, 7200)),
OTS.effDuration = c(100, 150),
OTS.threshold = c(7000, 6000),
trace = 1,
main = "'Garonne' data with artificial \"OTS\" data")
## Add historical (fictive) data
fit4 <- Renouv(x = Garonne$OTdata$Flow,
threshold = 2500,
effDuration = 100,
distname = "weibull",
fixed.par.y = list(shape = 1.1),
OTS.data = list(numeric(0), c(6800, 7200)),
OTS.effDuration = c(100, 150),
OTS.threshold = c(7000, 6000),
trace = 0,
main = "'Garonne' data with artificial \"OTS\" data")
##============================================================================
## use the 'venice' dataset in a r-largest fit from the 'evd' package
##============================================================================
## transform data: each row is a block
MAX.data <- as.list(as.data.frame(t(venice)))
## remove the NA imposed by the rectangular matrix format
MAX.data <- lapply(MAX.data, function(x) x[!is.na(x)])
MAX.effDuration <- rep(1, length(MAX.data))
## fit a Renouv model with no OT data. The threshold
## must be in the support of the gev distribution
u <- 66
fit.gpd <- Renouv(x = NULL,
MAX.data = MAX.data,
MAX.effDuration = MAX.effDuration,
distname.y = "GPD",
threshold = u,
numDeriv = FALSE,
trace = 0,
plot = FALSE)
## Not run:
require(ismev)
## compare with results from the ismev package
fit.gev <- rlarg.fit(venice)
est.gev <- fit.gev$mle
names(est.gev) <- c("loc", "scale", "shape")
## transform the 'gev' fit into a Ren parameter set.
cov.gev <- fit.gev$cov
rownames(cov.gev) <- colnames(cov.gev) <- c("loc", "scale", "shape")
trans <- gev2Ren(est.gev,
threshold = u,
vcovGev = cov.gev)
est <- cbind(ismev = trans, RenextLab = coef(fit.gpd))
colnames(est) <- c("ismev", "RenextLab")
print(est)
## fill a 3d array with the two gpd covariance matrices
cov2 <- attr(trans, "vcov")[c(1, 3, 4), c(1, 3, 4)]
## covariance
covs <-
array(dim = c(2, 3, 3),
dimnames = list(c("ismev", "RenextLab"),
colnames(fit.gpd$cov), colnames(fit.gpd$cov)))
covs["ismev", , ] <- cov2
covs["RenextLab", , ] <- fit.gpd$cov
print(covs)
## End(Not run)