spectralmeas — R Documentation
Kiriliouk et al. (2016, pp. 360–364) describe estimation of the spectral measure of bivariate data. Standardize the bivariate data as X^\star and Y^\star as in psepolar and select a “large” value for the pseudo-polar radius S_f for nonexceedance probability f. Estimate the spectral measure H(w), which is the limiting distribution of the pseudo-polar angle component W given that the corresponding radial component S is large:

\mathrm{Pr}[W \in \cdot \mid S > S_f] \rightarrow H(w) \mbox{\ as\ } S_f \rightarrow \infty\mbox{.}

So, H(w) is the cumulative distribution function of the spectral measure for angle w \in (0,1). The S_f can be specified by a nonexceedance probability F for S_f(F).
The estimation proceeds as follows:
Step 1: Convert the bivariate data (X_i, Y_i) into (\widehat{S}_i, \widehat{W}_i) by psepolar and set the threshold S_f according to “n/k” (this part involving k does not make sense in Kiriliouk et al. (2016)). For the present implementation in copBasic, the S_f for the f given by the user is based on the empirical distribution of the \widehat{S}_i. The empirical distribution is estimated by the Bernstein empirical distribution function from the lmomco package.
Step 2: Let I_n denote the set of indices that correspond to the observations for which \widehat{S}_i \ge S_f, and compute N_n as the cardinality N_n = |I_n|, which simply means the length of the vector I_n.
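As a minimal sketch of Steps 1 and 2, assuming the unit-Pareto standardization X^\star = 1/(1-u) and Y^\star = 1/(1-v) described for psepolar (the variable names here are illustrative only, not the package internals):

# Sketch of Steps 1--2: pseudo-polar coordinates, threshold S_f, index set I_n.
# Assumes unit-Pareto standardization; consult psepolar() for exact details.
library(copBasic); library(lmomco)
UV <- simCOP(n=500, cop=GHcop, para=3.3, graphics=FALSE)
Xstar <- 1/(1 - UV$U); Ystar <- 1/(1 - UV$V) # standardized margins
S <- Xstar + Ystar  # pseudo-polar radius S-hat_i
W <- Xstar / S      # pseudo-polar angle  W-hat_i in (0,1)
f  <- 0.90
Sf <- dat2bernqua(f, S)  # Bernstein empirical quantile estimate of S_f
In <- which(S >= Sf)     # index set I_n (Step 2)
Nn <- length(In)         # N_n = |I_n|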
Step 3: Use the maximum Euclidean likelihood estimator, which is the third of three methods mentioned by Kiriliouk et al. (2016):

\widehat{H}_3(w) = \sum_{i \in I_n} \hat{p}_{3,i} \times \mathbf{1}[\widehat{W}_i \le w]\mbox{,}

where \mathbf{1}[\cdot] is an indicator function that is only triggered if smooth=FALSE and, following the notation of Kiriliouk et al. (2016), the “3” represents maximum Euclidean likelihood estimation. The \hat{p}_{3,i} are the weights

\hat{p}_{3,i} = \frac{1}{N_n}\bigl[ 1 - (\overline{W} - 1/2)S^{-2}_W(\widehat{W}_i - \overline{W}) \bigr]\mbox{,}

where \overline{W} is the sample mean and S^2_W is the sample variance of the \widehat{W}_i:

\overline{W} = \frac{1}{N_n} \sum_{i \in I_n} \widehat{W}_i\mbox{\quad and\quad } S^2_W = \frac{1}{N_n - 1} \sum_{i \in I_n} (\widehat{W}_i - \overline{W})^2\mbox{,}

where Kiriliouk et al. (2016, p. 363) do not show the N_n - 1 in the denominator of the variance, but copBasic uses it because those authors state that the sample variance is used.
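Continuing the sketch, the weights and step-function estimator follow directly from the formulas above (an illustration only, not the copBasic internals):

# Sketch of Step 3: maximum Euclidean likelihood weights and H-hat_3(w).
Wn   <- W[In]                    # retained angles, i in I_n
Wbar <- mean(Wn); S2W <- var(Wn) # var() uses the N_n - 1 denominator
p3 <- (1 - (Wbar - 0.5)*(Wn - Wbar)/S2W) / Nn  # weights p-hat_{3,i}
sum(p3)  # the weights sum to unity by construction
Hhat3 <- function(w) sapply(w, function(a) sum(p3 * (Wn <= a)))
Hhat3(c(0.25, 0.50, 0.75))  # spectral measure CDF at three angles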
Step 4: A smoothed version of \widehat{H}_3(w) is optionally available by

\tilde{H}_3(w) = \sum_{i \in I_n} \hat{p}_{3,i} \times \mathcal{B}(w;\, \widehat{W}_i\nu,\, (1-\widehat{W}_i)\nu)\mbox{,}

where \mathcal{B}(x; p, q) is the cumulative distribution function of the Beta distribution for p, q > 0 and where \nu > 0 is a smoothing parameter that can be optimized by cross validation.
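A smoothed counterpart replaces the indicator function with the Beta distribution CDF, for which pbeta() suffices in this sketch:

# Sketch of Step 4: Beta-kernel smoothed spectral measure H-tilde_3(w).
nu <- 100  # smoothing parameter
Htil3 <- function(w) sapply(w, function(a)
                     sum(p3 * pbeta(a, shape1=Wn*nu, shape2=(1-Wn)*nu)))
Htil3(c(0.25, 0.50, 0.75))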
Step 5: The spectral density lastly can be computed optionally as

\tilde{h}_3(w) = \sum_{i \in I_n} \hat{p}_{3,i} \times \beta(w;\, \widehat{W}_i\nu,\, (1-\widehat{W}_i)\nu)\mbox{,}

where \beta(x; p, q) is the probability density function (pdf) of the Beta distribution. Readers are alerted to the absence of the \mathbf{1}[\cdot] indicator function in the definitions of \tilde{H}_3(w) and \tilde{h}_3(w). This is correct and matches Kiriliouk et al. (2016, eqs. 17.21 and 17.22), though this author was confused for a day or so by the indicator function in what is purported to be the core definition of \widehat{H}_l(w) with l = 3 in Kiriliouk et al. (2016, eqs. 17.17 and 17.21).
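The corresponding density sketch differs only in using the Beta distribution pdf through dbeta() (again under the assumptions of the earlier sketches):

# Sketch of Step 5: Beta-kernel spectral density h-tilde_3(w).
htil3 <- function(w) sapply(w, function(a)
                     sum(p3 * dbeta(a, shape1=Wn*nu, shape2=(1-Wn)*nu)))
curve(htil3(x), from=0.01, to=0.99, xlab="Angle", ylab="Spectral density")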
spectralmeas(u, v=NULL, w=NULL, f=0.90, snv=FALSE,
             smooth=FALSE, nu=100, pdf=FALSE, ...)
u: Nonexceedance probability u in the X direction.
v: Nonexceedance probability v in the Y direction.
w: A vector of pseudo-polar angle values w \in (0,1) at which to compute the spectral measure.
f: The nonexceedance probability F(S_f) of the pseudo-polar radius S_f.
snv: Return the standard normal variates of the spectral measure probabilities (via qnorm), which aids graphical inspection of the tails (see Examples).
smooth: A logical to return the smoothed \tilde{H}_3(w) instead of \widehat{H}_3(w).
nu: The \nu > 0 smoothing parameter of the Beta distribution.
pdf: A logical to return the smoothed probability density \tilde{h}_3(w).
...: Additional arguments to pass to the psepolar function.
An R vector of \widehat{H}_3(w), \tilde{H}_3(w), or \tilde{h}_3(w) is returned.
The purpose of this section is to describe a CPU-intensive study of goodness-of-fit between a Gumbel–Hougaard copula (GHcop, \mathbf{GH}(u,v; \Theta_1)) parent and a fitted Hüsler–Reiss copula (HRcop, \mathbf{HR}(u,v; \Theta_2)). Both of these are extreme value copulas and are somewhat similar to each other, so the sample sizes necessary for detection of differences should be large. A two-sided Kolmogorov–Smirnov test (KS test, ks.test()) is used to measure the significance of the differences between the estimated spectral measure distributions at f=0.90 (the 90th percentile, F(S_f)) into the right tail.
The true copula will be the \mathbf{GH}(\Theta_1) having parameter \Theta_1 = 3.3. In the code below, the sample sizes ns range from 100 to 2{,}000, and the number of simulations per sample size is nsim/(n/2) for nsim = 20{,}000, so the simulation count shrinks as the sample size grows. For each sample size, a sample from the true parent is drawn, and a \mathbf{HR}(\Theta_2) is fit by maximum likelihood (mleCOP). The two spectral measure distributions (\widehat{H}_{\mathbf{GH}}(w), Htru, and \widehat{H}_{\mathbf{HR}}(w), Hfit) are estimated for a uniform variate of the angle W having length equal to the applicable sample size. The KS test is made between Htru and Hfit, and the number of p-values less than \beta = 0.05 (failures to reject would be Type II errors because the alternative hypothesis is rigged as true) and the simulation count are accumulated in the data frame Results. The sample sizes initially are small, and traps of NaN (abandonment of a simulation run) are made. These traps are needed when the empirical distribution of Htru or Hfit degenerates.
Results <- NULL
true.par <- 3.3; true.cop <- GHcop; fit.cop <- HRcop; search <- c(0,100)
nsim <- 20000; first_time <- TRUE; f <- 0.90; beta <- 0.05
ns <- c(seq(100,1000, by=50), 1250, 1500, 1750, 2000)
for(n in ns) {
  W <- sort(runif(n)); PV <- vector(mode="numeric")
  for(i in 1:(nsim/(n/2))) {
    UV <- simCOP(n=n, cop=true.cop, para=true.par, graphics=FALSE)
    fit.par <- mleCOP(UV, cop=fit.cop, interval=search)$para
    UVfit <- simCOP(n=n, cop=fit.cop, para=fit.par, graphics=FALSE)
    Htru <- spectralmeas(UV,    w=W, bound.type="Carv", f=f)
    Hfit <- spectralmeas(UVfit, w=W, bound.type="Carv", f=f)
    if(length(Htru[! is.nan(Htru)]) != length(Hfit[! is.nan(Hfit)]) |
       length(Htru[! is.nan(Htru)]) == 0 |
       length(Hfit[! is.nan(Hfit)]) == 0) { PV[i] <- NA; next }
    # suppressWarnings() to silence ties warnings from ks.test()
    KS <- suppressWarnings( stats::ks.test(Htru, Hfit)$p.value )
    #plot(FF, H, type="l"); lines(FF, Hfit, col=2); mtext(KS)
    message("-", i, appendLF=FALSE)
    PV[i] <- KS
  }
  message(":", n)
  zz <- data.frame(SampleSize=n,
                   NumPVle0p05=sum(PV[! is.na(PV)] <= beta),
                   SimulationCount=length(PV[! is.na(PV)]))
  if(first_time) { Results <- zz; first_time <- FALSE; next }
  Results <- rbind(Results, zz)
}
plot(Results$SampleSize, 100*Results$NumPVle0p05/Results$SimulationCount,
     type="b", cex=1.1, xlab="Sample size",
     ylab="Percent simulations with p-value < 0.05")
The Results show a general increase in the counts of p-values \le 0.05 as sample size increases. There is variation, of course, and increasing nsim would smooth that considerably. The results show, for n \approx 1{,}000, that statistically significant differences in extremal F(S_f) = 0.90 dependency between the \mathbf{GH}(\Theta_1{=}3.3) and the \mathbf{HR}(\Theta_2) are detected at the error rate implied by the specified \beta = 0.05.
This range in sample size can be compared to the Kullback–Leibler sample size (n_{fg}):
UV <- simCOP(n=10000, cop=true.cop, para=true.par, graphics=FALSE)
fit.par <- mleCOP(UV, cop=fit.cop, interval=search)$para
kullCOP(cop1=true.cop, para1=true.par,
        cop2=fit.cop,  para2=fit.par)$KL.sample.size
# The Kullback-Leibler (integer) sample sizes for detection of differences at
# alpha=0.05 are n_fg = (742, 809, 815, 826, 915, 973, 1203) for seven runs.
# Do more to see variation.
where the Kullback–Leibler approach measures density departures across the whole \mathcal{I}^2 domain, as opposed to the extremal dependency in the right tail measured by the spectral measure.
Different runs of the above code will result in different n_{fg}, in part because of simulation differences internal to kullCOP, but also because \Theta_2 has its own slight variation in its fit to the large-sample simulation (n=10{,}000) of the parent. However, it seems that n_{fg} \approx 900 is on the order of the n for which the KS test on the spectral measure determines statistical significance with a similar error rate.
Now if the aforementioned simulation run is repeated for F(S_f) = 0.95 or f=0.95, the n_{fg} obviously remains unchanged at about 900, but the n for which the error rate is about \beta = 0.05 is n \approx 600. This sample size is clearly smaller than before and smaller than n_{fg}; therefore, analysis of the empirical spectral measure deeper into the tail (F(S_f) = 0.95) requires a smaller sample size to distinguish between the two copulas, though the analysis does not address whether one or both copulas are adequate for the problem at hand. For a final comparison, if the aforementioned simulation run is repeated for F(S_f) = 0.80 or f=0.80, then the n for which the error rate is about \beta = 0.05 is n \approx 1{,}700. Thus, as the analysis moves away from the tail toward the center of the distribution, the sample size needed to distinguish between these two similar copulas increases substantially.
William Asquith william.asquith@ttu.edu
Kiriliouk, Anna, Segers, Johan, and Warchoł, Michał, 2016, Nonparametric estimation of extremal dependence: in Extreme Value Modeling and Risk Analysis, D.K. Dey and Jun Yan, eds., Boca Raton, FL, CRC Press, ISBN 978-1-4987-0129-7.
psepolar, stabtaildepf
## Not run:
UV <- simCOP(n=500, cop=HRcop, para=1.3, graphics=FALSE)
W <- seq(0,1,by=0.005)
Hu <- spectralmeas(UV, w=W)
Hs <- spectralmeas(UV, w=W, smooth=TRUE, nu=100)
plot(W, Hu, type="l", ylab="Spectral Measure H", xlab="Angle")
lines(W, Hs, col=2)  # red line is the smoothed (nu=100) estimate
## End(Not run)
## Not run:
"GAUScop" <- function(u,v, para=NULL, ...) {
if(length(u)==1) u<-rep(u,length(v)); if(length(v)==1) v<-rep(v,length(u))
return(copula::pCopula(matrix(c(u,v), ncol=2), para))
}
GAUSparfn <- function(rho) return(copula::normalCopula(rho, dim = 2))
n <- 2000 # The PSP parent has no upper tail dependency
uv <- simCOP(n=n, cop=PSP, para=NULL, graphics=FALSE)
PLpar <- mleCOP(uv, cop=PLcop, interval=c(0,100))$para
PLuv <- simCOP(n=n, cop=PLcop, para=PLpar, graphics=FALSE)
GApar <- mleCOP(uv, cop=GAUScop, parafn=GAUSparfn, interval=c(-1,1))$para
GAuv <- simCOP(n=n, cop=GAUScop, para=GApar, graphics=FALSE)
GLpar <- mleCOP(uv, cop=GLcop, interval=c(0,100))$para
GLuv <- simCOP(n=n, cop=GLcop, para=GLpar, graphics=FALSE)
FF <- c(0.001,seq(0.005,0.995, by=0.005),0.999); qFF <- qnorm(FF)
f <- 0.90 # Seeking beyond the 90th percentile pseudo-polar radius
PSPh <- spectralmeas( uv, w=FF, f=f, smooth=TRUE, snv=TRUE)
PLh <- spectralmeas(PLuv, w=FF, f=f, smooth=TRUE, snv=TRUE)
GAh <- spectralmeas(GAuv, w=FF, f=f, smooth=TRUE, snv=TRUE)
GLh <- spectralmeas(GLuv, w=FF, f=f, smooth=TRUE, snv=TRUE)
plot(qFF, PSPh, type="l", lwd=2, xlim=c(-3,3), ylim=c(-2,2),
xlab="STANDARD NORMAL VARIATE OF PSEUDO-POLAR ANGLE",
ylab="STANDARD NORMAL VARIATE OF SPECTRAL MEASURE PROBABILITY")
lines(qFF, PLh, col=2) # red line is the Plackett copula
lines(qFF, GAh, col=3) # green line is the Gaussian copula
lines(qFF, GLh, col=4) # blue line is the Galambos copula
# Notice the flat spot and less steep nature of the PSP (black line), which is
# indicative of no, or evenly spreading, tail dependency. The Plackett and
# Gaussian copulas show no specific steepening near the middle, which remains
# indicative of no tail dependency, with the Plackett being less steep because
# it has a more dispersed copula density as the right tail is approached than
# the Gaussian. The Galambos copula has upper tail dependency, which is seen by
# the mass concentration and steepening of the curve on the plot.
## End(Not run)