empCopula: The Empirical Copula

empCopula R Documentation

Description

Computes the empirical copula (optionally smoothed according to the specified method) and provides auxiliary tools.

Usage

empCopula(X, smoothing = c("none", "beta", "checkerboard",
                           "schaake.shuffle"), offset = 0,
          ties.method = c("max", "average", "first", "last", "random", "min"))
C.n(u, X, smoothing = c("none", "beta", "checkerboard"), offset = 0,
    ties.method = c("max", "average", "first", "last", "random", "min"))
dCn(u, U, j.ind = 1:d, b = 1/sqrt(nrow(U)), ...)
F.n(x, X, offset = 0, smoothing = c("none", "beta", "checkerboard"))
Cn(x, w) ## <-- deprecated!  use  C.n(w, x) instead!
toEmpMargins(U, x, ...)

Arguments

X

an (n, d)-matrix of pseudo-observations with d columns (as x or u). Recall that a multivariate random sample can be transformed to pseudo-observations via pobs(). For F.n() and if smoothing != "none", X can also be a general, multivariate sample, in which case the empirical distribution function is computed.

u, w

an (m, d)-matrix with elements in [0,1] whose rows contain the evaluation points of the empirical copula.

U

an (n, d)-matrix of pseudo- (or copula-)observations with elements in [0,1] and the same number d of columns as u (for dCn()) or x (for toEmpMargins()).

x

an (m, d)-matrix whose rows contain the evaluation points of the empirical distribution function (if smoothing = "none") or copula (if smoothing != "none").

smoothing

character string specifying the type of smoothing of the empirical distribution function (for F.n()) or the empirical copula (for C.n()). Available are:

"none"

the original empirical distribution function or empirical copula.

"beta"

the empirical beta smoothed distribution function or empirical beta copula.

"checkerboard"

empirical checkerboard construction.

"schaake.shuffle"

in each dimension, n (so nrow(X)-many) sorted standard uniforms are used to construct a smooth sample, from which one draws with replacement as many observations as required; only available for the empirical copula and only for sampling.

ties.method

character string specifying how ranks should be computed if there are ties in any of the coordinate samples of X; passed to pobs().

j.ind

integer vector of indices j between 1 and d indicating the dimensions with respect to which first-order partial derivatives are approximated.

b

numeric giving the bandwidth for approximating first-order partial derivatives.

offset

used in scaling the result which is of the form sum(....)/(n+offset); defaults to zero.

...

additional arguments passed to dCn() or sort() underlying toEmpMargins().
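
The effect of ties.method on the ranks, and hence on the pseudo-observations, can be seen in a small base-R sketch. It assumes the usual scaling of componentwise ranks by 1/(n + 1); the variable names are chosen for illustration only.

```r
## One coordinate sample containing a tie
x <- c(0.3, 1.2, 1.2, 2.5)
n <- length(x)

r.max <- rank(x, ties.method = "max")      # tied values get the largest rank
r.avg <- rank(x, ties.method = "average")  # tied values share the mean rank

## Pseudo-observations: ranks scaled to lie strictly inside (0, 1)
u.max <- r.max / (n + 1)
u.avg <- r.avg / (n + 1)
```

With "max", both tied observations are pushed up to the higher rank, whereas "average" places them halfway between the ranks they would otherwise occupy.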

Details

Given pseudo-observations from a distribution with continuous margins and copula C, the empirical copula is the (default) empirical distribution function of these pseudo-observations. It is thus a natural nonparametric estimator of C. The function C.n() computes the empirical copula or two alternative smoothed versions of it: the empirical beta copula or the empirical checkerboard copula; see Eqs. (2.1) and (4.1) in Segers, Sibuya and Tsukahara (2017), and the references therein. empCopula() is the constructor of an object of class empCopula.
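
As a rough illustration of the two smoothed versions, the following base-R sketch evaluates them directly from the componentwise ranks, following the formulas in Segers, Sibuya and Tsukahara (2017). The helper smoothed_copula_sketch() is hypothetical and is not the package implementation; it assumes a sample without ties.

```r
## Hypothetical helper, not part of the package
smoothed_copula_sketch <- function(u, X, type = c("beta", "checkerboard")) {
    type <- match.arg(type)
    if (!is.matrix(u)) u <- rbind(u)   # a vector is treated as one evaluation point
    n <- nrow(X)
    R <- apply(X, 2, rank)             # componentwise ranks (assumes no ties)
    apply(u, 1, function(u.) {
        U. <- matrix(u., nrow = n, ncol = ncol(X), byrow = TRUE) # u. repeated in each row
        terms <- switch(type,
            beta         = pbeta(U., shape1 = R, shape2 = n + 1 - R),
            checkerboard = pmin(pmax(n * U. - R + 1, 0), 1))
        terms <- matrix(terms, nrow = n)   # one row per observation i
        mean(apply(terms, 1, prod))        # average over i of the product over j
    })
}

set.seed(1)
X <- matrix(rnorm(20), nrow = 10, ncol = 2)   # toy sample without ties
u <- rbind(c(0.3, 1), c(0.5, 0.5))
C.beta  <- smoothed_copula_sketch(u, X, type = "beta")
C.check <- smoothed_copula_sketch(u, X, type = "checkerboard")
```

Both constructions have standard uniform margins, so evaluating either at (0.3, 1) should give 0.3 up to rounding.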

The function dCn() approximates first-order partial derivatives of the unknown copula using the empirical copula.

The function F.n() computes the empirical distribution function of a multivariate sample. Note that C.n(u, X, smoothing="none", *) simply calls F.n(u, pobs(X), *) after checking u.

There are several asymptotically equivalent definitions of the empirical copula. C.n(, smoothing = "none") is simply defined as the empirical distribution function computed from the pseudo-observations, that is,

C_n(\bm{u})=\frac{1}{n}\sum_{i=1}^n\mathbf{1}_{\{\hat{\bm{U}}_i\le\bm{u}\}},

where \hat{\bm{U}}_i, i\in\{1,\dots,n\}, denote the pseudo-observations and n the sample size. Internally, C.n(,smoothing = "none") is just a wrapper for F.n() and is expected to be fed with the pseudo-observations.
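
The definition above can be written out in a few lines of base R. This is only a sketch of the formula, with a hypothetical helper name emp_copula_sketch(); it is not how C.n() is implemented, and it assumes pseudo-observations of the form rank/(n + 1).

```r
## Hypothetical helper: empirical df of the pseudo-observations, evaluated at u
emp_copula_sketch <- function(u, X) {
    if (!is.matrix(u)) u <- rbind(u)   # a vector is treated as one evaluation point
    n <- nrow(X)
    U.hat <- apply(X, 2, rank, ties.method = "max") / (n + 1) # pseudo-observations
    apply(u, 1, function(u.)
        ## proportion of observations with all d components <= u.
        mean(colSums(t(U.hat) <= u.) == ncol(X)))
}

X <- cbind(c(1, 2, 3, 4), c(2, 1, 4, 3))    # toy bivariate sample
u <- rbind(c(0.5, 0.5), c(1, 1))
Cn.u <- emp_copula_sketch(u, X)
```

Here the pseudo-observations are X/5, so two of the four points lie componentwise below (0.5, 0.5) and all four lie below (1, 1).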

The approximation for the jth partial derivative of the unknown copula C is implemented as, for example, in Rémillard and Scaillet (2009), and given by

\hat{\dot{C}}_{jn}(\bm{u})=\frac{C_n(u_1,\dots,u_{j-1},\min(u_j+b,1),u_{j+1},\dots,u_d)-C_n(u_1,\dots,u_{j-1},\max(u_j-b,0),u_{j+1},\dots,u_d)}{2b},

where b denotes the bandwidth and C_n the empirical copula.
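
The finite-difference formula above can be sketched in base R as follows. The helper dCn_sketch() is hypothetical and handles a single index j; it takes pseudo-observations U directly and uses the same default bandwidth as in the Usage section.

```r
## Hypothetical helper: finite-difference approximation of the jth partial derivative
dCn_sketch <- function(u, U, j, b = 1 / sqrt(nrow(U))) {
    Cn <- function(v) mean(colSums(t(U) <= v) == ncol(U)) # empirical df of U at v
    if (!is.matrix(u)) u <- rbind(u)
    apply(u, 1, function(u.) {
        up <- lo <- u.
        up[j] <- min(u.[j] + b, 1)   # shift jth coordinate up, capped at 1
        lo[j] <- max(u.[j] - b, 0)   # shift jth coordinate down, floored at 0
        (Cn(up) - Cn(lo)) / (2 * b)
    })
}

set.seed(42)
n <- 200
U <- apply(matrix(runif(2 * n), n, 2), 2, rank) / (n + 1) # independent toy sample
der1 <- dCn_sketch(rbind(c(0.5, 0.5), c(0.2, 0.9)), U, j = 1)
```

Since the empirical copula is nondecreasing in each coordinate, the approximation is always nonnegative.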

Value

empCopula() is the constructor for objects of class empCopula.

C.n() returns the empirical copula of the pseudo-observations X evaluated at u (or a smoothed version of it).

dCn() returns a vector (if length(j.ind) is 1) or a matrix (with number of columns equal to length(j.ind)), containing the approximated first-order partial derivatives of the unknown copula at u with respect to the arguments in j.ind.

F.n() returns the empirical distribution function of X evaluated at x if smoothing = "none", the empirical beta copula evaluated at x if smoothing = "beta" and the empirical checkerboard copula evaluated at x if smoothing = "checkerboard".

toEmpMargins() transforms the copula sample U to the empirical margins based on the sample x.
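
One plausible reading of this transformation is to apply, column by column, the empirical quantile function of x to the corresponding column of U. The base-R sketch below does this with quantile(type = 1); the helper to_emp_margins_sketch() is hypothetical and may differ in detail from what toEmpMargins() does internally.

```r
## Hypothetical helper: map each column of U through the empirical
## quantile function of the matching column of x
to_emp_margins_sketch <- function(U, x) {
    stopifnot(ncol(U) == ncol(x))
    sapply(seq_len(ncol(U)), function(j)
        quantile(x[, j], probs = U[, j], type = 1, names = FALSE))
}

x <- cbind(c(10, 20, 30, 40), c(1, 2, 3, 4))    # sample defining the margins
U <- cbind(c(0.1, 0.6, 0.9), c(0.5, 0.26, 1))   # copula sample to transform
X. <- to_emp_margins_sketch(U, x)
```

Every entry of the result is an observed value from the corresponding column of x, so the transformed sample has the empirical margins of x while its dependence structure comes from U.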

Note

The first version of our empirical copula implementation, Cn(), had its two arguments reversed compared to C.n() and is now deprecated. To migrate old code, replace Cn(x, w) by C.n(w, x).

The two smoothed versions implicitly assume that there are no ties in the component samples of the data.

References

Rüschendorf, L. (1976). Asymptotic distributions of multivariate rank order statistics, Annals of Statistics 4, 912–923.

Deheuvels, P. (1979). La fonction de dépendance empirique et ses propriétés: un test non paramétrique d'indépendance, Acad. Roy. Belg. Bull. Cl. Sci., 5th Ser. 65, 274–292.

Deheuvels, P. (1981). A non parametric test for independence, Publ. Inst. Statist. Univ. Paris 26, 29–50.

Clark, M., Gangopadhyay, S., Hay, L., Rajagopalan, B. and Wilby, R. (2004). The Schaake Shuffle: A Method for Reconstructing Space-Time Variability in Forecasted Precipitation and Temperature Fields. Journal of Hydrometeorology, pages 243–262.

Rémillard, B. and Scaillet, O. (2009). Testing for equality between two copulas. Journal of Multivariate Analysis, 100(3), pages 377–386.

Segers, J., Sibuya, M. and Tsukahara, H. (2017). The Empirical Beta Copula. Journal of Multivariate Analysis, 155, pages 35–51, https://arxiv.org/abs/1607.04430.

Kiriliouk, A., Segers, J. and Tsukahara, H. (2020). Resampling Procedures with Empirical Beta Copulas. https://arxiv.org/abs/1905.12466.

See Also

pobs() for computing pseudo-observations.

Examples

## Generate data X (from a meta-Gumbel model with N(0,1) margins)
n <- 100
d <- 3
family <- "Gumbel"
theta <- 2
cop <- onacopulaL(family, list(theta=theta, 1:d))
set.seed(1)
X <- qnorm(rCopula(n, cop)) # meta-Gumbel data with N(0,1) margins

## Evaluate empirical copula
u <- matrix(runif(n*d), n, d) # random points where to evaluate the empirical copula
ec <- C.n(u, X = X)

## Compare the empirical copula with the true copula
pc <- pCopula(u, copula = cop)
mean(abs(pc - ec)) # ~= 0.012 -- increase n to decrease this error

## The same for the two smoothed versions
beta <- C.n(u, X, smoothing = "beta")
mean(abs(pc - beta))
check <- C.n(u, X, smoothing = "checkerboard")
mean(abs(pc - check))

## Compare the empirical copula with F.n(pobs())
U <- pobs(X) # pseudo-observations
stopifnot(identical(ec, F.n(u, X = pobs(U)))) # even identical

## Compare the empirical copula based on U at U with the Kendall distribution
## Note: Theoretically, C(U) ~ K, so K(C_n(U)) should be approximately U(0,1)
plot(ecdf(pK(C.n(U, X), cop = cop@copula, d = d)), asp = 1, xaxs="i", yaxs="i")
segments(0,0, 1,1, col=adjustcolor("blue",1/3), lwd=5, lty = 2)
abline(v=0:1, col="gray70", lty = 2)

## Compare the empirical copula and the true copula on the diagonal
C.n.diag <- function(u) C.n(do.call(cbind, rep(list(u), d)), X = X) # diagonal of C_n
C.diag <- function(u) pCopula(do.call(cbind, rep(list(u), d)), cop) # diagonal of C
curve(C.n.diag, from = 0, to = 1, # empirical copula diagonal
      main = paste("True vs empirical diagonal of a", family, "copula"),
      xlab = "u", ylab = quote("True C(u,..,u) and empirical"~C[n](u,..,u)))
curve(C.diag, lty = 2, add = TRUE) # add true copula diagonal
legend("bottomright", lty = 2:1, bty = "n", inset = 0.02,
       legend = expression(C, C[n]))

## Approximate partial derivatives w.r.t. the 2nd and 3rd component
j.ind <- 2:3 # indices w.r.t. which the partial derivatives are computed
## Partial derivatives based on the empirical copula and the true copula
der23 <- dCn(u, U = pobs(U), j.ind = j.ind)
der23. <- copula:::dCdu(archmCopula(family, param=theta, dim=d), u=u)[,j.ind]
## Approximation error
summary(as.vector(abs(der23-der23.)))

## For an example of using F.n(), see help(mvdc)

## Generate a bivariate empirical copula object (various smoothing methods)
n <- 10 # sample size
d <- 2 # dimension
set.seed(271)
X <- rCopula(n, copula = claytonCopula(3, dim = d))
ecop.orig  <- empCopula(X) # smoothing = "none"
ecop.beta  <- empCopula(X, smoothing = "beta")
ecop.check <- empCopula(X, smoothing = "checkerboard")

## Sample from these (smoothed) empirical copulas
m <- 50
U.orig  <-  rCopula(m, copula = ecop.orig)
U.beta  <-  rCopula(m, copula = ecop.beta)
U.check <-  rCopula(m, copula = ecop.check)

## Plot
wireframe2(ecop.orig,  FUN = pCopula, draw.4.pCoplines = FALSE)
wireframe2(ecop.beta,  FUN = pCopula)
wireframe2(ecop.check, FUN = pCopula)
## Density (only exists when smoothing = "beta")
wireframe2(ecop.beta,  FUN = dCopula)

## Transform a copula sample to empirical margins
set.seed(271)
X <- qexp(rCopula(1000, copula = claytonCopula(2))) # multivariate distribution
U <- rCopula(917, copula = gumbelCopula(2)) # new copula sample
X. <- toEmpMargins(U, x = X) # transform U to the empirical margins of X
plot(X.) # Gumbel sample with empirical margins of X

copula documentation built on Sept. 11, 2024, 7:48 p.m.