In CondCopulas: Estimation and Inference for Conditional Copula Models

CKT.kernel

R Documentation

Estimation of conditional Kendall's tau using kernel smoothing

Description

Let X_1 and X_2 be two random variables. The goal of this function is to estimate the conditional Kendall's tau (a dependence measure) between X_1 and X_2 given Z=z for a conditioning variable Z. Conditional Kendall's tau between X_1 and X_2 given Z=z is defined as:

\tau_{X_1, X_2 | Z=z} = P( (X_{1,1} - X_{2,1})(X_{1,2} - X_{2,2}) > 0 | Z_1 = Z_2 = z) - P( (X_{1,1} - X_{2,1})(X_{1,2} - X_{2,2}) < 0 | Z_1 = Z_2 = z),

where (X_{1,1}, X_{1,2}, Z_1) and (X_{2,1}, X_{2,2}, Z_2) are two independent and identically distributed copies of (X_1, X_2, Z). This function uses the kernel-based estimator described in Derumigny and Fermanian (2019).
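For intuition, the kernel-based idea can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not the CondCopulas implementation: the helper names `epa_kernel` and `ckt_kernel_sketch` are hypothetical, Z is assumed univariate, and the weights are assumed to be w_i(z) = K((Z_i - z)/h) / \sum_j K((Z_j - z)/h) with the Epanechnikov kernel K. The estimate is then a weighted average of the concordance signs of all pairs (i, j) with i \neq j.

```r
# Hypothetical helper: Epanechnikov kernel, supported on [-1, 1]
epa_kernel <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

# Hypothetical helper (not part of CondCopulas): kernel-weighted Kendall's tau
# at a single point z, for a univariate conditioning variable Z
ckt_kernel_sketch <- function(X1, X2, Z, z, h) {
  k <- epa_kernel((Z - z) / h)
  w <- k / sum(k)                    # weights w_i(z), summing to 1
  # Concordance signs of all pairs: +1 concordant, -1 discordant, 0 tie
  s <- sign(outer(X1, X1, "-")) * sign(outer(X2, X2, "-"))
  ww <- outer(w, w)                  # products w_i(z) * w_j(z)
  diag(ww) <- 0                      # drop the pairs (i, i)
  # Fewer than two observations in [z - h, z + h]: no estimate possible
  if (!is.finite(sum(ww)) || sum(ww) == 0) return(NA_real_)
  sum(ww * s) / sum(ww)              # weighted average over pairs i != j
}

# Usage on simulated data where the sign of the dependence changes with Z
set.seed(1)
n <- 500
Z <- runif(n, 0, 10)
X1 <- rnorm(n)
X2 <- ifelse(Z < 5, 1, -1) * X1 + rnorm(n, sd = 0.5)
ckt_kernel_sketch(X1, X2, Z, z = 2, h = 1)  # should be clearly positive
ckt_kernel_sketch(X1, X2, Z, z = 8, h = 1)  # should be clearly negative
```

With a very large h all weights become nearly equal, so the sketch should then be close to the unconditional `cor(X1, X2, method = "kendall")`.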

Usage

CKT.kernel(
  X1 = NULL,
  X2 = NULL,
  Z = NULL,
  newZ,
  h,
  kernel.name = "Epa",
  methodCV = "Kfolds",
  Kfolds = 5,
  nPairs = 10 * length(observedX1),
  typeEstCKT = "wdm",
  progressBar = TRUE,
  observedX1 = NULL,
  observedX2 = NULL,
  observedZ = NULL
)

Arguments

`X1`	a vector of n observations of the first variable (or a 1-column matrix)
`X2`	a vector of n observations of the second variable (or a 1-column matrix)
`Z`	a vector of n observations of the conditioning variable, or a matrix with n rows of observations of the conditioning vector
`newZ`	the new points z of the conditioning variable at which the conditional Kendall's tau should be estimated (a vector, or a matrix with one row per point).
`h`	the bandwidth used for kernel smoothing. If this is a vector, then cross-validation is used following the method given by argument `methodCV` to choose the best bandwidth before doing the estimation.
`kernel.name`	name of the kernel used for smoothing. Possible choices are `"Gaussian"` (Gaussian kernel) and `"Epa"` (Epanechnikov kernel).
`methodCV`	method used for the cross-validation. Possible choices are `"leave-one-out"` and `"Kfolds"`.
`Kfolds`	number of subsamples used, if `methodCV = "Kfolds"`.
`nPairs`	number of pairs used in the cross-validation criterion, if `methodCV = "leave-one-out"`.
`typeEstCKT`	type of estimation of the conditional Kendall's tau. Possible choices are `1`, `2`, `3`, `4` and `"wdm"`. Choices `1` and `3` produce biased estimators, and `2` does not attain the full range `[-1,1]`; therefore these three choices are not recommended for applications on real data. `4` is an improved version of `1`, `2` and `3` that has less bias and attains the full range `[-1,1]`. `"wdm"` is the default and produces the same results as `4` when there are no ties in the data.
`progressBar`	control the display of progress bars. Possible choices are: `0`: no progress bar is displayed; `1`: a general progress bar is displayed; `2` and larger values: a general progress bar is displayed and, additionally, a progress bar for each value of `h` shows the progress of the computation. This only applies when the bandwidth is chosen by cross-validation (i.e. when `h` is a vector).
`observedX1`, `observedX2`, `observedZ`	deprecated names for `X1`, `X2`, `Z`. Support for these names will be removed in a later version.
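The remark that option `2` does not attain the full range can be illustrated numerically. Assuming (our reading of Derumigny and Fermanian (2019), not a statement about the package source) that this estimator is a weighted sum of concordance signs over all pairs (i, j), including the pairs i = j whose sign product is 0, its absolute value is at most 1 - \sum_i w_i^2, because \sum_{i \neq j} w_i w_j = (\sum_i w_i)^2 - \sum_i w_i^2. Dividing by this factor is one simple way to restore the full range. The weights below are arbitrary illustrative values.

```r
# Illustrative normalized weights (they sum to 1); not produced by CKT.kernel
w  <- c(0.1, 0.2, 0.3, 0.4)
x1 <- c(1, 2, 3, 4)
x2 <- c(1, 2, 3, 4)                  # a perfectly concordant sample

# Concordance signs; the diagonal (i == j) is 0 because sign(0) = 0
s  <- sign(outer(x1, x1, "-")) * sign(outer(x2, x2, "-"))
ww <- outer(w, w)                    # products w_i * w_j

tau_all_pairs <- sum(ww * s)                      # bounded by 1 - sum(w^2)
tau_rescaled  <- tau_all_pairs / (1 - sum(w^2))   # rescaled to reach [-1, 1]

print(c(all_pairs = tau_all_pairs, bound = 1 - sum(w^2), rescaled = tau_rescaled))
```

Here \sum_i w_i^2 = 0.3, so even a perfectly concordant sample gives only 0.7 (up to rounding) before rescaling, and 1 after it.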

Details

Choice of the bandwidth h. The bandwidth must be chosen carefully. In the univariate case, the default kernel (Epanechnikov kernel) has support [-1,1], so for a bandwidth h, estimation of conditional Kendall's tau at Z=z only uses points for which Z_i \in [z \pm h]. As usual in nonparametric estimation, h should be neither too small (which leads to a large variance) nor too large (which leads to a large bias).
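The effect of the kernel's support can be checked directly: with the Epanechnikov kernel only the observations in the window around z receive a nonzero weight, whereas a Gaussian kernel gives every observation some weight. This sketch assumes that the `"Gaussian"` option corresponds to the standard normal density `dnorm`; `epa_kernel` is a hypothetical helper, and the simulated Z mimics the Examples section.

```r
# Epanechnikov kernel: K(u) = 3/4 (1 - u^2) for |u| <= 1, and 0 otherwise
epa_kernel <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

set.seed(1)
Z <- rnorm(800, mean = 5, sd = 2)  # same design as in the Examples section
z <- 5
h <- 0.5
u <- (Z - z) / h

n_epa    <- sum(epa_kernel(u) > 0)  # observations with a nonzero Epanechnikov weight
n_gauss  <- sum(dnorm(u) > 0)       # Gaussian kernel: (numerically) every observation
n_window <- sum(abs(Z - z) < h)     # observations strictly inside z +/- h
c(epa = n_epa, gaussian = n_gauss, window = n_window)
```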

We recommend that, for each z at which the conditional Kendall's tau \tau_{X_1, X_2 | Z=z} is estimated, the set \{i: Z_i \in [z \pm h] \} contain at least 20 points and not more than 30% of the points of the whole dataset. Note that for consistent estimation, as the sample size n tends to infinity, h should tend to 0 while the size of the set \{i: Z_i \in [z \pm h]\} should also tend to infinity. Indeed, the conditioning points should be closer and closer to the point of interest z (small h) and more and more numerous (h tending to 0 slowly enough).
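The rule of thumb above is easy to check before calling `CKT.kernel`. The helper `check_bandwidth` below is hypothetical (it is not part of CondCopulas) and only covers a univariate Z; for each point z it counts the observations with |Z_i - z| <= h and flags whether that count lies between 20 and 30% of the sample size.

```r
# Hypothetical helper: count the observations in [z - h, z + h] for each z
# and flag whether the count follows the rule of thumb (>= 20 points and
# <= 30% of the sample)
check_bandwidth <- function(Z, newZ, h, min_pts = 20, max_frac = 0.3) {
  counts <- vapply(newZ, function(z) sum(abs(Z - z) <= h), integer(1))
  data.frame(z = newZ, n_local = counts,
             ok = counts >= min_pts & counts <= max_frac * length(Z))
}

# Usage with the design of the Examples section
set.seed(1)
Z <- rnorm(800, mean = 5, sd = 2)
res <- check_bandwidth(Z, newZ = seq(2, 10, by = 0.1), h = 0.1)
res[!res$ok, ]   # grid points where h = 0.1 looks too small (or too large)
```

With this design, grid points far in the tails of Z (for example near z = 10) typically have far fewer than 20 neighbours for h = 0.1, which suggests a larger bandwidth there.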

In the multivariate case, similar recommendations can be made. Because of the curse of dimensionality, a larger sample will be necessary to reach the same level of precision as in the univariate case.
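A back-of-the-envelope calculation shows why larger samples are needed as the dimension of Z grows. Assuming, purely for illustration, that Z is uniform on [0, 1]^d and that the local window is a box of half-width h = 0.1 in each coordinate around an interior point (not necessarily how `CKT.kernel` builds its multivariate window), the expected fraction of points in the window is (2h)^d, so keeping about 20 local points requires roughly 20 / (2h)^d observations.

```r
# Illustrative assumption: Z uniform on [0, 1]^d, box window of half-width h
h <- 0.1
d <- 1:4
frac_in_window <- (2 * h)^d               # expected fraction of points in the window
n_needed <- round(20 / frac_in_window)    # sample size giving about 20 local points
data.frame(d, frac_in_window, n_needed)
```

Each additional dimension multiplies the required sample size by 1 / (2h) = 5 in this setting.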

Value

a list with two components

estimatedCKT the vector of size NROW(newZ) containing the values of the estimated conditional Kendall's tau.
finalh the bandwidth h that was finally used for kernel smoothing (either the one specified by the user or the one chosen by cross-validation if multiple bandwidths were given).

References

Derumigny, A., & Fermanian, J. D. (2019). On kernel-based estimation of conditional Kendall’s tau: finite-distance bounds and asymptotic behavior. Dependence Modeling, 7(1), 292-321. doi:10.1515/demo-2019-0016

Examples

library(CondCopulas)  # provides CKT.kernel; the VineCopula package must also be installed
# We simulate from a conditional copula
set.seed(1)
N = 800
Z = rnorm(n = N, mean = 5, sd = 2)
conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2)
simCopula = VineCopula::BiCopSim(N=N , family = 1,
    par = VineCopula::BiCopTau2Par(1 , conditionalTau ))
X1 = qnorm(simCopula[,1])
X2 = qnorm(simCopula[,2])

newZ = seq(2,10,by = 0.1)
estimatedCKT_kernel <- CKT.kernel(
   X1 = X1, X2 = X2, Z = Z,
   newZ = newZ, h = 0.1, kernel.name = "Epa")$estimatedCKT

# Comparison between true Kendall's tau (in black)
# and estimated Kendall's tau (in red)
trueConditionalTau = -0.9 + 1.8 * pnorm(newZ, mean = 5, sd = 2)
plot(newZ, trueConditionalTau , col = "black",
     type = "l", ylim = c(-1, 1))
lines(newZ, estimatedCKT_kernel, col = "red")

CondCopulas documentation built on Sept. 11, 2024, 9:10 p.m.