View source: R/estimationCKT.kernel.R

CKT.kernel    R Documentation

Estimation of conditional Kendall's tau using kernel smoothing

Description

Let X_1 and X_2 be two random variables. The goal of this function is to estimate the conditional Kendall's tau (a dependence measure) between X_1 and X_2 given Z=z for a conditioning variable Z. Conditional Kendall's tau between X_1 and X_2 given Z=z is defined as:

\tau_{X_1, X_2 | Z=z} = P( (X_{1,1} - X_{2,1})(X_{1,2} - X_{2,2}) > 0 | Z_1 = Z_2 = z) - P( (X_{1,1} - X_{2,1})(X_{1,2} - X_{2,2}) < 0 | Z_1 = Z_2 = z),

where (X_{1,1}, X_{1,2}, Z_1) and (X_{2,1}, X_{2,2}, Z_2) are two independent and identically distributed copies of (X_1, X_2, Z). This function uses the kernel-based estimator described in Derumigny & Fermanian (2019).
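To make the idea concrete, here is a minimal base-R sketch, not the CondCopulas implementation. It assumes Epanechnikov weights w_i(z) = K((Z_i - z)/h) / \sum_j K((Z_j - z)/h), averages the sign of (X_{i,1} - X_{j,1})(X_{i,2} - X_{j,2}) over all pairs (i, j) with weights w_i(z) w_j(z), and divides by 1 - \sum_i w_i(z)^2 so that the result can reach the full range [-1,1]. The names weightedKendallTau and ckt_kernel_sketch are hypothetical, and the n x n pairwise matrices restrict it to small samples.

```r
# Hypothetical helpers (not part of CondCopulas).
# Weighted Kendall's tau for weights w summing to 1, assuming no ties.
weightedKendallTau <- function(X1, X2, w) {
  # signs[i, j] = sign((X1[i] - X1[j]) * (X2[i] - X2[j])); 0 on the diagonal
  signs <- sign(outer(X1, X1, "-") * outer(X2, X2, "-"))
  s <- sum(w^2)                       # weight mass of the pairs (i, i)
  if (s >= 1) return(NA_real_)        # a single point carries all the weight
  sum(outer(w, w) * signs) / (1 - s)  # rescaled so that [-1, 1] is attainable
}

# Conditional Kendall's tau at one point z (Epanechnikov kernel, bandwidth h)
ckt_kernel_sketch <- function(X1, X2, Z, z, h) {
  u <- (Z - z) / h
  K <- ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)
  if (sum(K) == 0) return(NA_real_)   # no observation with Z_i in [z - h, z + h]
  weightedKendallTau(X1, X2, K / sum(K))
}

set.seed(1)
n <- 300
Z <- runif(n)
X1 <- rnorm(n)
X2 <- X1 + rnorm(n)                   # positive dependence that does not vary with Z
ckt_kernel_sketch(X1, X2, Z, z = 0.5, h = 0.2)
```

With equal weights w_i = 1/n the rescaled estimator reduces to the usual sample Kendall's tau, so weightedKendallTau(X1, X2, rep(1/n, n)) agrees with cor(X1, X2, method = "kendall") on data without ties.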

Usage

CKT.kernel(
  X1 = NULL,
  X2 = NULL,
  Z = NULL,
  newZ,
  h,
  kernel.name = "Epa",
  methodCV = "Kfolds",
  Kfolds = 5,
  nPairs = 10 * length(observedX1),
  typeEstCKT = "wdm",
  progressBar = TRUE,
  observedX1 = NULL,
  observedX2 = NULL,
  observedZ = NULL
)

Arguments

X1

a vector of n observations of the first variable (or a 1-column matrix)

X2

a vector of n observations of the second variable (or a 1-column matrix)

Z

a vector of n observations of the conditioning variable, or a matrix with n rows of observations of the conditioning vector

newZ

the new values of Z at which the conditional Kendall's tau should be estimated.

h

the bandwidth used for kernel smoothing. If this is a vector, then cross-validation (following the method given by argument methodCV) is used to choose the best bandwidth among these values before doing the estimation.

kernel.name

name of the kernel used for smoothing. Possible choices are "Gaussian" (Gaussian kernel) and "Epa" (Epanechnikov kernel).
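For reference, the Epanechnikov kernel is K(u) = (3/4)(1 - u^2) for |u| <= 1 and 0 otherwise, and the Gaussian kernel is the standard normal density. The sketch below (hypothetical helper names, not the package's internals) computes smoothing weights assumed to be proportional to K((Z_i - z)/h) and normalized to sum to 1; because of this normalization, a constant factor in K has no effect on the weights.

```r
# Hypothetical helpers illustrating the two kernels selectable by kernel.name
epaKernel   <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)
gaussKernel <- function(u) dnorm(u)

# Normalized weights w_i(z), assumed proportional to K((Z_i - z) / h)
kernelWeights <- function(Z, z, h, kernel = epaKernel) {
  K <- kernel((Z - z) / h)
  K / sum(K)
}

Z <- c(0.0, 0.5, 0.95, 1.5, 3.0)
wEpa   <- kernelWeights(Z, z = 1, h = 1)                        # zero outside [0, 2]
wGauss <- kernelWeights(Z, z = 1, h = 1, kernel = gaussKernel)  # all positive
```

The Epanechnikov weights vanish for points farther than h from z, while the Gaussian kernel gives every observation a positive (possibly tiny) weight.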

methodCV

method used for the cross-validation. Possible choices are "leave-one-out" and "Kfolds".

Kfolds

number of subsamples used, if methodCV = "Kfolds".

nPairs

number of pairs used in the cross-validation criterion, if methodCV = "leave-one-out".
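As a generic illustration of K-fold splitting (an assumption about the general scheme only; the criterion that CondCopulas evaluates on each fold is not reproduced here), the n observations can be randomly assigned to Kfolds subsamples of nearly equal size, each of which is held out in turn while the remaining ones are used for estimation. makeFolds is a hypothetical name.

```r
# Hypothetical helper: random assignment of n observations to Kfolds folds
# whose sizes differ by at most one.
makeFolds <- function(n, Kfolds = 5) {
  sample(rep_len(seq_len(Kfolds), n))
}

set.seed(1)
folds <- makeFolds(n = 23, Kfolds = 5)
table(folds)                 # fold sizes: three folds of 5 and two folds of 4

# Observations held out, and used for estimation, when the first fold is left out
heldOut  <- which(folds == 1)
training <- which(folds != 1)
```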

typeEstCKT

type of estimation of the conditional Kendall's tau. Possible choices are 1, 2, 3, 4 and "wdm":

  • 1 and 3 produce biased estimators, and 2 does not attain the full range [-1,1]. Therefore, these three choices are not recommended for applications on real data.

  • 4 is an improved version of 1, 2 and 3 that has less bias and attains the full range [-1,1].

  • "wdm" is the default and produces the same results as 4 when there are no ties in the data.
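The differences between choices 1 to 4 can be illustrated with a base-R sketch. It assumes our reading of the weighted estimators of Derumigny & Fermanian (2019), with weights w_i summing to 1: type 1 is 4 \sum_{i,j} w_i w_j 1{X_{i,1} < X_{j,1}, X_{i,2} < X_{j,2}} - 1, type 2 is the weighted average of the sign of (X_{i,1} - X_{j,1})(X_{i,2} - X_{j,2}), and type 3 is 1 - 4 \sum_{i,j} w_i w_j 1{X_{i,1} < X_{j,1}, X_{i,2} > X_{j,2}}. It further assumes that type 4 divides type 2 by 1 - \sum_i w_i^2. This page does not state the exact formulas, and cktTypes is a hypothetical name.

```r
# Hypothetical sketch (not the CondCopulas source). w: weights summing to 1;
# the data are assumed to have no ties.
cktTypes <- function(X1, X2, w) {
  W  <- outer(w, w)                  # W[i, j] = w_i * w_j
  d1 <- outer(X1, X1, "-")           # d1[i, j] = X1[i] - X1[j]
  d2 <- outer(X2, X2, "-")           # d2[i, j] = X2[i] - X2[j]
  s  <- sum(w^2)
  tau1 <- 4 * sum(W * (d1 < 0 & d2 < 0)) - 1   # pairs ordered the same way
  tau2 <- sum(W * sign(d1 * d2))               # concordant minus discordant
  tau3 <- 1 - 4 * sum(W * (d1 < 0 & d2 > 0))   # pairs ordered oppositely
  tau4 <- tau2 / (1 - s)                       # assumed form of type 4
  c(tau1 = tau1, tau2 = tau2, tau3 = tau3, tau4 = tau4, s = s)
}

set.seed(2)
n  <- 50
X1 <- rnorm(n)
X2 <- X1 + rnorm(n)
w  <- runif(n); w <- w / sum(w)      # arbitrary positive weights
cktTypes(X1, X2, w)
```

On data without ties, tau1 = tau2 - s and tau3 = tau2 + s with s = \sum_i w_i^2, which is the bias of types 1 and 3. For perfectly concordant data (for instance X2 = 2 * X1), tau2 equals 1 - s < 1 while tau4 equals 1.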

progressBar

controls the display of progress bars. Possible choices are:

  • 0 no progress bar is displayed

  • 1 a general progress bar is displayed

  • 2 and larger values: a general progress bar is displayed and, additionally, a progress bar for each value of h shows the progress of the computation. This only applies when the bandwidth is chosen by cross-validation (i.e. when h is a vector).

observedX1, observedX2, observedZ

deprecated names for X1, X2 and Z. Support for them will be removed in a later version.

Details

Choice of the bandwidth h. The bandwidth must be chosen carefully. In the univariate case, the default kernel (Epanechnikov kernel) has support [-1,1], so for a bandwidth h, estimation of conditional Kendall's tau at Z=z will only use points for which Z_i \in [z \pm h]. As usual in nonparametric estimation, h should be neither too small (to avoid too large a variance) nor too large (to avoid too large a bias).

We recommend that, for each z at which the conditional Kendall's tau \tau_{X_1, X_2 | Z=z} is estimated, the set \{i: Z_i \in [z \pm h] \} should contain at least 20 points and not more than 30% of the points of the whole dataset. Note that for a consistent estimation, as the sample size n tends to infinity, h should tend to 0 while the size of the set \{i: Z_i \in [z \pm h]\} should also tend to infinity. Indeed, the conditioning points should be closer and closer to the point of interest z (small h) and more and more numerous (h tending to 0 slowly enough).
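This rule of thumb is easy to check before running the estimation. The sketch below uses a hypothetical helper, checkBandwidth (not part of CondCopulas), that counts, for each z in newZ, the observations with Z_i in [z - h, z + h] and flags whether the count lies between 20 and 30% of the sample size; it assumes a univariate Z and a kernel with support [-1,1] such as the Epanechnikov kernel.

```r
# Hypothetical helper: size of the neighborhood {i : Z_i in [z - h, z + h]}
# for each z in newZ, compared with the recommended bounds.
checkBandwidth <- function(Z, newZ, h, minPoints = 20, maxFraction = 0.3) {
  counts <- vapply(newZ, function(z) sum(abs(Z - z) <= h), integer(1))
  data.frame(z = newZ, count = counts,
             ok = counts >= minPoints & counts <= maxFraction * length(Z))
}

Z <- 1:1000
res <- checkBandwidth(Z, newZ = c(1, 500), h = 10)
res   # at z = 1 only 11 points are available (boundary effect); at z = 500, 21
```

Points of newZ near the boundary of the support of Z, or in low-density regions, are typically the first to violate the lower bound for a given h.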

In the multivariate case, similar recommendations can be made. Because of the curse of dimensionality, a larger sample will be necessary to reach the same level of precision as in the univariate case.
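A back-of-the-envelope calculation illustrates this, assuming Z is uniform on [0,1]^d, z lies at distance at least h from the boundary, and the neighborhood is the cube [z \pm h]^d (the neighborhood actually induced by the multivariate kernel of CKT.kernel may differ):

```latex
\mathbb{E}\Big[\#\{i : Z_i \in [z \pm h]^d\}\Big] = n\,(2h)^d ,
\qquad
n = 1000,\ h = 0.1:\quad 200\ (d = 1),\quad 40\ (d = 2),\quad 8\ (d = 3).
```

Keeping at least 20 points with d = 3 and h = 0.1 would thus require n \geq 20 / 0.2^3 = 2500.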

Value

a list with two components

  • estimatedCKT the vector of size NROW(newZ) containing the values of the estimated conditional Kendall's tau.

  • finalh the bandwidth h that was finally used for kernel smoothing (either the one specified by the user, or the one chosen by cross-validation if multiple bandwidths were given).

References

Derumigny, A., & Fermanian, J. D. (2019). On kernel-based estimation of conditional Kendall’s tau: finite-distance bounds and asymptotic behavior. Dependence Modeling, 7(1), 292-321. doi:10.1515/demo-2019-0016

See Also

CKT.estimate for other estimators of conditional Kendall's tau. CKTmatrix.kernel for a generalization of this function when the conditioned vector is of dimension d instead of dimension 2 here.

See CKT.hCV.l1out for manual selection of the bandwidth h by leave-one-out or K-folds cross-validation.

Examples

# We simulate from a conditional copula
library(CondCopulas)
set.seed(1)
N = 800
Z = rnorm(n = N, mean = 5, sd = 2)
conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2)
# family = 1 is the Gaussian copula in VineCopula
simCopula = VineCopula::BiCopSim(N = N, family = 1,
    par = VineCopula::BiCopTau2Par(1, conditionalTau))
X1 = qnorm(simCopula[,1])
X2 = qnorm(simCopula[,2])

newZ = seq(2, 10, by = 0.1)
estimatedCKT_kernel <- CKT.kernel(
   X1 = X1, X2 = X2, Z = Z,
   newZ = newZ, h = 0.1, kernel.name = "Epa")$estimatedCKT

# Comparison between true Kendall's tau (in black)
# and estimated Kendall's tau (in red)
trueConditionalTau = -0.9 + 1.8 * pnorm(newZ, mean = 5, sd = 2)
plot(newZ, trueConditionalTau, col = "black",
     type = "l", ylim = c(-1, 1))
lines(newZ, estimatedCKT_kernel, col = "red")
legend("topleft", legend = c("true", "estimated"),
       col = c("black", "red"), lty = 1)


CondCopulas documentation built on Sept. 11, 2024, 9:10 p.m.