View source: R/estimationCKT.kernel.R
CKT.kernel | R Documentation |
Let X_1 and X_2 be two random variables.
The goal of this function is to estimate the conditional Kendall's tau
(a dependence measure) between X_1 and X_2 given Z=z
for a conditioning variable Z.
Conditional Kendall's tau between X_1 and X_2 given Z=z is defined as:
P( (X_{1,1} - X_{2,1})(X_{1,2} - X_{2,2}) > 0 | Z_1 = Z_2 = z)
- P( (X_{1,1} - X_{2,1})(X_{1,2} - X_{2,2}) < 0 | Z_1 = Z_2 = z),
where (X_{1,1}, X_{1,2}, Z_1) and (X_{2,1}, X_{2,2}, Z_2)
are two independent and identically distributed copies of (X_1, X_2, Z).
For this, a kernel-based estimator is used, as described in
Derumigny & Fermanian (2019).
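To make the definition concrete, here is a minimal base-R sketch of a kernel-weighted Kendall's tau at a single point z, for a univariate Z. It is an illustration under simplifying assumptions, not the package's implementation: the helper names epa_kernel and ckt_sketch are hypothetical, and the estimator shown (weighted concordance minus weighted discordance over pairs i != j, normalized by the total pair weight) is only one of several variants one could write.

```r
# Epanechnikov kernel, supported on [-1, 1]
epa_kernel <- function(u) 0.75 * pmax(1 - u^2, 0)

# Hypothetical helper (not part of the package): kernel-weighted
# Kendall's tau between X1 and X2 at the conditioning point z.
ckt_sketch <- function(X1, X2, Z, z, h) {
  # Kernel weight of each observation; zero outside [z - h, z + h]
  w <- epa_kernel((Z - z) / h)
  # At least two points with positive weight are needed to form a pair
  if (sum(w > 0) < 2) return(NA_real_)
  # Sign of (X_{i,1} - X_{j,1}) * (X_{i,2} - X_{j,2}) for every pair (i, j):
  # +1 for a concordant pair, -1 for a discordant pair, 0 for ties
  s <- sign(outer(X1, X1, "-") * outer(X2, X2, "-"))
  # Pair weights w_i * w_j, excluding the pairs i == j
  ww <- outer(w, w)
  diag(ww) <- 0
  sum(ww * s) / sum(ww)
}

# Perfectly concordant data: every weighted pair is concordant
ckt_sketch(X1 = 1:10, X2 = 1:10,
           Z = seq(0, 1, length.out = 10), z = 0.5, h = 0.3)  # 1
```

The sketch builds full n x n matrices, so it is only meant for small samples; it also ignores cross-validation and multivariate Z.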
CKT.kernel(
  X1 = NULL,
  X2 = NULL,
  Z = NULL,
  newZ,
  h,
  kernel.name = "Epa",
  methodCV = "Kfolds",
  Kfolds = 5,
  nPairs = 10 * length(observedX1),
  typeEstCKT = "wdm",
  progressBar = TRUE,
  observedX1 = NULL,
  observedX2 = NULL,
  observedZ = NULL
)
X1 |
a vector of n observations of the first variable (or a 1-column matrix) |
X2 |
a vector of n observations of the second variable (or a 1-column matrix) |
Z |
a vector of n observations of the conditioning variable, or a matrix with n rows of observations of the conditioning vector |
newZ |
the new data of observations of Z at which the conditional Kendall's tau should be estimated. |
h |
the bandwidth used for kernel smoothing.
If this is a vector, then cross-validation is used following the method
given by argument methodCV. |
kernel.name |
name of the kernel used for smoothing.
Possible choices are |
methodCV |
method used for the cross-validation.
Possible choices are |
Kfolds |
number of subsamples used,
if methodCV = "Kfolds". |
nPairs |
number of pairs used in the cross-validation criteria,
if |
typeEstCKT |
type of estimation of the conditional Kendall's tau. Possible choices are
|
progressBar |
control the display of progress bars. Possible choices are:
|
observedX1 , observedX2 , observedZ |
old parameter names for X1, X2 and Z. |
Choice of the bandwidth h.
The bandwidth must be chosen carefully.
In the univariate case, the default kernel (Epanechnikov kernel) has support
[-1,1], so for a bandwidth h, estimation of conditional Kendall's
tau at Z=z will only use points for which Z_i \in [z \pm h].
As usual in nonparametric estimation, h should not be too small
(to avoid too large a variance) and should not be too large
(to avoid too large a bias).
We recommend that, for each z at which the conditional Kendall's tau
\tau_{X_1, X_2 | Z=z} is estimated, the set \{i: Z_i \in [z \pm h] \}
contain at least 20 points and no more than 30% of the points of
the whole dataset.
Note that for consistent estimation, as the sample size n tends
to infinity, h should tend to 0
while the size of the set \{i: Z_i \in [z \pm h]\}
should also tend to infinity.
Indeed, the conditioning points should be closer and closer to the point of interest z
(small h) and more and more numerous (h tending to 0 slowly enough).
In the multivariate case, similar recommendations can be made. Because of the curse of dimensionality, a larger sample will be necessary to reach the same level of precision as in the univariate case.
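The rule of thumb above (at least 20 points and at most 30% of the sample in [z \pm h]) can be checked before running the estimator. The following base-R sketch does this for a univariate Z; the helpers count_local_points and check_bandwidth, and their arguments minPoints and maxProp, are hypothetical names introduced here for illustration and are not part of the package.

```r
# Hypothetical helper: number of observations with Z_i in [z - h, z + h],
# for each z in newZ
count_local_points <- function(Z, newZ, h) {
  vapply(newZ, function(z) sum(abs(Z - z) <= h), integer(1))
}

# Hypothetical helper: flag the points of newZ at which the candidate
# bandwidth h satisfies the rule of thumb
check_bandwidth <- function(Z, newZ, h, minPoints = 20, maxProp = 0.3) {
  counts <- count_local_points(Z, newZ, h)
  data.frame(
    z = newZ,
    nLocal = counts,
    ok = counts >= minPoints & counts <= maxProp * length(Z)
  )
}

# Simulated conditioning variable, same distribution as in the example
set.seed(1)
Z <- rnorm(800, mean = 5, sd = 2)
check_bandwidth(Z, newZ = c(2, 5, 8, 10), h = 0.5)
```

Rows with ok = FALSE indicate points z at which a different bandwidth may be worth considering, for instance a larger one in the tails of Z where observations are sparse.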
a list with two components
estimatedCKT
the vector of size NROW(newZ) containing the values of the estimated conditional Kendall's tau.
finalh
the bandwidth h that was finally used for kernel smoothing (either the one specified by the user
or the one chosen by cross-validation if multiple bandwidths were given).
Derumigny, A., & Fermanian, J. D. (2019). On kernel-based estimation of conditional Kendall’s tau: finite-distance bounds and asymptotic behavior. Dependence Modeling, 7(1), 292-321. doi:10.1515/demo-2019-0016
CKT.estimate for other estimators of conditional Kendall's tau.
CKTmatrix.kernel for a generalization of this function
when the conditioned vector is of dimension d instead of dimension 2 here.
See CKT.hCV.l1out for manual selection of the bandwidth h
by leave-one-out or K-folds cross-validation.
# We simulate from a conditional copula
set.seed(1)
N = 800
Z = rnorm(n = N, mean = 5, sd = 2)
# True conditional Kendall's tau, increasing in Z from -0.9 to 0.9
conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2)
# Gaussian copula (family = 1) with parameter matching conditionalTau
simCopula = VineCopula::BiCopSim(N = N, family = 1,
    par = VineCopula::BiCopTau2Par(1, conditionalTau))
X1 = qnorm(simCopula[,1])
X2 = qnorm(simCopula[,2])
newZ = seq(2, 10, by = 0.1)
estimatedCKT_kernel <- CKT.kernel(
  X1 = X1, X2 = X2, Z = Z,
  newZ = newZ, h = 0.1, kernel.name = "Epa")$estimatedCKT
# Comparison between true Kendall's tau (in black)
# and estimated Kendall's tau (in red)
trueConditionalTau = -0.9 + 1.8 * pnorm(newZ, mean = 5, sd = 2)
plot(newZ, trueConditionalTau, col = "black",
     type = "l", ylim = c(-1, 1))
lines(newZ, estimatedCKT_kernel, col = "red")