View source: R/estimationCKT.kernel.R
| CKT.kernel | R Documentation |
Let X_1 and X_2 be two random variables.
The goal of this function is to estimate the conditional Kendall's tau
(a dependence measure) between X_1 and X_2 given Z=z
for a conditioning variable Z.
Conditional Kendall's tau between X_1 and X_2 given Z=z
is defined as:
P( (X_{1,1} - X_{2,1})(X_{1,2} - X_{2,2}) > 0 | Z_1 = Z_2 = z)
- P( (X_{1,1} - X_{2,1})(X_{1,2} - X_{2,2}) < 0 | Z_1 = Z_2 = z),
where (X_{1,1}, X_{1,2}, Z_1) and (X_{2,1}, X_{2,2}, Z_2)
are two independent and identically distributed copies of (X_1, X_2, Z).
For this, a kernel-based estimator is used, as described in
Derumigny & Fermanian (2019).
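To make the definition concrete, here is a minimal base-R sketch of a kernel-weighted estimate of conditional Kendall's tau at a single point z, for a univariate Z. It is an illustration only and not the implementation used by this function: the helper names `epa_kernel` and `ckt_kernel_sketch` are hypothetical, and the normalization by 1 - sum(w^2), which removes the contribution of the pairs i = j, is only one of several possible variants.

```r
# Epanechnikov kernel, supported on [-1, 1] (hypothetical helper)
epa_kernel <- function(u) {
  0.75 * (1 - u^2) * (abs(u) <= 1)
}

# Kernel-weighted Kendall's tau at a single point z, univariate Z
# (hypothetical helper, not the CKT.kernel implementation)
ckt_kernel_sketch <- function(X1, X2, Z, z, h) {
  w <- epa_kernel((Z - z) / h)
  # at least two observations with positive weight are needed to form a pair
  if (sum(w > 0) < 2) return(NA_real_)
  w <- w / sum(w)  # normalized weights, summing to 1
  # sign of (X_{i,1} - X_{j,1}) * (X_{i,2} - X_{j,2}) for every pair (i, j)
  signs <- sign(outer(X1, X1, "-") * outer(X2, X2, "-"))
  # weighted P(concordant) - P(discordant), rescaled to ignore pairs i = j
  sum(outer(w, w) * signs) / (1 - sum(w^2))
}

set.seed(1)
n <- 500
Z <- rnorm(n, mean = 5, sd = 2)
X1 <- rnorm(n)
X2 <- X1 + rnorm(n, sd = 0.5)  # positively dependent, whatever the value of Z
tau_hat <- ckt_kernel_sketch(X1, X2, Z, z = 5, h = 1)
tau_hat
```

When all weights are equal, this expression reduces to the usual unconditional Kendall's tau computed over the pairs i != j.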
CKT.kernel(
  X1 = NULL,
  X2 = NULL,
  Z = NULL,
  newZ,
  h,
  kernel.name = "Epa",
  se = FALSE,
  confint = FALSE,
  level = 0.95,
  methodCV = "Kfolds",
  Kfolds = 5,
  nPairs = NULL,
  typeEstCKT = "wdm",
  progressBar = 1,
  warnNA = TRUE,
  observedX1 = NULL,
  observedX2 = NULL,
  observedZ = NULL
)
X1 |
a vector of n observations of the first variable (or a 1-column matrix) |
X2 |
a vector of n observations of the second variable (or a 1-column matrix) |
Z |
a vector of n observations of the conditioning variable,
or a matrix with n rows of observations of the conditioning vector
(in the case that several conditioning variables are given; in this case,
each column corresponds to 1 conditioning variable). It can also be a
|
newZ |
the new observations of Z at which the conditional
Kendall's tau should be estimated. It must have the same number of columns
as Z. |
h |
the bandwidth used for kernel smoothing.
If this is a vector, then cross-validation is used following the method
given by argument methodCV. |
kernel.name |
name of the kernel used for smoothing.
Possible choices are |
se, confint |
if |
level |
the confidence level for the confidence intervals. By default,
95% confidence intervals are computed, i.e. level = 0.95. |
methodCV |
method used for the cross-validation.
Possible choices are |
Kfolds |
number of subsamples used,
if methodCV = "Kfolds". |
nPairs |
number of pairs used in the cross-validation criteria,
if |
typeEstCKT |
type of estimation of the conditional Kendall's tau. Possible choices are
|
progressBar |
control the display of progress bars. Possible choices are:
|
warnNA |
a Boolean to indicate whether warnings should be raised if
|
observedX1, observedX2, observedZ |
old parameter names for X1, X2 and Z. |
Choice of the bandwidth h.
The bandwidth must be chosen carefully.
In the univariate case, the default kernel (Epanechnikov kernel) has support
[-1,1], so for a bandwidth h, estimation of conditional Kendall's
tau at Z=z will only use points for which Z_i \in [z \pm h].
As usual in nonparametric estimation, h should not be too small
(to avoid too large a variance) and should not be too large
(to avoid too large a bias).
We recommend that for each z for which the conditional Kendall's tau
\tau_{X_1, X_2 | Z=z} is estimated, the set
\{i: Z_i \in [z \pm h] \}
should contain at least 20 points and not more than 30% of the points of
the whole dataset.
Note that for consistent estimation, as the sample size n tends
to infinity, h should tend to 0 while the size of the set
\{i: Z_i \in [z \pm h]\} should also tend to infinity.
Indeed, the conditioning points should be closer and closer to the point of
interest z (small h) and more and more numerous
(h tending to 0 slowly enough).
In the multivariate case, similar recommendations can be made. Because of the curse of dimensionality, a larger sample will be necessary to reach the same level of precision as in the univariate case.
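The rule of thumb given above for the univariate case can be checked before running the estimation. The following base-R sketch counts, for each evaluation point z, the observations falling in [z - h, z + h] and flags the points that violate the "at least 20 points, at most 30% of the sample" recommendation. The helper `check_bandwidth` and its arguments `minPoints` and `maxProp` are hypothetical names used for illustration; they are not part of the package.

```r
# Hypothetical helper: diagnostics for a candidate bandwidth h (univariate Z)
check_bandwidth <- function(Z, newZ, h, minPoints = 20, maxProp = 0.30) {
  # number of observations Z_i in [z - h, z + h], for each z in newZ
  counts <- vapply(newZ, function(z) sum(abs(Z - z) <= h), integer(1))
  data.frame(
    z = newZ,
    nPoints = counts,
    propPoints = counts / length(Z),
    ok = counts >= minPoints & counts <= maxProp * length(Z)
  )
}

set.seed(1)
Z <- rnorm(n = 800, mean = 5, sd = 2)
diagnostics <- check_bandwidth(Z, newZ = seq(2, 10, by = 2), h = 0.5)
diagnostics
```

Rows with `ok = FALSE` suggest that h is too small (too few points) or too large (too large a share of the sample) at the corresponding z.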
Returns an S3 object of class estimated_CKT_kernel with
components including:
estimatedCKT the vector of size NROW(newZ)
containing the values of the estimated conditional Kendall's tau.
finalh the bandwidth h that was finally used
for kernel smoothing (either the one specified by the user
or the one chosen by cross-validation if multiple bandwidths were given).
resultCV (only in case of cross-validation). This gives the
output of the cross-validation function that is used, i.e. the output of
either CKT.hCV.l1out or CKT.hCV.Kfolds.
se, and confint if requested.
Some methods (se, confint and plot) are available for
such an object, see plot.estimated_CKT_kernel.
Derumigny, A., & Fermanian, J. D. (2019). On kernel-based estimation of conditional Kendall’s tau: finite-distance bounds and asymptotic behavior. Dependence Modeling, 7(1), 292-321. doi:10.1515/demo-2019-0016
CKT.estimate for other estimators
of conditional Kendall's tau.
CKTmatrix.kernel for a generalization of this function
when the conditioned vector is of dimension d
instead of dimension 2 here.
See CKT.hCV.l1out for manual selection of the bandwidth h
by leave-one-out or K-folds cross-validation.
# We simulate from a conditional copula
set.seed(1)
N = 100
# This is a small example for performance reasons.
# For a better example, use:
# N = 800
Z = rnorm(n = N, mean = 5, sd = 2)
conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2)
simCopula = VineCopula::BiCopSim(N=N , family = 1,
par = VineCopula::BiCopTau2Par(1 , conditionalTau ))
X1 = qnorm(simCopula[,1])
X2 = qnorm(simCopula[,2])
newZ = seq(2,10,by = 0.1)
estimatedCKT_kernel <- CKT.kernel(
X1 = X1, X2 = X2, Z = Z,
newZ = newZ, h = 0.1, kernel.name = "Epa")$estimatedCKT
# Comparison between true Kendall's tau (in black)
# and estimated Kendall's tau (in red)
trueConditionalTau = -0.9 + 1.8 * pnorm(newZ, mean = 5, sd = 2)
plot(newZ, trueConditionalTau , col = "black",
type = "l", ylim = c(-1, 1))
lines(newZ, estimatedCKT_kernel, col = "red")
# Multivariate example
N = 100
# This is a small example for performance reasons.
# For a better example, use:
# N = 1000
Z1 = rnorm(n = N, mean = 5, sd = 2)
Z2 = rnorm(n = N, mean = 5, sd = 2)
conditionalTau = -0.9 + 1.8 * pnorm(Z1 - Z2, mean = 2, sd = 2)
simCopula = VineCopula::BiCopSim(N = N , family = 1,
par = VineCopula::BiCopTau2Par(1 , conditionalTau ))
X1 = qnorm(simCopula[,1])
X2 = qnorm(simCopula[,2])
Z = cbind(Z1, Z2)
newZ = expand.grid(Z1 = seq(2,8,by = 0.5),
Z2 = seq(2,8,by = 1))
estimatedCKT_kernel <- CKT.kernel(
X1 = X1, X2 = X2, Z = Z,
newZ = newZ, h = 1, kernel.name = "Epa")$estimatedCKT
if (requireNamespace("ggplot2", quietly = TRUE)) {
  df = rbind(
    data.frame(newZ, CKT = estimatedCKT_kernel,
               type = "estimated CKT") ,
    data.frame(newZ, CKT = -0.9 + 1.8 * pnorm(newZ$Z1 - newZ$Z2,
                                              mean = 2, sd = 2),
               type = "true CKT")
  )
  ggplot2::ggplot(df) +
    ggplot2::geom_tile(ggplot2::aes(x = Z1, y = Z2, fill = CKT)) +
    ggplot2::facet_grid(as.formula("~type"))
}
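As described for the argument h, passing a vector of candidate bandwidths triggers cross-validation, and the selected bandwidth is returned in the component finalh. The sketch below illustrates this in the univariate setting. It rests on several assumptions: that CKT.kernel is provided by the CondCopulas package, that the grid of bandwidths (chosen arbitrarily here) is reasonable for the data, and that the Gaussian copula can be simulated in base R through the relation rho = sin(pi * tau / 2) instead of using VineCopula.

```r
# Assumption: CKT.kernel is provided by the CondCopulas package
if (requireNamespace("CondCopulas", quietly = TRUE)) {
  set.seed(1)
  N = 200
  Z = rnorm(n = N, mean = 5, sd = 2)
  conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2)
  # For a Gaussian copula, Kendall's tau and the correlation rho satisfy
  # rho = sin(pi * tau / 2)
  rho = sin(pi * conditionalTau / 2)
  X1 = rnorm(N)
  X2 = rho * X1 + sqrt(1 - rho^2) * rnorm(N)
  newZ = seq(3, 7, by = 0.5)

  # h is a vector, so the bandwidth is chosen by K-folds cross-validation
  result <- CondCopulas::CKT.kernel(
    X1 = X1, X2 = X2, Z = Z, newZ = newZ,
    h = c(1, 1.5, 2), kernel.name = "Epa",
    methodCV = "Kfolds", Kfolds = 5)

  result$finalh        # bandwidth selected by cross-validation
  result$estimatedCKT  # estimates at each point of newZ
}
```

The component resultCV of the returned object should then contain the output of the cross-validation function, which can be inspected to compare the candidate bandwidths.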