phinp: phinp
In VecDep: Measuring Copula-Based Dependence Between Random Vectors

R Documentation

Description

Given a q-dimensional random vector \mathbf{X} = (\mathbf{X}_{1},...,\mathbf{X}_{k}) with \mathbf{X}_{i} a d_{i}-dimensional random vector, i.e., q = d_{1} + ... + d_{k}, this function estimates the \Phi-dependence between \mathbf{X}_{1},...,\mathbf{X}_{k} by estimating the joint and marginal copula densities via fully non-parametric copula kernel density estimation.

Usage

phinp(sample, cop = NULL, dim, phi, estimator, bw_method)

Arguments

`sample`	A sample from a `q`-dimensional random vector `\mathbf{X}` (`n \times q` matrix with observations in rows, variables in columns).
`cop`	A fitted reference hac object (hierarchical Archimedean copula), used only when bw_method = 0 (default = NULL).
`dim`	The vector of dimensions `(d_{1},...,d_{k})`.
`phi`	The function `\Phi`.
`estimator`	Either "beta" or "trans" for the beta kernel or the Gaussian transformation kernel copula density estimator.
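`bw_method`	A number in `\{0,1,2\}` specifying the method used for computing optimal local bandwidths: 0 uses the fitted reference copula `cop`, 1 uses a non-parametric kernel pivot, and 2 uses the big O bandwidths (see Details).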

Details

When \mathbf{X} has copula density c with marginal copula densities c_{i} of \mathbf{X}_{i} for i = 1, \dots, k, the \Phi-dependence between \mathbf{X}_{1}, \dots, \mathbf{X}_{k} equals

\mathcal{D}_{\Phi} \left (\mathbf{X}_{1}, \dots, \mathbf{X}_{k} \right ) = \mathbb{E} \left \{ \frac{\prod_{i = 1}^{k} c_{i}(\mathbf{U}_{i})}{c \left ( \mathbf{U} \right )} \Phi \left (\frac{c(\mathbf{U})}{\prod_{i = 1}^{k}c_{i}(\mathbf{U}_{i})} \right ) \right \},

for a certain continuous, convex function \Phi : (0,\infty) \rightarrow \mathbb{R}, and with \mathbf{U} = (\mathbf{U}_{1}, \dots, \mathbf{U}_{k}) \sim c.
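
As a worked instance (a sketch for illustration, using the choice of \Phi that appears in the Examples): with \Phi(t) = t \log(t), the factor in front of \Phi cancels against the leading t, and the \Phi-dependence reduces to the mutual information

\mathcal{D}_{t \log(t)} \left (\mathbf{X}_{1}, \dots, \mathbf{X}_{k} \right ) = \mathbb{E} \left \{ \log \left (\frac{c(\mathbf{U})}{\prod_{i = 1}^{k}c_{i}(\mathbf{U}_{i})} \right ) \right \}.

In particular, when c(\mathbf{u}) = \prod_{i = 1}^{k} c_{i}(\mathbf{u}_{i}), the ratio equals 1 and the expression is \Phi(1) = 0.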

The expectation \mathbb{E} is replaced by the empirical mean using the estimated copula sample \widehat{\mathbf{U}}^{(1)}, \dots, \widehat{\mathbf{U}}^{(n)} with \widehat{\mathbf{U}}^{(\ell)} = (\widehat{\mathbf{U}}_{1}^{(\ell)}, \dots, \widehat{\mathbf{U}}_{k}^{(\ell)}) for \ell = 1, \dots, n, where (recall that \mathbf{X}_{i} = (X_{i1}, \dots, X_{id_{i}}) for i = 1, \dots, k)

\widehat{\mathbf{U}}_{i}^{(\ell)} = \left (\widehat{U}_{i1}^{(\ell)}, \dots, \widehat{U}_{id_{i}}^{(\ell)} \right ) = \left (\widehat{F}_{i1} \left (X_{i1}^{(\ell)} \right ), \dots, \widehat{F}_{id_{i}} \left (X_{id_{i}}^{(\ell)} \right ) \right ).

Here, \widehat{F}_{ij}(x_{ij}) = \frac{1}{n+1} \sum_{\ell = 1}^{n} 1 \left (X_{ij}^{(\ell)} \leq x_{ij} \right ) is the (rescaled) empirical cdf of X_{ij} based on a sample X_{ij}^{(1)}, \dots, X_{ij}^{(n)} for i = 1, \dots, k and j = 1, \dots, d_{i}.
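
Evaluated at the sample points, the rescaled empirical cdf above amounts to dividing column-wise ranks by n + 1. The following base R sketch illustrates this; `pseudo_obs` is a hypothetical helper name introduced here, and it is not claimed to be how the package computes the copula sample internally.

```r
# Hypothetical helper (not a VecDep function): rescaled empirical cdf
# evaluated at the observations, applied to each column separately.
pseudo_obs <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  # rank(..., ties.method = "max") counts the observations <= each value,
  # which matches the indicator sum in the definition of the empirical cdf.
  apply(x, 2, rank, ties.method = "max") / (n + 1)
}

set.seed(1)
x <- matrix(rnorm(20), nrow = 5, ncol = 4)  # n = 5 observations, q = 4 variables
u <- pseudo_obs(x)
range(u)  # every entry lies strictly between 0 and 1
```

Dividing by n + 1 rather than n keeps the values away from 1, where copula densities may be unbounded.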

The joint copula density c and marginal copula densities c_{i} for i = 1, \dots, k are estimated via fully non-parametric copula kernel density estimation. When estimator = "beta", the beta kernel copula density estimator is used. When estimator = "trans", the Gaussian transformation kernel copula density estimator is used.
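
The plug-in step (an empirical mean of the density ratio passed through \Phi) can be sketched in base R. To keep the sketch self-contained, it assumes the density values are already available and uses the true density of a bivariate Gaussian copula instead of a kernel estimate; `phi_plugin` is a hypothetical helper name, not a function from the package. With k = 2 and d_{1} = d_{2} = 1, the marginal copula densities are identically 1, and the mutual information has the known value -\log(1 - \rho^{2})/2.

```r
# Hypothetical helper (not a VecDep function): empirical mean of
# (prod c_i / c) * Phi(c / prod c_i) over the sample points.
phi_plugin <- function(c_joint, c_marg_prod, phi) {
  ratio <- c_joint / c_marg_prod
  mean(phi(ratio) / ratio)
}

set.seed(123)
n <- 1e5
rho <- 0.5

# Draw from a bivariate normal with correlation rho
z1 <- rnorm(n)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)

# Log density of the Gaussian copula evaluated at (pnorm(z1), pnorm(z2))
log_c <- -0.5 * log(1 - rho^2) -
  (rho^2 * (z1^2 + z2^2) - 2 * rho * z1 * z2) / (2 * (1 - rho^2))

# Phi(t) = t * log(t) gives the mutual information
est <- phi_plugin(exp(log_c), rep(1, n), function(t) t * log(t))
truth <- -0.5 * log(1 - rho^2)
c(estimate = est, truth = truth)
```

In phinp the density values are not known and are replaced by kernel estimates evaluated at the estimated copula sample, so the result additionally depends on the estimator and the bandwidths.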

Bandwidths are selected locally using the function hamse. When bw_method = 0, the supplied hac object cop (a hierarchical Archimedean copula, fitted e.g. via MLE using mlehac) is used as reference copula. When bw_method = 1, a non-parametric (beta or Gaussian transformation) kernel copula density estimator based on the pseudo-observations is used as pivot. This pivot is computed using the big O bandwidth (i.e., n^{-2/(q+4)} in case of the beta estimator, and n^{-1/(q+4)} for the transformation estimator, with q the total dimension). When bw_method = 2, the big O bandwidths themselves are used.
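
To give a sense of scale, the two rates can be evaluated for the setting of the Examples (n = 500, q = 4). This is only a sketch: it assumes a proportionality constant of 1, which the text above does not specify, and the variable names are illustrative.

```r
n <- 500  # sample size
q <- 4    # total dimension

# Rates quoted in the text, with the constant assumed to be 1
h_beta  <- n^(-2 / (q + 4))  # beta kernel estimator
h_trans <- n^(-1 / (q + 4))  # Gaussian transformation kernel estimator

c(beta = h_beta, trans = h_trans)
```

Both rates tend to zero as n grows, and for a fixed n > 1 the beta rate is the smaller of the two because its exponent is twice as large in absolute value.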

Value

The estimated \Phi-dependence between \mathbf{X}_{1}, \dots, \mathbf{X}_{k}.

References

De Keyser, S. & Gijbels, I. (2024). Hierarchical variable clustering via copula-based divergence measures between random vectors. International Journal of Approximate Reasoning 165:109090. doi: https://doi.org/10.1016/j.ijar.2023.109090.

Examples


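# Load the package providing gethac, mlehac and phinp
library(VecDep)

# Total dimension and dimensions of the two random vectors
q = 4
dim = c(2,2)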

# Sample size
n = 500

# Four-dimensional hierarchical Gumbel copula
# with parameters (theta_0,theta_1,theta_2) = (2,3,4)
HAC = gethac(dim,c(2,3,4),type = 1)

# Sample of size n from the hierarchical copula
sample = suppressWarnings(HAC::rHAC(n,HAC))

# Maximum pseudo-likelihood estimator to be used as reference copula for bw_method = 0
est_cop = mlehac(sample,dim,1,c(2,3,4))

# Estimate mutual information between two random vectors of size 2 in different ways

est_phi_1 = phinp(sample,cop = est_cop,dim = dim,phi = function(t){t * log(t)},
                  estimator = "beta",bw_method = 0)
est_phi_2 = phinp(sample,cop = est_cop,dim = dim,phi = function(t){t * log(t)},
                  estimator = "trans",bw_method = 0)
est_phi_3 = phinp(sample,dim = dim,phi = function(t){t * log(t)},
                  estimator = "beta",bw_method = 1)
est_phi_4 = phinp(sample,dim = dim,phi = function(t){t * log(t)},
                  estimator = "trans",bw_method = 1)
est_phi_5 = phinp(sample,dim = dim,phi = function(t){t * log(t)},
                  estimator = "beta",bw_method = 2)
est_phi_6 = phinp(sample,dim = dim,phi = function(t){t * log(t)},
                  estimator = "trans",bw_method = 2)

VecDep documentation built on April 4, 2025, 5:14 a.m.