phinp: phinp



phinp

Description

Given a q-dimensional random vector \mathbf{X} = (\mathbf{X}_{1}, \dots, \mathbf{X}_{k}), where each \mathbf{X}_{i} is a d_{i}-dimensional random vector (so that q = d_{1} + \dots + d_{k}), this function estimates the \Phi-dependence between \mathbf{X}_{1}, \dots, \mathbf{X}_{k} by estimating the joint and marginal copula densities via fully non-parametric copula kernel density estimation.

Usage

phinp(sample, cop = NULL, dim, phi, estimator, bw_method)

Arguments

sample

A sample from a q-dimensional random vector \mathbf{X} (n \times q matrix with observations in rows, variables in columns).

cop

A fitted reference hac object, required when bw_method = 0 (default = NULL).

dim

The vector of dimensions (d_{1}, \dots, d_{k}), with d_{1} + \dots + d_{k} = q.

phi

The function \Phi.

estimator

Either "beta" for the beta kernel copula density estimator or "trans" for the Gaussian transformation kernel copula density estimator.

bw_method

A number in \{0,1,2\} specifying the method used for computing optimal local bandwidths (see Details).

Details

When \mathbf{X} has copula density c with marginal copula densities c_{i} of \mathbf{X}_{i} for i = 1, \dots, k, the \Phi-dependence between \mathbf{X}_{1}, \dots, \mathbf{X}_{k} equals

\mathcal{D}_{\Phi} \left (\mathbf{X}_{1}, \dots, \mathbf{X}_{k} \right ) = \mathbb{E} \left \{ \frac{\prod_{i = 1}^{k} c_{i}(\mathbf{U}_{i})}{c \left ( \mathbf{U} \right )} \Phi \left (\frac{c(\mathbf{U})}{\prod_{i = 1}^{k}c_{i}(\mathbf{U}_{i})} \right ) \right \},

for a continuous, convex function \Phi : (0,\infty) \rightarrow \mathbb{R}, where \mathbf{U} = (\mathbf{U}_{1}, \dots, \mathbf{U}_{k}) \sim c. For example, \Phi(t) = t \log(t) yields the mutual information between \mathbf{X}_{1}, \dots, \mathbf{X}_{k}.
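As a sanity check of the definition, the R sketch below evaluates \mathcal{D}_{\Phi} by plain Monte Carlo for a bivariate Gaussian copula with correlation rho = 0.5 (an illustrative choice, not taken from the package). Here k = 2 and d_{1} = d_{2} = 1, so c_{1} = c_{2} = 1, and with \Phi(t) = t \log(t) the integrand reduces to \log c(\mathbf{U}); the result should be close to the known mutual information -\frac{1}{2}\log(1 - \rho^{2}). It uses the true copula density, not the kernel estimates that phinp computes.

```r
# Monte Carlo evaluation of the Phi-dependence for a bivariate Gaussian
# copula (k = 2, d_1 = d_2 = 1). Illustrative only: the true density is used.
set.seed(1)
n <- 1e5
rho <- 0.5

# Sample (x, y) from a standard bivariate normal with correlation rho;
# U = (pnorm(x), pnorm(y)) then follows the Gaussian copula c.
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)

# log c(U), written in terms of the normal scores x = qnorm(U_1), y = qnorm(U_2)
log_c <- -0.5 * log(1 - rho^2) -
  (rho^2 * (x^2 + y^2) - 2 * rho * x * y) / (2 * (1 - rho^2))

# Univariate copula densities equal 1, so c / (c_1 * c_2) = c
ratio <- exp(log_c)

# Phi(t) = t log t; the integrand (1 / ratio) * Phi(ratio) equals log(ratio)
phi <- function(t) t * log(t)
d_phi <- mean(phi(ratio) / ratio)

# Closed-form mutual information of the bivariate Gaussian copula
mi_true <- -0.5 * log(1 - rho^2)
c(monte_carlo = d_phi, exact = mi_true)
```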

To estimate \mathcal{D}_{\Phi}, the expectation \mathbb{E} is replaced by the empirical mean over the estimated copula sample \widehat{\mathbf{U}}^{(1)}, \dots, \widehat{\mathbf{U}}^{(n)}, with \widehat{\mathbf{U}}^{(\ell)} = (\widehat{\mathbf{U}}_{1}^{(\ell)}, \dots, \widehat{\mathbf{U}}_{k}^{(\ell)}) for \ell = 1, \dots, n. Writing \mathbf{X}_{i} = (X_{i1}, \dots, X_{id_{i}}) for i = 1, \dots, k, the components are the pseudo-observations

\widehat{\mathbf{U}}_{i}^{(\ell)} = \left (\widehat{U}_{i1}^{(\ell)}, \dots, \widehat{U}_{id_{i}}^{(\ell)} \right ) = \left (\widehat{F}_{i1} \left (X_{i1}^{(\ell)} \right ), \dots, \widehat{F}_{id_{i}} \left (X_{id_{i}}^{(\ell)} \right ) \right ).

Here, \widehat{F}_{ij}(x_{ij}) = \frac{1}{n+1} \sum_{\ell = 1}^{n} 1 \left (X_{ij}^{(\ell)} \leq x_{ij} \right ) is the rescaled empirical cdf of X_{ij}, based on the sample X_{ij}^{(1)}, \dots, X_{ij}^{(n)}, for i = 1, \dots, k and j = 1, \dots, d_{i}. Dividing by n + 1 instead of n keeps all pseudo-observations strictly inside (0,1).
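A minimal base R sketch of the pseudo-observations \widehat{U}_{ij}^{(\ell)} follows; the helper name pseudo_obs is hypothetical and not part of VecDep. Using rank(..., ties.method = "max") makes the count match the indicator sum \sum_{\ell} 1(X_{ij}^{(\ell)} \leq x_{ij}) even when ties occur.

```r
# Hypothetical helper (not a VecDep function): rescaled empirical cdf
# F_hat(x) = #{l : X^(l) <= x} / (n + 1), applied to each column.
pseudo_obs <- function(sample) {
  sample <- as.matrix(sample)
  n <- nrow(sample)
  # rank(..., ties.method = "max") counts the observations <= each value
  apply(sample, 2, function(col) rank(col, ties.method = "max")) / (n + 1)
}

# Small example: n = 3 observations of q = 2 variables
X <- cbind(c(3, 1, 2), c(10, 30, 20))
U_hat <- pseudo_obs(X)
U_hat
```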

The joint copula density c and the marginal copula densities c_{i}, i = 1, \dots, k, are estimated via fully non-parametric copula kernel density estimation, and these estimates are plugged into the empirical mean. When estimator = "beta", the beta kernel copula density estimator is used; when estimator = "trans", the Gaussian transformation kernel copula density estimator is used.
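For intuition, both estimators can be sketched at a single point \mathbf{u} as follows. This assumes the textbook forms with one scalar bandwidth h shared by all coordinates: the beta kernel estimator averages products of beta densities with shape parameters u_{j}/h + 1 and (1 - u_{j})/h + 1, and the Gaussian transformation estimator is a Gaussian product-kernel density estimate on the normal scores qnorm(\widehat{U}), divided by the standard normal densities at qnorm(u_{j}). The function names are hypothetical; the package's betakernelestimator and transformationestimator, combined with local bandwidths from hamse, may differ in detail. The bandwidths below follow the big O rates given under Details.

```r
# Hypothetical sketches (not VecDep code) with a scalar bandwidth h.

# Beta kernel copula density estimator at u (length q), given the
# n x q matrix U of (pseudo-)observations in (0,1)
beta_kernel_cop <- function(u, U, h) {
  terms <- rep(1, nrow(U))
  for (j in seq_along(u)) {
    terms <- terms * dbeta(U[, j], u[j] / h + 1, (1 - u[j]) / h + 1)
  }
  mean(terms)
}

# Gaussian transformation estimator: product Gaussian kernel density
# estimate on the normal scores, divided by the standard normal densities
trans_kernel_cop <- function(u, U, h) {
  z <- qnorm(u)
  Z <- qnorm(U)
  terms <- rep(1, nrow(U))
  for (j in seq_along(u)) {
    terms <- terms * dnorm((z[j] - Z[, j]) / h) / h
  }
  mean(terms) / prod(dnorm(z))
}

# Independence copula sample; its true density is 1 everywhere
set.seed(2)
n <- 2000
U <- matrix(runif(2 * n), ncol = 2)
q <- ncol(U)
est_beta <- beta_kernel_cop(c(0.5, 0.5), U, h = n^(-2 / (q + 4)))
est_trans <- trans_kernel_cop(c(0.5, 0.5), U, h = n^(-1 / (q + 4)))
c(beta = est_beta, trans = est_trans)
```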

Bandwidths are selected locally using the function hamse. When bw_method = 0, the given fitted hac object cop (hierarchical Archimedean copula, e.g., fitted by maximum pseudo-likelihood using mlehac) is used as the reference copula. When bw_method = 1, a non-parametric (beta or Gaussian transformation) kernel copula density estimator based on the pseudo-observations is used as the pivot; this pivot is computed with the big O bandwidth, i.e., n^{-2/(q+4)} for the beta estimator and n^{-1/(q+4)} for the transformation estimator, where q is the total dimension. When bw_method = 2, the big O bandwidths are used.
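The sketch below ties the steps together in the spirit of bw_method = 2: pseudo-observations, beta kernel estimates of c and of the c_{i} with big O bandwidths, and the empirical mean of the \Phi-dependence integrand. It is a simplified, hypothetical re-implementation (big_o_bw, beta_kde_cop and phi_dep_sketch are made-up names), not the code of phinp. Using the block dimension d_{i} in the bandwidth of each marginal copula density, and setting c_{i} = 1 when d_{i} = 1, are assumptions of this sketch.

```r
# Hypothetical sketch (not VecDep code): plug-in Phi-dependence estimate
# with the beta kernel estimator and big O bandwidths.

# Big O bandwidth for a copula density of dimension q
big_o_bw <- function(n, q, estimator = c("beta", "trans")) {
  estimator <- match.arg(estimator)
  if (estimator == "beta") n^(-2 / (q + 4)) else n^(-1 / (q + 4))
}

# Beta kernel copula density estimate at each row of `at`, built from
# the pseudo-observations U (same number of columns), bandwidth h
beta_kde_cop <- function(at, U, h) {
  apply(at, 1, function(u) {
    terms <- rep(1, nrow(U))
    for (j in seq_along(u)) {
      terms <- terms * dbeta(U[, j], u[j] / h + 1, (1 - u[j]) / h + 1)
    }
    mean(terms)
  })
}

# Empirical mean of (prod c_i / c) * Phi(c / prod c_i) at the pseudo-observations
phi_dep_sketch <- function(sample, dim, phi) {
  sample <- as.matrix(sample)
  n <- nrow(sample)
  # Rescaled empirical cdfs, column by column
  U <- apply(sample, 2, rank, ties.method = "max") / (n + 1)
  c_joint <- beta_kde_cop(U, U, big_o_bw(n, ncol(U), "beta"))
  ends <- cumsum(dim)
  starts <- ends - dim + 1
  c_marg <- rep(1, n)
  for (i in seq_along(dim)) {
    # A block of dimension 1 is uniform, so its copula density is 1
    if (dim[i] > 1) {
      Ui <- U[, starts[i]:ends[i], drop = FALSE]
      c_marg <- c_marg * beta_kde_cop(Ui, Ui, big_o_bw(n, dim[i], "beta"))
    }
  }
  ratio <- c_joint / c_marg
  mean(phi(ratio) / ratio)
}

# Usage: mutual information for a dependent (correlation 0.9) and an
# independent bivariate normal sample, split into two blocks of size 1
set.seed(3)
n <- 500
z1 <- rnorm(n)
z2 <- 0.9 * z1 + sqrt(1 - 0.9^2) * rnorm(n)
mi <- function(t) t * log(t)
est_dep <- phi_dep_sketch(cbind(z1, z2), dim = c(1, 1), phi = mi)
est_indep <- phi_dep_sketch(cbind(rnorm(n), rnorm(n)), dim = c(1, 1), phi = mi)
c(dependent = est_dep, independent = est_indep)
```

Because kernel smoothing flattens the copula density, such a plug-in estimate is biased; the comparison of interest here is that est_dep clearly exceeds est_indep, not that it matches the exact value -\frac{1}{2}\log(1 - 0.9^{2}).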

Value

The estimated \Phi-dependence between \mathbf{X}_{1}, \dots, \mathbf{X}_{k}.

References

De Keyser, S. & Gijbels, I. (2024). Hierarchical variable clustering via copula-based divergence measures between random vectors. International Journal of Approximate Reasoning 165:109090. doi: https://doi.org/10.1016/j.ijar.2023.109090.

See Also

betakernelestimator for the computation of the beta kernel copula density estimator,
transformationestimator for the computation of the Gaussian transformation kernel copula density estimator,
hamse for local bandwidth selection for the beta kernel or Gaussian transformation kernel copula density estimator.

Examples


library(VecDep)

# Total dimension q = d_1 + d_2
q = 4
dim = c(2,2)

# Sample size
n = 500

# Four-dimensional hierarchical Gumbel copula
# with parameters (theta_0,theta_1,theta_2) = (2,3,4)
HAC = gethac(dim,c(2,3,4),type = 1)

# Sample of size n from the HAC (HAC::rHAC requires the HAC package)
sample = suppressWarnings(HAC::rHAC(n,HAC))

# Maximum pseudo-likelihood estimator to be used as reference copula for bw_method = 0
est_cop = mlehac(sample,dim,1,c(2,3,4))

# Estimate the mutual information (Phi(t) = t log(t)) between the two
# bivariate random vectors, for both estimators and all three bw_method values

est_phi_1 = phinp(sample,cop = est_cop,dim = dim,phi = function(t){t * log(t)},
                  estimator = "beta",bw_method = 0)
est_phi_2 = phinp(sample,cop = est_cop,dim = dim,phi = function(t){t * log(t)},
                  estimator = "trans",bw_method = 0)
est_phi_3 = phinp(sample,dim = dim,phi = function(t){t * log(t)},
                  estimator = "beta",bw_method = 1)
est_phi_4 = phinp(sample,dim = dim,phi = function(t){t * log(t)},
                  estimator = "trans",bw_method = 1)
est_phi_5 = phinp(sample,dim = dim,phi = function(t){t * log(t)},
                  estimator = "beta",bw_method = 2)
est_phi_6 = phinp(sample,dim = dim,phi = function(t){t * log(t)},
                  estimator = "trans",bw_method = 2)



VecDep documentation built on April 4, 2025, 5:14 a.m.