RVineStructureSelect: Sequential Specification of R- and C-Vine Copula Models

RVineStructureSelect R Documentation

Sequential Specification of R- and C-Vine Copula Models

Description

This function fits either an R- or a C-vine copula model to a d-dimensional copula data set. Tree structures are determined and appropriate pair-copula families are selected using BiCopSelect() and estimated sequentially (forward selection of trees).

Usage

RVineStructureSelect(
  data,
  familyset = NA,
  type = 0,
  selectioncrit = "AIC",
  indeptest = FALSE,
  level = 0.05,
  trunclevel = NA,
  progress = FALSE,
  weights = NA,
  treecrit = "tau",
  rotations = TRUE,
  se = FALSE,
  presel = TRUE,
  method = "mle",
  cores = 1
)

Arguments

data

An N x d data matrix (with uniform margins).

familyset

An integer vector of pair-copula families to select from. The vector has to include at least one pair-copula family that allows for positive and one that allows for negative dependence. Not listed copula families might be included to better handle limit cases. If familyset = NA (default), selection among all possible families is performed. Coding of pair-copula families is the same as in BiCop().

type

Type of the vine model to be specified:
0 or "RVine" = R-vine (default)
1 or "CVine" = C-vine
C- and D-vine copula models with pre-specified order can be specified using CDVineCopSelect of the package CDVine. Similarly, R-vine copula models with pre-specified tree structure can be specified using RVineCopSelect().

selectioncrit

Character indicating the criterion for pair-copula selection. Possible choices: selectioncrit = "AIC" (default), "BIC", or "logLik" (see BiCopSelect()).

indeptest

logical; whether a hypothesis test for the independence of u1 and u2 is performed before bivariate copula selection (default: indeptest = FALSE; see BiCopIndTest()). The independence copula is chosen for a (conditional) pair if the null hypothesis of independence cannot be rejected.

level

numeric; significance level of the independence test (default: level = 0.05).

trunclevel

integer; level of truncation.

progress

logical; whether the tree-wise specification progress is printed (default: progress = FALSE).

weights

numeric; weights for each observation (optional).

treecrit

edge weight for Dissmann's structure selection algorithm, see Details.

rotations

If TRUE, all rotations of the families in familyset are included.

se

Logical; whether standard errors are estimated (default: se = FALSE).

presel

Logical; whether to exclude families before fitting based on symmetry properties of the data. Makes the selection about 30% faster (on average), but may yield slightly worse results in a few special cases.

method

indicates the estimation method: either maximum likelihood estimation (method = "mle"; default) or inversion of Kendall's tau (method = "itau"). For method = "itau", only one-parameter families and the Student t copula can be used (family = 1,2,3,4,5,6,13,14,16,23,24,26,33,34 or 36). For the t-copula, par2 is found by a crude profile likelihood optimization over the interval (2, 10].

cores

integer; if cores > 1, estimation will be parallelized within each tree (using foreach::foreach()). Note that parallelization causes substantial overhead and may be slower than single-threaded computation when dimension, sample size, or family set are small or method = "itau".

Details

R-vine trees are selected using maximum spanning trees w.r.t. some edge weights. The most commonly used edge weight is the absolute value of the empirical Kendall's tau, say \hat{\tau}_{ij}. Then, the following optimization problem is solved for each tree:

\max \sum_{\mathrm{edges}\; e_{ij} \;\mathrm{in \; spanning \; tree}} |\hat{\tau}_{ij}|,

where a spanning tree is a tree on all nodes. The setting of the first tree selection step is always a complete graph. For subsequent trees, the setting depends on the R-vine construction principles, in particular on the proximity condition.
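
As a minimal sketch of the first-tree step, the following base R code computes the absolute empirical Kendall's tau for toy data and finds a maximum spanning tree on the complete graph with Prim's algorithm. The helper max_spanning_tree and the simulated data are illustrative assumptions, not part of the package; the internal implementation may differ, and the proximity condition for later trees is not handled here.

```r
# Toy data standing in for copula data (assumption: simulated, not from the package)
set.seed(1)
n <- 200
z <- matrix(rnorm(n * 4), n, 4)
z[, 2] <- z[, 1] + 0.5 * z[, 2]   # columns 1 and 2 strongly dependent
z[, 4] <- z[, 3] + 0.8 * z[, 4]   # columns 3 and 4 moderately dependent
u <- apply(z, 2, rank) / (n + 1)  # pseudo-observations in (0, 1)

# Edge weights: absolute empirical Kendall's tau for every pair
w <- abs(cor(u, method = "kendall"))

# Prim's algorithm for a maximum spanning tree (hypothetical helper)
max_spanning_tree <- function(w) {
  d <- ncol(w)
  in_tree <- c(TRUE, rep(FALSE, d - 1))  # start from node 1
  edges <- matrix(NA_integer_, d - 1, 2)
  for (k in seq_len(d - 1)) {
    cand <- w
    cand[!in_tree, ] <- -Inf  # an edge must start at a node inside the tree
    cand[, in_tree] <- -Inf   # and end at a node outside the tree
    best <- which(cand == max(cand), arr.ind = TRUE)[1, ]
    edges[k, ] <- best
    in_tree[best[2]] <- TRUE
  }
  edges
}

mst_edges <- max_spanning_tree(w)  # one row per selected edge
mst_weight <- sum(w[mst_edges])    # value of the objective above
```

Each row of mst_edges holds the two node indices of one selected edge, and mst_weight is the maximized sum of absolute Kendall's tau values.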

Some commonly used edge weights are implemented:

"tau" absolute value of empirical Kendall's tau.
"rho" absolute value of empirical Spearman's rho.
"AIC" Akaike information criterion (multiplied by -1).
"BIC" Bayesian information criterion (multiplied by -1).
"cAIC" corrected Akaike information criterion (multiplied by -1).

If the data contain NAs, the edge weights in "tau" and "rho" are multiplied by the square root of the proportion of complete observations. This penalizes pairs where fewer observations are used.
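
The penalty can be illustrated with a short base R sketch. The data and variable names below are assumptions made for illustration; the code follows the description above rather than the package's internal code.

```r
set.seed(42)
n <- 100
u1 <- runif(n)
u2 <- (u1 + runif(n)) / 2  # toy variable that depends on u1
u2[1:36] <- NA             # 36 of the 100 pairs are incomplete

complete <- complete.cases(u1, u2)
prop_complete <- mean(complete)  # proportion of complete pairs: 64/100

# Unpenalized weight, using complete pairs only
tau_raw <- abs(cor(u1, u2, method = "kendall", use = "complete.obs"))

# Penalized weight: multiply by the square root of the proportion
tau_penalized <- tau_raw * sqrt(prop_complete)  # factor sqrt(0.64) = 0.8
```

With 64 of 100 pairs complete, the edge weight shrinks by a factor of 0.8, so a pair with many missing values needs a noticeably stronger dependence to be preferred in the spanning tree.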

The criteria "AIC", "BIC", and "cAIC" require estimation and model selection for all possible pairs. This is computationally expensive and much slower than "tau" or "rho". The user can also specify a custom function to calculate the edge weights. The function has to be of type function(u1, u2, weights) ... and must return a numeric value. The weights argument must exist, but does not have to be used. For example, "tau" (without using weights) can be implemented as follows:
function(u1, u2, weights)
    abs(cor(u1, u2, method = "kendall", use = "complete.obs"))
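
A custom criterion that is not among the built-in choices could look like the sketch below, which uses the absolute Pearson correlation of normal scores as the edge weight. The function name abs_normal_scores is hypothetical, and the choice of criterion is only an illustration; it assumes the data lie strictly inside (0, 1) so that qnorm() stays finite.

```r
# Custom edge weight with the required signature; `weights` is accepted but unused
abs_normal_scores <- function(u1, u2, weights) {
  abs(cor(qnorm(u1), qnorm(u2), use = "complete.obs"))
}

# Quick check on a counter-monotone pair: the weight should be (numerically) 1
set.seed(7)
x <- runif(50)
w_test <- abs_normal_scores(x, 1 - x, weights = rep(1, 50))

# Pass the function as `treecrit` (runs only if VineCopula is installed)
if (requireNamespace("VineCopula", quietly = TRUE)) {
  library(VineCopula)
  data(daxreturns)
  fit <- RVineStructureSelect(daxreturns[1:250, 1:4],
                              familyset = c(1, 3, 4),
                              treecrit = abs_normal_scores)
  summary(fit)
}
```

The function must return a single numeric value for each pair, as required above.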

The root nodes of C-vine trees are determined similarly by identifying the node with the strongest dependencies to all other nodes. That is, we take the node with the maximum column sum in the empirical Kendall's tau matrix.
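
The root choice for the first C-vine tree can be sketched in base R as follows. The simulated data are an assumption, and the sketch takes absolute values of Kendall's tau before summing (an assumption that matches the edge weights used for R-vines; all dependencies in this toy example are positive, so it does not affect the result).

```r
# Toy data: column 1 drives the other three columns (simulated, for illustration)
set.seed(3)
n <- 500
common <- rnorm(n)
z <- cbind(common,
           common + rnorm(n),
           common + rnorm(n),
           common + rnorm(n))
u <- apply(z, 2, rank) / (n + 1)  # pseudo-observations in (0, 1)

# Column sums of the (absolute) empirical Kendall's tau matrix
w <- abs(cor(u, method = "kendall"))
diag(w) <- 0                # ignore each node's dependence on itself
col_sums <- colSums(w)
root <- which.max(col_sums) # node with the strongest overall dependence
```

Here column 1 is expected to be selected as the root, since every other column depends on it directly.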

Note that a possible way to determine the order of the nodes in the D-vine is to identify a shortest Hamiltonian path in terms of weights 1-|\hat{\tau}_{ij}|. This can be established for example using the package TSP. Example code is shown below.

Value

An RVineMatrix() object with the selected structure (RVM$Matrix) and families (RVM$family) as well as sequentially estimated parameters stored in RVM$par and RVM$par2. The object is augmented by the following information about the fit:

se, se2

standard errors for the parameter estimates; note that these are only approximate since they do not account for the sequential nature of the estimation,

nobs

number of observations,

logLik, pair.logLik

log-likelihood (overall and pairwise),

AIC, pair.AIC

Akaike's Information Criterion (overall and pairwise),

BIC, pair.BIC

Bayesian Information Criterion (overall and pairwise),

emptau

matrix of empirical values of Kendall's tau,

p.value.indeptest

matrix of p-values of the independence test.

Note

For a comprehensive summary of the vine copula model, use summary(object); to see all its contents, use str(object).

Author(s)

Jeffrey Dissmann, Eike Brechmann, Ulf Schepsmeier, Thomas Nagler

References

Brechmann, E. C., C. Czado, and K. Aas (2012). Truncated regular vines in high dimensions with applications to financial data. Canadian Journal of Statistics 40 (1), 68-85.

Dissmann, J. F., E. C. Brechmann, C. Czado, and D. Kurowicka (2013). Selecting and estimating regular vine copulae and application to financial returns. Computational Statistics & Data Analysis, 59 (1), 52-69.

See Also

RVineMatrix(), BiCop(), RVineCopSelect(), plot.RVineMatrix(), contour.RVineMatrix()

Examples


# load data set
data(daxreturns)

# select the R-vine structure, families and parameters
# using only the first 4 variables and the first 250 observations
# we allow for the copula families: Gauss, t, Clayton, Gumbel, Frank and Joe
daxreturns <- daxreturns[1:250, 1:4]
RVM <- RVineStructureSelect(daxreturns, c(1:6), progress = TRUE)

## see the object's content or a summary
str(RVM)
summary(RVM)

## inspect the fitted model using plots
## Not run: plot(RVM)  # tree structure
contour(RVM)  # contour plots of all pair-copulas

## estimate a C-vine copula model with only Clayton, Gumbel and Frank copulas
CVM <- RVineStructureSelect(daxreturns, c(3,4,5), "CVine")

## determine the order of the nodes in a D-vine using the package TSP
library(TSP)
d <- dim(daxreturns)[2]
M <- 1 - abs(TauMatrix(daxreturns))
hamilton <- insert_dummy(TSP(M), label = "cut")
sol <- solve_TSP(hamilton, method = "repetitive_nn")
order <- cut_tour(sol, "cut")
DVM <- D2RVine(order, family = rep(0,d*(d-1)/2), par = rep(0, d*(d-1)/2))
RVineCopSelect(daxreturns, c(1:6), DVM$Matrix)


VineCopula documentation built on July 26, 2023, 5:23 p.m.