cce.factor: High-Dimensional Cumulative Covariance Estimator by Factor...

View source: R/cce.factor.R

cce.factorR Documentation

High-Dimensional Cumulative Covariance Estimator by Factor Modeling and Regularization

Description

This function estimates the covariance and precision matrices of a high-dimesnional Ito process by factor modeling and regularization when it is observed at discrete times possibly nonsynchronously with noise.

Usage

cce.factor(yuima, method = "HY", factor = NULL, PCA = FALSE, 
           nfactor = "interactive", regularize = "glasso", taper, 
           group = 1:(dim(yuima) - length(factor)), lambda = "bic", 
           weight = TRUE, nlambda = 10, ratio, N, thr.type = "soft", 
           thr = NULL, tau = NULL, par.alasso = 1, par.scad = 3.7, 
           thr.delta = 0.01, frequency = 300, utime, ...)

Arguments

yuima

an object of yuima-class or yuima.data-class.

method

the method to be used in cce.

factor

an integer or character vector indicating which components of yuima are factors. If NULL, no factor structure is taken account of.

PCA

logical. If TRUE, a principal component analysis is performed to construct factors.

nfactor

the number of factors constructed when PCA is TRUE. If nfactor = "interactive", the scree plot of the principal component analysis is depicted and the user can set this argument interactively.

regularize

the regularizaton method to be used. Possible choices are "glasso" (the default), "tapering", "thresholding" and "eigen.cleaning". See ‘Details’.

taper

the tapering matrix used when regularize = "tapering". If missing, the tapering matrix is constructed according to group. See ‘Details’.

group

an integer vector having the length equal to dim(yuima)-length(factor).

lambda

the penalty parameter used when regularize = "glasso". If it is "aic" (resp. "bic"), it is selected by minimizing the formally defined AIC (resp. BIC). See ‘Details’.

weight

logical. If TRUE, a weighted version is used for regularize = "glasso" as in Koike (2020).

nlambda

a positive integer indicating the number of candidate penalty parameters for which AIC or BIC is evaluated when lambda is "aic" or "bic".

ratio

a positive number indicating the ratio of the largest and smallest values in candidate penalty parameters for which AIC or BIC is evaluated when lambda is "aic" or "bic". See ‘Details’. The default value is sqrt(log(d)/N), where d is the dimension of yuima.

N

a positive integer indicating the "effective" sampling size, which is necessary to evealuate AIC and BIC when lambda is "aic" or "bic". In a standard situation, it is equal to the sample size - 1, but it might be different when the data are observed nonsynchronously and/or with noise. If missing, it is automatically determined according to method.

thr.type

a character string indicating the type of the thresholding method used when regularize = "thresholding". Possible choices are "hard", "soft", "alasso" and "scad". See Section 2.3 of Dai et al. (2019) for the definition of each method.

thr

a numeric matrix indicating the threshold levels used when regularize = "thresholding". Its entries indicate the threshold levels for the corresponding entries of the covariance matrix (values for λ in the notation of Dai et al. (2019)). A single number is converted to the matrix with common entries equal to that number. If NULL, it is determined according to tau. See ‘Details’.

tau

a number between 0 and 1 used to determine the threshold levels used when regularize = "thresholding" and thr=NULL (a value for τ in the notation of Dai et al. (2019)). If NULL, it is determined by a grid search procedure as suggested in Section 4.3 of Dai et al. (2019). See ‘Details’.

par.alasso

the tuning parameter for thr.type = "alasso" (a value for η in the notation of Dai et al. (2019)).

par.scad

the tuning parameter for thr.type = "scad" (a value for a in the notation of Dai et al. (2019)).

thr.delta

a positive number indicating the step size used in the grid serach procedure to determine tau.

frequency

passed to cce.

utime

passed to cce.

...

passed to cce.

Details

One basic approach to estimate the covariance matrix of high-dimensional time series is to take account of the factor structure and perform regularization for the residual covariance matrix. This function implements such an estimation procedure for high-frequency data modeled as a discretely observed semimartingale. Specifically, let Y be a d-dimensional semimartingale which describes the dynamics of the observation data. We consider the following continuous-time factor model:

Y_t = β X_t + Z_t, 0≤ t≤ T,

where X is an r-dimensional semimartingale (the factor process), Z is a d-dimensional semimartingale (the residual process), and β is a constant d\times r matrix (the factor loading matrix). We assume that X and Z are orthogonal in the sense that [X,Z]_T=0. Then, the quadratic covariation matrix of Y is given by

[Y,Y]_T=β[X,X]_Tβ^\top+[Z,Z]_T.

Also, β can be written as β=[Y,X]_T[X,X]_T^{-1}. Thus, if we have observation data both for Y and X, we can construct estimators for [Y,Y]_T, [X,X]_T and β by cce. Moreover, plugging these estimators into the above equation, we can also construct an estimator for [Z,Z]_T. Since this estimator is often poor due to the high-dimensionality, we regularize it by some method. Then, by plugging the regularized estimator for [Z,Z]_T into the above equation, we obtain the final estimator for [Y,Y]_T.

Even if we do not have observation data for X, we can (at least formally) construct a pseudo factor process by performing principal component analysis for the initial estimator of [Y,Y]_T. See Ait-Sahalia and Xiu (2017) and Dai et al. (2019) for details.

Currently, the following four options are available for the regularization method applied to the residual covariance matrix estimate:

  1. regularize = "glasso" (the default).

    This performs the glaphical Lasso. When weight=TRUE (the default), a weighted version of the graphical Lasso is performed as in Koike (2020). Otherwise, the standard graphical Lasso is performed as in Brownlees et al. (2018).

    If lambda="aic" (resp.~lambda="bic"), the penalty parameter for the graphical Lasso is selected by minimizing the formally defined AIC (resp.~BIC). The minimization is carried out by grid search, where the grid is determined as in Section 5.1 of Koike (2020).

    The optimization problem in the graphical Lasso is solved by the GLASSOFAST algorithm of Sustik and Calderhead (2012), which is available from the package glassoFast.

  2. regularize = "tapering".

    This performs tapering, i.e. taking the entry-wise product of the residual covariance matrix estimate and a tapering matrix specified by taper. See Section 3.5.1 of Pourahmadi (2011) for an overview of this method.

    If taper is missing, it is constructed according to group as follows: taper is a 0-1 matrix and the (i,j)-th entry is equal to 1 if and only if group[i]==group[j]. Thus, by default it makes the residual covariance matrix diagonal.

  3. regularize = "thresholding".

    This performs thresholding, i.e. entries of the residual covariance matrix are shrunk toward 0 according to a thresholding rule (specified by thr.type) and a threshold level (spencified by thr).

    If thr=NULL, the (i,j)-th entry of thr is given by τ√{\hat{[Z^i,Z^i]}_T\hat{[Z^j,Z^j]}_T}, where \hat{[Z^i,Z^i]}_T (resp. \hat{[Z^j,Z^j]}_T) denotes the i-th (resp. j-th) diagonal entry of the non-regularized estimator for the residual covariance matrix [Z,Z]_T, and τ is a tuning parameter specified by tau.

    When tau=NULL, the value of τ is set to the smallest value in the grid with step size thr.delta such that the regularized estimate of the residual covariance matrix becomes positive definite.

  4. regularize = "eigen.cleaning".

    This performs the eigenvalue cleaning algorithm described in Hautsch et al. (2012).

Value

A list with components:

covmat.y

the estimated covariance matrix

premat.y

the estimated precision matrix

beta.hat

the estimated factor loading matrix

covmat.x

the estimated factor covariance matrix

covmat.z

the estimated residual covariance matrix

premat.z

the estimated residual precision matrix

sigma.z

the estimated residual covariance matrix before regularization

pc

the variances of the principal components (it is NULL if PCA = FALSE)

Author(s)

Yuta Koike with YUIMA project Team

References

Ait-Sahalia, Y. and Xiu, D. (2017). Using principal component analysis to estimate a high dimensional factor model with high-frequency data, Journal of Econometrics, 201, 384–399.

Brownlees, C., Nualart, E. and Sun, Y. (2018). Realized networks, Journal of Applied Econometrics, 33, 986–1006.

Dai, C., Lu, K. and Xiu, D. (2019). Knowing factors or factor loadings, or neither? Evaluating estimators of large covariance matrices with noisy and asynchronous data, Journal of Econometrics, 208, 43–79.

Hautsch, N., Kyj, L. M. and Oomen, R. C. (2012). A blocking and regularization approach to high-dimensional realized covariance estimation, Journal of Applied Econometrics, 27, 625–645.

Koike, Y. (2020). De-biased graphical Lasso for high-frequency data, Entropy, 22, 456.

Pourahmadi, M. (2011). Covariance estimation: The GLM and regularization perspectives. Statistical Science, 26, 369–387.

Sustik, M. A. and Calderhead, B. (2012). GLASSOFAST: An efficient GLASSO implementation, UTCSTechnical Report TR-12-29, The University of Texas at Austin.

See Also

cce, lmm, glassoFast

Examples

## Not run: 
set.seed(123)

## Simulating a factor process (Heston model)
drift <- c("mu*S", "-theta*(V-v)")
diffusion <- matrix(c("sqrt(max(V,0))*S", "gamma*sqrt(max(V,0))*rho",
                       0, "gamma*sqrt(max(V,0))*sqrt(1-rho^2)"),
                    2,2)
mod <- setModel(drift = drift, diffusion = diffusion,
                state.variable = c("S", "V")) 
n <- 2340
samp <- setSampling(n = n)
heston <- setYuima(model = mod, sampling = samp)
param <- list(mu = 0.03, theta = 3, v = 0.09, 
              gamma = 0.3, rho = -0.6) 
result <- simulate(heston, xinit = c(1, 0.1), 
                   true.parameter = param)

zdata <- get.zoo.data(result) # extract the zoo data
X <- log(zdata[[1]]) # log-price process
V <- zdata[[2]] # squared volatility process


## Simulating a residual process (correlated BM)
d <- 100 # dimension
Q <- 0.1 * toeplitz(0.7^(1:d-1)) # residual covariance matrix
dZ <- matrix(rnorm(n*d),n,d) %*% chol(Q)/sqrt(n)
Z <- zoo(apply(dZ, 2, "diffinv"), samp@grid[[1]])


## Constructing observation data
b <- runif(d, 0.25, 2.25) # factor loadings
Y <- X %o% b + Z
yuima <- setData(cbind(X, Y))

# We subsample yuima to construct observation data
yuima <- subsampling(yuima, setSampling(n = 78))


## Estimating the covariance matrix (factor is known)
cmat <- tcrossprod(b) * mean(V[-1]) + Q # true covariance matrix
pmat <- solve(cmat) # true precision matrix

# (1) Regularization method is glasso (the default)
est <- cce.factor(yuima, factor = 1) 
norm(est$covmat.y - cmat, type = "2") 
norm(est$premat.y - pmat, type = "2")

# (2) Regularization method is tapering
est <- cce.factor(yuima, factor = 1, regularize = "tapering") 
norm(est$covmat.y - cmat, type = "2") 
norm(est$premat.y - pmat, type = "2")

# (3) Regularization method is thresholding
est <- cce.factor(yuima, factor = 1, regularize = "thresholding") 
norm(est$covmat.y - cmat, type = "2") 
norm(est$premat.y - pmat, type = "2")

# (4) Regularization method is eigen.cleaning
est <- cce.factor(yuima, factor = 1, regularize = "eigen.cleaning") 
norm(est$covmat.y - cmat, type = "2") 
norm(est$premat.y - pmat, type = "2")


## Estimating the covariance matrix (factor is unknown)
yuima2 <- setData(Y)

# We subsample yuima to construct observation data
yuima2 <- subsampling(yuima2, setSampling(n = 78))

# (A) Ignoring the factor structure (regularize = "glasso")
est <- cce.factor(yuima2) 
norm(est$covmat.y - cmat, type = "2") 
norm(est$premat.y - pmat, type = "2")

# (B) Estimating the factor by PCA (regularize = "glasso")
est <- cce.factor(yuima2, PCA = TRUE, nfactor = 1) # use 1 factor 
norm(est$covmat.y - cmat, type = "2") 
norm(est$premat.y - pmat, type = "2")

# One can interactively select the number of factors
# after implementing PCA (the scree plot is depicted)
# Try: est <- cce.factor(yuima2, PCA = TRUE)

## End(Not run)

yuima documentation built on Nov. 14, 2022, 3:02 p.m.

Related to cce.factor in yuima...