cce.factor: High-Dimensional Cumulative Covariance Estimator by Factor...
In yuima: The YUIMA Project Package for SDEs

cce.factor

R Documentation

High-Dimensional Cumulative Covariance Estimator by Factor Modeling and Regularization

Description

This function estimates the covariance and precision matrices of a high-dimesnional Ito process by factor modeling and regularization when it is observed at discrete times possibly nonsynchronously with noise.

Usage

cce.factor(yuima, method = "HY", factor = NULL, PCA = FALSE, 
           nfactor = "interactive", regularize = "glasso", taper, 
           group = 1:(dim(yuima) - length(factor)), lambda = "bic", 
           weight = TRUE, nlambda = 10, ratio, N, thr.type = "soft", 
           thr = NULL, tau = NULL, par.alasso = 1, par.scad = 3.7, 
           thr.delta = 0.01, frequency = 300, utime, ...)

Arguments

`yuima`	an object of `yuima-class` or `yuima.data-class`.
`method`	the method to be used in `cce`.
`factor`	an integer or character vector indicating which components of `yuima` are factors. If `NULL`, no factor structure is taken account of.
`PCA`	logical. If `TRUE`, a principal component analysis is performed to construct factors.
`nfactor`	the number of factors constructed when `PCA` is `TRUE`. If `nfactor = "interactive"`, the scree plot of the principal component analysis is depicted and the user can set this argument interactively.
`regularize`	the regularizaton method to be used. Possible choices are `"glasso"` (the default), `"tapering"`, `"thresholding"` and `"eigen.cleaning"`. See ‘Details’.
`taper`	the tapering matrix used when `regularize = "tapering"`. If missing, the tapering matrix is constructed according to `group`. See ‘Details’.
`group`	an integer vector having the length equal to `dim(yuima)-length(factor)`.
`lambda`	the penalty parameter used when `regularize = "glasso"`. If it is `"aic"` (resp. `"bic"`), it is selected by minimizing the formally defined AIC (resp. BIC). See ‘Details’.
`weight`	logical. If `TRUE`, a weighted version is used for `regularize = "glasso"` as in Koike (2020).
`nlambda`	a positive integer indicating the number of candidate penalty parameters for which AIC or BIC is evaluated when `lambda` is `"aic"` or `"bic"`.
`ratio`	a positive number indicating the ratio of the largest and smallest values in candidate penalty parameters for which AIC or BIC is evaluated when `lambda` is `"aic"` or `"bic"`. See ‘Details’. The default value is `sqrt(log(d)/N)`, where `d` is the dimension of `yuima`.
`N`	a positive integer indicating the "effective" sampling size, which is necessary to evealuate AIC and BIC when `lambda` is `"aic"` or `"bic"`. In a standard situation, it is equal to the sample size - 1, but it might be different when the data are observed nonsynchronously and/or with noise. If missing, it is automatically determined according to `method`.
`thr.type`	a character string indicating the type of the thresholding method used when `regularize = "thresholding"`. Possible choices are `"hard"`, `"soft"`, `"alasso"` and `"scad"`. See Section 2.3 of Dai et al. (2019) for the definition of each method.
`thr`	a numeric matrix indicating the threshold levels used when `regularize = "thresholding"`. Its entries indicate the threshold levels for the corresponding entries of the covariance matrix (values for λ in the notation of Dai et al. (2019)). A single number is converted to the matrix with common entries equal to that number. If `NULL`, it is determined according to `tau`. See ‘Details’.
`tau`	a number between 0 and 1 used to determine the threshold levels used when `regularize = "thresholding"` and `thr=NULL` (a value for τ in the notation of Dai et al. (2019)). If `NULL`, it is determined by a grid search procedure as suggested in Section 4.3 of Dai et al. (2019). See ‘Details’.
`par.alasso`	the tuning parameter for `thr.type = "alasso"` (a value for η in the notation of Dai et al. (2019)).
`par.scad`	the tuning parameter for `thr.type = "scad"` (a value for a in the notation of Dai et al. (2019)).
`thr.delta`	a positive number indicating the step size used in the grid serach procedure to determine `tau`.
`frequency`	passed to `cce`.
`utime`	passed to `cce`.
`...`	passed to `cce`.

Details

One basic approach to estimate the covariance matrix of high-dimensional time series is to take account of the factor structure and perform regularization for the residual covariance matrix. This function implements such an estimation procedure for high-frequency data modeled as a discretely observed semimartingale. Specifically, let Y be a d-dimensional semimartingale which describes the dynamics of the observation data. We consider the following continuous-time factor model:

Y_t = β X_t + Z_t, 0≤ t≤ T,

where X is an r-dimensional semimartingale (the factor process), Z is a d-dimensional semimartingale (the residual process), and β is a constant d\times r matrix (the factor loading matrix). We assume that X and Z are orthogonal in the sense that [X,Z]_T=0. Then, the quadratic covariation matrix of Y is given by

[Y,Y]_T=β[X,X]_Tβ^\top+[Z,Z]_T.

Also, β can be written as β=[Y,X]_T[X,X]_T^{-1}. Thus, if we have observation data both for Y and X, we can construct estimators for [Y,Y]_T, [X,X]_T and β by cce. Moreover, plugging these estimators into the above equation, we can also construct an estimator for [Z,Z]_T. Since this estimator is often poor due to the high-dimensionality, we regularize it by some method. Then, by plugging the regularized estimator for [Z,Z]_T into the above equation, we obtain the final estimator for [Y,Y]_T.

Even if we do not have observation data for X, we can (at least formally) construct a pseudo factor process by performing principal component analysis for the initial estimator of [Y,Y]_T. See Ait-Sahalia and Xiu (2017) and Dai et al. (2019) for details.

Currently, the following four options are available for the regularization method applied to the residual covariance matrix estimate:

regularize = "glasso" (the default).

This performs the glaphical Lasso. When weight=TRUE (the default), a weighted version of the graphical Lasso is performed as in Koike (2020). Otherwise, the standard graphical Lasso is performed as in Brownlees et al. (2018).

If lambda="aic" (resp.~lambda="bic"), the penalty parameter for the graphical Lasso is selected by minimizing the formally defined AIC (resp.~BIC). The minimization is carried out by grid search, where the grid is determined as in Section 5.1 of Koike (2020).

The optimization problem in the graphical Lasso is solved by the GLASSOFAST algorithm of Sustik and Calderhead (2012), which is available from the package glassoFast.
regularize = "tapering".

This performs tapering, i.e. taking the entry-wise product of the residual covariance matrix estimate and a tapering matrix specified by taper. See Section 3.5.1 of Pourahmadi (2011) for an overview of this method.

If taper is missing, it is constructed according to group as follows: taper is a 0-1 matrix and the (i,j)-th entry is equal to 1 if and only if group[i]==group[j]. Thus, by default it makes the residual covariance matrix diagonal.
regularize = "thresholding".

This performs thresholding, i.e. entries of the residual covariance matrix are shrunk toward 0 according to a thresholding rule (specified by thr.type) and a threshold level (spencified by thr).

If thr=NULL, the (i,j)-th entry of thr is given by τ√{\hat{[Z^i,Z^i]}_T\hat{[Z^j,Z^j]}_T}, where \hat{[Z^i,Z^i]}_T (resp. \hat{[Z^j,Z^j]}_T) denotes the i-th (resp. j-th) diagonal entry of the non-regularized estimator for the residual covariance matrix [Z,Z]_T, and τ is a tuning parameter specified by tau.

When tau=NULL, the value of τ is set to the smallest value in the grid with step size thr.delta such that the regularized estimate of the residual covariance matrix becomes positive definite.
regularize = "eigen.cleaning".

This performs the eigenvalue cleaning algorithm described in Hautsch et al. (2012).

Value

A list with components:

`covmat.y`	the estimated covariance matrix
`premat.y`	the estimated precision matrix
`beta.hat`	the estimated factor loading matrix
`covmat.x`	the estimated factor covariance matrix
`covmat.z`	the estimated residual covariance matrix
`premat.z`	the estimated residual precision matrix
`sigma.z`	the estimated residual covariance matrix before regularization
`pc`	the variances of the principal components (it is `NULL` if `PCA = FALSE`)

Author(s)

Yuta Koike with YUIMA project Team

References

Ait-Sahalia, Y. and Xiu, D. (2017). Using principal component analysis to estimate a high dimensional factor model with high-frequency data, Journal of Econometrics, 201, 384–399.

Brownlees, C., Nualart, E. and Sun, Y. (2018). Realized networks, Journal of Applied Econometrics, 33, 986–1006.

Dai, C., Lu, K. and Xiu, D. (2019). Knowing factors or factor loadings, or neither? Evaluating estimators of large covariance matrices with noisy and asynchronous data, Journal of Econometrics, 208, 43–79.

Hautsch, N., Kyj, L. M. and Oomen, R. C. (2012). A blocking and regularization approach to high-dimensional realized covariance estimation, Journal of Applied Econometrics, 27, 625–645.

Koike, Y. (2020). De-biased graphical Lasso for high-frequency data, Entropy, 22, 456.

Pourahmadi, M. (2011). Covariance estimation: The GLM and regularization perspectives. Statistical Science, 26, 369–387.

Sustik, M. A. and Calderhead, B. (2012). GLASSOFAST: An efficient GLASSO implementation, UTCSTechnical Report TR-12-29, The University of Texas at Austin.

Examples

## Not run: 
set.seed(123)

## Simulating a factor process (Heston model)
drift <- c("mu*S", "-theta*(V-v)")
diffusion <- matrix(c("sqrt(max(V,0))*S", "gamma*sqrt(max(V,0))*rho",
                       0, "gamma*sqrt(max(V,0))*sqrt(1-rho^2)"),
                    2,2)
mod <- setModel(drift = drift, diffusion = diffusion,
                state.variable = c("S", "V")) 
n <- 2340
samp <- setSampling(n = n)
heston <- setYuima(model = mod, sampling = samp)
param <- list(mu = 0.03, theta = 3, v = 0.09, 
              gamma = 0.3, rho = -0.6) 
result <- simulate(heston, xinit = c(1, 0.1), 
                   true.parameter = param)

zdata <- get.zoo.data(result) # extract the zoo data
X <- log(zdata[[1]]) # log-price process
V <- zdata[[2]] # squared volatility process


## Simulating a residual process (correlated BM)
d <- 100 # dimension
Q <- 0.1 * toeplitz(0.7^(1:d-1)) # residual covariance matrix
dZ <- matrix(rnorm(n*d),n,d) %*% chol(Q)/sqrt(n)
Z <- zoo(apply(dZ, 2, "diffinv"), samp@grid[[1]])


## Constructing observation data
b <- runif(d, 0.25, 2.25) # factor loadings
Y <- X %o% b + Z
yuima <- setData(cbind(X, Y))

# We subsample yuima to construct observation data
yuima <- subsampling(yuima, setSampling(n = 78))


## Estimating the covariance matrix (factor is known)
cmat <- tcrossprod(b) * mean(V[-1]) + Q # true covariance matrix
pmat <- solve(cmat) # true precision matrix

# (1) Regularization method is glasso (the default)
est <- cce.factor(yuima, factor = 1) 
norm(est$covmat.y - cmat, type = "2") 
norm(est$premat.y - pmat, type = "2")

# (2) Regularization method is tapering
est <- cce.factor(yuima, factor = 1, regularize = "tapering") 
norm(est$covmat.y - cmat, type = "2") 
norm(est$premat.y - pmat, type = "2")

# (3) Regularization method is thresholding
est <- cce.factor(yuima, factor = 1, regularize = "thresholding") 
norm(est$covmat.y - cmat, type = "2") 
norm(est$premat.y - pmat, type = "2")

# (4) Regularization method is eigen.cleaning
est <- cce.factor(yuima, factor = 1, regularize = "eigen.cleaning") 
norm(est$covmat.y - cmat, type = "2") 
norm(est$premat.y - pmat, type = "2")


## Estimating the covariance matrix (factor is unknown)
yuima2 <- setData(Y)

# We subsample yuima to construct observation data
yuima2 <- subsampling(yuima2, setSampling(n = 78))

# (A) Ignoring the factor structure (regularize = "glasso")
est <- cce.factor(yuima2) 
norm(est$covmat.y - cmat, type = "2") 
norm(est$premat.y - pmat, type = "2")

# (B) Estimating the factor by PCA (regularize = "glasso")
est <- cce.factor(yuima2, PCA = TRUE, nfactor = 1) # use 1 factor 
norm(est$covmat.y - cmat, type = "2") 
norm(est$premat.y - pmat, type = "2")

# One can interactively select the number of factors
# after implementing PCA (the scree plot is depicted)
# Try: est <- cce.factor(yuima2, PCA = TRUE)

## End(Not run)

yuima documentation built on Nov. 14, 2022, 3:02 p.m.

yuima index

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

yuima
The YUIMA Project Package for SDEs

cce.factor: High-Dimensional Cumulative Covariance Estimator by Factor...
In yuima: The YUIMA Project Package for SDEs

High-Dimensional Cumulative Covariance Estimator by Factor Modeling and Regularization

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Related to cce.factor in yuima...

R Package Documentation

Browse R Packages

We want your feedback!

yuima The YUIMA Project Package for SDEs

cce.factor: High-Dimensional Cumulative Covariance Estimator by Factor... In yuima: The YUIMA Project Package for SDEs

High-Dimensional Cumulative Covariance Estimator by Factor Modeling and Regularization

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Related to cce.factor in yuima...

R Package Documentation

Browse R Packages

We want your feedback!

yuima
The YUIMA Project Package for SDEs

cce.factor: High-Dimensional Cumulative Covariance Estimator by Factor...
In yuima: The YUIMA Project Package for SDEs