cce.factor | R Documentation |
This function estimates the covariance and precision matrices of a high-dimesnional Ito process by factor modeling and regularization when it is observed at discrete times possibly nonsynchronously with noise.
cce.factor(yuima, method = "HY", factor = NULL, PCA = FALSE, nfactor = "interactive", regularize = "glasso", taper, group = 1:(dim(yuima) - length(factor)), lambda = "bic", weight = TRUE, nlambda = 10, ratio, N, thr.type = "soft", thr = NULL, tau = NULL, par.alasso = 1, par.scad = 3.7, thr.delta = 0.01, frequency = 300, utime, ...)
yuima |
an object of |
method |
the method to be used in |
factor |
an integer or character vector indicating which components of |
PCA |
logical. If |
nfactor |
the number of factors constructed when |
regularize |
the regularizaton method to be used.
Possible choices are |
taper |
the tapering matrix used when |
group |
an integer vector having the length equal to |
lambda |
the penalty parameter used when |
weight |
logical. If |
nlambda |
a positive integer indicating the number of candidate penalty parameters for which AIC or BIC is evaluated when |
ratio |
a positive number indicating the ratio of the largest and smallest values in candidate penalty parameters for which AIC or BIC is evaluated when |
N |
a positive integer indicating the "effective" sampling size, which is necessary to evealuate AIC and BIC when |
thr.type |
a character string indicating the type of the thresholding method used when |
thr |
a numeric matrix indicating the threshold levels used when |
tau |
a number between 0 and 1 used to determine the threshold levels used when |
par.alasso |
the tuning parameter for |
par.scad |
the tuning parameter for |
thr.delta |
a positive number indicating the step size used in the grid serach procedure to determine |
frequency |
passed to |
utime |
passed to |
... |
passed to |
One basic approach to estimate the covariance matrix of high-dimensional time series is to take account of the factor structure and perform regularization for the residual covariance matrix. This function implements such an estimation procedure for high-frequency data modeled as a discretely observed semimartingale. Specifically, let Y be a d-dimensional semimartingale which describes the dynamics of the observation data. We consider the following continuous-time factor model:
Y_t = β X_t + Z_t, 0≤ t≤ T,
where X is an r-dimensional semimartingale (the factor process), Z is a d-dimensional semimartingale (the residual process), and β is a constant d\times r matrix (the factor loading matrix). We assume that X and Z are orthogonal in the sense that [X,Z]_T=0. Then, the quadratic covariation matrix of Y is given by
[Y,Y]_T=β[X,X]_Tβ^\top+[Z,Z]_T.
Also, β can be written as β=[Y,X]_T[X,X]_T^{-1}. Thus, if we have observation data both for Y and X, we can construct estimators for [Y,Y]_T, [X,X]_T and β by cce
. Moreover, plugging these estimators into the above equation, we can also construct an estimator for [Z,Z]_T. Since this estimator is often poor due to the high-dimensionality, we regularize it by some method. Then, by plugging the regularized estimator for [Z,Z]_T into the above equation, we obtain the final estimator for [Y,Y]_T.
Even if we do not have observation data for X, we can (at least formally) construct a pseudo factor process by performing principal component analysis for the initial estimator of [Y,Y]_T. See Ait-Sahalia and Xiu (2017) and Dai et al. (2019) for details.
Currently, the following four options are available for the regularization method applied to the residual covariance matrix estimate:
regularize = "glasso"
(the default).
This performs the glaphical Lasso. When weight=TRUE
(the default), a weighted version of the graphical Lasso is performed as in Koike (2020). Otherwise, the standard graphical Lasso is performed as in Brownlees et al. (2018).
If lambda="aic"
(resp.~lambda="bic"
), the penalty parameter for the graphical Lasso is selected by minimizing the formally defined AIC (resp.~BIC).
The minimization is carried out by grid search, where the grid is determined as in Section 5.1 of Koike (2020).
The optimization problem in the graphical Lasso is solved by the GLASSOFAST algorithm of Sustik and Calderhead (2012), which is available from the package glassoFast.
regularize = "tapering"
.
This performs tapering, i.e. taking the entry-wise product of the residual covariance matrix estimate and a tapering matrix specified by taper
.
See Section 3.5.1 of Pourahmadi (2011) for an overview of this method.
If taper
is missing, it is constructed according to group
as follows: taper
is a 0-1 matrix and the (i,j)-th entry is equal to 1 if and only if group[i]==group[j]
. Thus, by default it makes the residual covariance matrix diagonal.
regularize = "thresholding"
.
This performs thresholding, i.e. entries of the residual covariance matrix are shrunk toward 0 according to a thresholding rule (specified by thr.type
) and a threshold level (spencified by thr
).
If thr=NULL
, the (i,j)-th entry of thr
is given by τ√{\hat{[Z^i,Z^i]}_T\hat{[Z^j,Z^j]}_T}, where \hat{[Z^i,Z^i]}_T (resp. \hat{[Z^j,Z^j]}_T) denotes the i-th (resp. j-th) diagonal entry of the non-regularized estimator for the residual covariance matrix [Z,Z]_T, and τ is a tuning parameter specified by tau
.
When tau=NULL
, the value of τ is set to the smallest value in the grid with step size thr.delta
such that the regularized estimate of the residual covariance matrix becomes positive definite.
regularize = "eigen.cleaning"
.
This performs the eigenvalue cleaning algorithm described in Hautsch et al. (2012).
A list with components:
covmat.y |
the estimated covariance matrix |
premat.y |
the estimated precision matrix |
beta.hat |
the estimated factor loading matrix |
covmat.x |
the estimated factor covariance matrix |
covmat.z |
the estimated residual covariance matrix |
premat.z |
the estimated residual precision matrix |
sigma.z |
the estimated residual covariance matrix before regularization |
pc |
the variances of the principal components (it is |
Yuta Koike with YUIMA project Team
Ait-Sahalia, Y. and Xiu, D. (2017). Using principal component analysis to estimate a high dimensional factor model with high-frequency data, Journal of Econometrics, 201, 384–399.
Brownlees, C., Nualart, E. and Sun, Y. (2018). Realized networks, Journal of Applied Econometrics, 33, 986–1006.
Dai, C., Lu, K. and Xiu, D. (2019). Knowing factors or factor loadings, or neither? Evaluating estimators of large covariance matrices with noisy and asynchronous data, Journal of Econometrics, 208, 43–79.
Hautsch, N., Kyj, L. M. and Oomen, R. C. (2012). A blocking and regularization approach to high-dimensional realized covariance estimation, Journal of Applied Econometrics, 27, 625–645.
Koike, Y. (2020). De-biased graphical Lasso for high-frequency data, Entropy, 22, 456.
Pourahmadi, M. (2011). Covariance estimation: The GLM and regularization perspectives. Statistical Science, 26, 369–387.
Sustik, M. A. and Calderhead, B. (2012). GLASSOFAST: An efficient GLASSO implementation, UTCSTechnical Report TR-12-29, The University of Texas at Austin.
cce
, lmm
, glassoFast
## Not run: set.seed(123) ## Simulating a factor process (Heston model) drift <- c("mu*S", "-theta*(V-v)") diffusion <- matrix(c("sqrt(max(V,0))*S", "gamma*sqrt(max(V,0))*rho", 0, "gamma*sqrt(max(V,0))*sqrt(1-rho^2)"), 2,2) mod <- setModel(drift = drift, diffusion = diffusion, state.variable = c("S", "V")) n <- 2340 samp <- setSampling(n = n) heston <- setYuima(model = mod, sampling = samp) param <- list(mu = 0.03, theta = 3, v = 0.09, gamma = 0.3, rho = -0.6) result <- simulate(heston, xinit = c(1, 0.1), true.parameter = param) zdata <- get.zoo.data(result) # extract the zoo data X <- log(zdata[[1]]) # log-price process V <- zdata[[2]] # squared volatility process ## Simulating a residual process (correlated BM) d <- 100 # dimension Q <- 0.1 * toeplitz(0.7^(1:d-1)) # residual covariance matrix dZ <- matrix(rnorm(n*d),n,d) %*% chol(Q)/sqrt(n) Z <- zoo(apply(dZ, 2, "diffinv"), samp@grid[[1]]) ## Constructing observation data b <- runif(d, 0.25, 2.25) # factor loadings Y <- X %o% b + Z yuima <- setData(cbind(X, Y)) # We subsample yuima to construct observation data yuima <- subsampling(yuima, setSampling(n = 78)) ## Estimating the covariance matrix (factor is known) cmat <- tcrossprod(b) * mean(V[-1]) + Q # true covariance matrix pmat <- solve(cmat) # true precision matrix # (1) Regularization method is glasso (the default) est <- cce.factor(yuima, factor = 1) norm(est$covmat.y - cmat, type = "2") norm(est$premat.y - pmat, type = "2") # (2) Regularization method is tapering est <- cce.factor(yuima, factor = 1, regularize = "tapering") norm(est$covmat.y - cmat, type = "2") norm(est$premat.y - pmat, type = "2") # (3) Regularization method is thresholding est <- cce.factor(yuima, factor = 1, regularize = "thresholding") norm(est$covmat.y - cmat, type = "2") norm(est$premat.y - pmat, type = "2") # (4) Regularization method is eigen.cleaning est <- cce.factor(yuima, factor = 1, regularize = "eigen.cleaning") norm(est$covmat.y - cmat, type = "2") norm(est$premat.y - pmat, type = "2") ## Estimating the covariance matrix (factor is unknown) yuima2 <- setData(Y) # We subsample yuima to construct observation data yuima2 <- subsampling(yuima2, setSampling(n = 78)) # (A) Ignoring the factor structure (regularize = "glasso") est <- cce.factor(yuima2) norm(est$covmat.y - cmat, type = "2") norm(est$premat.y - pmat, type = "2") # (B) Estimating the factor by PCA (regularize = "glasso") est <- cce.factor(yuima2, PCA = TRUE, nfactor = 1) # use 1 factor norm(est$covmat.y - cmat, type = "2") norm(est$premat.y - pmat, type = "2") # One can interactively select the number of factors # after implementing PCA (the scree plot is depicted) # Try: est <- cce.factor(yuima2, PCA = TRUE) ## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.