cce: Nonsynchronous Cumulative Covariance Estimator
In yuima: The YUIMA Project Package for SDEs

cce	R Documentation

Nonsynchronous Cumulative Covariance Estimator

Description

This function estimates the covariance between two Ito processes when they are observed at discrete times possibly nonsynchronously. It can apply to irregularly sampled one-dimensional data as a special case.

Usage

cce(x, method="HY", theta, kn, g=function(x)min(x,1-x), refreshing = TRUE,
    cwise = TRUE, delta = 0, adj = TRUE, K, c.two, J = 1, c.multi, kernel, H,
    c.RK, eta = 3/5, m = 2, ftregion = 0, vol.init = NA,
    covol.init = NA, nvar.init = NA, ncov.init = NA, mn, alpha = 0.4,
    frequency = 300, avg = TRUE, threshold, utime, psd = FALSE)

Arguments

`x`	an object of `yuima-class` or `yuima.data-class`.
`method`	the method to be used. See ‘Details’.
`theta`	a numeric vector or matrix. If it is a matrix, each of its components indicates the tuning parameter which determines the pre-averaging window lengths `kn` to be used for estimating the corresponding component. If it is a numeric vector, it is converted to a matrix as `(C+t(C))/2`, where `C=matrix(theta,d,d)` and `d=dim(x)`. The default value is `0.15` for the method `"PHY"` or `"PTHY"` following Christensen et al. (2013), while it is `1` for the method `"MRC"` following Christensen et al. (2010).
`kn`	an integer-valued vector or matrix indicating the pre-averaging window length(s). For the methods `"PHY"` or `"PTHY"`, see ‘Details’ for the default value. For the method `"MRC"`, the default value is `ceiling(theta*n^(1+delta))`, where `n` is the number of the refresh times associated with the data minus 1.
`g`	a function indicating the weight function to be used. The default value is the Bartlett window: `function(x)min(x,1-x)`.
`refreshing`	logical. If `TRUE`, the data is pre-synchronized by the next-tick interpolation in the refresh times.
`cwise`	logical. If `TRUE`, the estimator is calculated componentwise.
`delta`	a non-negative number indicating the order of the pre-averaging window length(s) `kn`.
`adj`	logical. If `TRUE`, a finite-sample adjustment is performed. For the method `"MRC"`, see Christensen et al. (2010) for details. For the method `"TSCV"`, see Zhang (2011) and Zhang et al. (2005) for details.
`K`	a positive integer indicating the large time-scale parameter. The default value is `ceiling(c.two*n^(2/3))`, where `n` is the number of the refresh times associated with the data minus 1.
`c.two`	a positive number indicating the tuning parameter which determines the scale of the large time-scale parameter `K`. The default value is the average of the numeric vector each of whose components is the roughly estimated optimal value in the sense of the minimizer of the theoretical asymptotic variance of the estimator of the corresponding diagonal component. The theoretical asymptotic variance is considered in the standard case and given by Eq.(63) of Zhang et al. (2005).
`J`	a positive integer indicating the small time-scale parameter.
`c.multi`	a numeric vector or matrix. If it is a matrix, each of its components indicates the tuning parameter which determines (the scale of) the number of the time scales to be used for estimating the corresponding component. If it is a numeric vector, it is converted to a matrix as `(C+t(C))/2`, where `C=matrix(c.multi,d,d)` and `d=dim(x)`. The default value is the numeric vector each of whose components is the roughly estimated optimal value in the sense of minimizing the theoretical asymptotic variance of the estimator of the corresponding diagonal component. The theoretical asymptotic variance is considered in the standard case and given by Eq.(37) of Zhang (2006).
`kernel`	a function indicating the kernel function to be used. The default value is the Parzan kernel, which is recommended in Barndorff-Nielsen et al. (2009, 2011).
`H`	a positive number indicating the bandwidth parameter. The default value is `c.RK*n^eta`, where `n` is the number of the refresh times associated with the data minus 1.
`c.RK`	a positive number indicating the tuning parameter which determines the scale of the bandwidth parameter `H`. The default value is the average of the numeric vector each of whose components is the roughly estimated optimal value in the sense of minimizing the theoretical asymptotic variance of the estimator of the corresponding diagonal component. The theoretical asymptotic variance is considered in the standard case and given in Barndorff-Nielsen et al. (2009, 2011).
`eta`	a positive number indicating the tuning parameter which determines the order of the bandwidth parameter `H`.
`m`	a positive integer indicating the number of the end points to be jittered.
`ftregion`	a non-negative number indicating the length of the flat-top region. `ftregion=0` (the default) means that a non-flat-top realized kernel studied in Barndorff-Nielsen et al. (2011) is used. `ftregion=1/H` means that a flat-top realized kernel studied in Barndorff-Nielsen et al. (2008) is used. See Varneskov (2015) for other values.
`vol.init`	a numeric vector each of whose components indicates the initial value to be used to estimate the integrated volatility of the corresponding component, which is passed to the optimizer.
`covol.init`	a numeric matrix each of whose columns indicates the initial value to be used to estimate the integrated covariance of the corresponding component, which is passed to the optimizer.
`nvar.init`	a numeric vector each of whose components indicates the initial value to be used to estimate the variance of noise of the corresponding component, which is passed to the optimizer.
`ncov.init`	a numeric matrix each of whose columns indicates the initial value to be used to estimate the covariance of noise of the corresponding component, which is passed to the optimizer.
`mn`	a positive integer indicating the number of terms to be used for calculating the SIML estimator. The default value is `ceiling(n^alpha)`, where `n` is the number of the refresh times associated with the data minus 1.
`alpha`	a postive number indicating the order of `mn`.
`frequency`	a positive integer indicating the frequency (seconds) of the calendar time sampling to be used.
`avg`	logical. If `TRUE`, the averaged subsampling estimator is calculated. Otherwise the simple sparsely subsampled estimator is calculated.
`threshold`	a numeric vector or list indicating the threshold parameter(s). Each of its components indicates the threshold parameter or process to be used for estimating the corresponding component. If it is a numeric vector, the elements in `threshold` are recycled if there are two few elements in `threshold`. The default value is determined following Koike (2014) (for the method `"THY"`) and Koike (2015) (for the method `"PTHY"`).
`utime`	a positive number indicating how many seconds the unit time corresponds to. Usually, we should set `utime=1` when applied to real intraday high-frequency data. The default value is the difference between the maximum and the minimum of the sampling times, multiplied by 23,400. Here, 23,400 seconds correspond to 6.5 hours, hence if the data is sampled on the interval [0,1], then the sampling interval is regarded as 6.5 hours.
`psd`	logical. If `TRUE`, the estimated covariance matrix `C` is converted to `(C%*%C)^(1/2)` for ensuring the positive semi-definiteness. In this case the absolute values of the estimated correlations are always ensured to be less than or equal to 1.

Details

This function is a method for objects of yuima.data-class and yuima-class. It extracts the data slot when applied to a an object of yuima-class.

Typical usages are

cce(x,psd=FALSE)
cce(x,method="PHY",theta,kn,g,refreshing=TRUE,cwise=TRUE,psd=FALSE)
cce(x,method="MRC",theta,kn,g,delta=0,avg=TRUE,psd=FALSE)
cce(x,method="TSCV",K,c.two,J=1,adj=TRUE,utime,psd=FALSE)
cce(x,method="GME",c.multi,utime,psd=FALSE)
cce(x,method="RK",kernel,H,c.RK,eta=3/5,m=2,ftregion=0,utime,psd=FALSE)
cce(x,method="QMLE",vol.init=NULL,covol.init=NULL,
    nvar.init=NULL,ncov.init=NULL,psd=FALSE)
cce(x,method="SIML",mn,alpha=0.4,psd=FALSE)
cce(x,method="THY",threshold,psd=FALSE)
cce(x,method="PTHY",theta,kn,g,threshold,refreshing=TRUE,cwise=TRUE,psd=FALSE)
cce(x,method="SRC",frequency=300,avg=TRUE,utime,psd=FALSE)
cce(x,method="SBPC",frequency=300,avg=TRUE,utime,psd=FALSE)

The default method is method "HY", which is an implementation of the Hayashi-Yoshida estimator proposed in Hayashi and Yoshida (2005).

Method "PHY" is an implementation of the Pre-averaged Hayashi-Yoshida estimator proposed in Christensen et al. (2010).

Method "MRC" is an implementation of the Modulated Realized Covariance based on refresh time sampling proposed in Christensen et al. (2010).

Method "TSCV" is an implementation of the previous tick Two Scales realized CoVariance based on refresh time sampling proposed in Zhang (2011).

Method "GME" is an implementation of the Generalized Multiscale Estimator proposed in Bibinger (2011).

Method "RK" is an implementation of the multivariate Realized Kernel based on refresh time sampling proposed in Barndorff-Nielsen et al. (2011).

Method "QMLE" is an implementation of the nonparametric Quasi Maximum Likelihood Estimator proposed in Ait-Sahalia et al. (2010).

Method "SIML" is an implementation of the Separating Information Maximum Likelihood estimator proposed in Kunitomo and Sato (2013) with the basis of refresh time sampling.

Method "THY" is an implementation of the Truncated Hayashi-Yoshida estimator proposed in Mancini and Gobbi (2012).

Method "PTHY" is an implementation of the Pre-averaged Truncated Hayashi-Yoshida estimator, which is a thresholding version of the pre-averaged Hayashi-Yoshida estimator.

Method "SRC" is an implementation of the calendar time Subsampled Realized Covariance.

Method "SBPC" is an implementation of the calendar time Subsampled realized BiPower Covariation.

The rough estimation procedures for selecting the default values of the tuning parameters are based on those in Barndorff-Nielsen et al. (2009).

For the methods "PHY" or "PTHY", the default value of kn changes depending on the values of refreshing and cwise. If both refreshing and cwise are TRUE (the default), the default value of kn is given by the matrix ceiling(theta*N), where N is a matrix whose diagonal components are identical with the vector length(x)-1 and whose (i,j)-th component is identical with the number of the refresh times associated with i-th and j-th components of x minus 1. If refreshing is TRUE while cwise is FALSE, the default value of kn is given by ceiling(mean(theta)*sqrt(n)), where n is the number of the refresh times associated with the data minus 1. If refreshing is FALSE while cwise is TRUE, the default value of kn is given by the matrix ceiling(theta*N0), where N0 is a matrix whose diagonal components are identical with the vector length(x)-1 and whose (i,j)-th component is identical with (length(x)[i]-1)+(length(x)[j]-1). If both refreshing and cwise are FALSE, the default value of kn is given by ceiling(mean(theta)*sqrt(sum(length(x)-1))) (following Christensen et al. (2013)).

For the method "QMLE", the optimization of the quasi-likelihood function is implemented via arima0 using the fact that it can be seen as an MA(1) model's one: See Hansen et al. (2008) for details.

Value

A list with components:

`covmat`	the estimated covariance matrix
`cormat`	the estimated correlation matrix

Note

The example shows the central limit theorem for the nonsynchronous covariance estimator. Estimation of the asymptotic variance can be implemented by hyavar. The second-order correction will be provided in a future version of the package.

Author(s)

Yuta Koike with YUIMA Project Team

References

Ait-Sahalia, Y., Fan, J. and Xiu, D. (2010) High-frequency covariance estimates with noisy and asynchronous financial data, Journal of the American Statistical Association, 105, no. 492, 1504–1517.

Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A. and Shephard, N. (2008) Designing realised kernels to measure the ex-post variation of equity prices in the presence of noise, Econometrica, 76, no. 6, 1481–1536.

Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A. and Shephard, N. (2009) Realized kernels in practice: trades and quotes, Econometrics Journal, 12, C1–C32.

Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A. and Shephard, N. (2011) Multivariate realised kernels: Consistent positive semi-definite estimators of the covariation of equity prices with noise and non-synchronous trading, Journal of Econometrics, 162, 149–169.

Bibinger, M. (2011) Efficient covariance estimation for asynchronous noisy high-frequency data, Scandinavian Journal of Statistics, 38, 23–45.

Bibinger, M. (2012) An estimator for the quadratic covariation of asynchronously observed Ito processes with noise: asymptotic distribution theory, Stochastic processes and their applications, 122, 2411–2453.

Christensen, K., Kinnebrock, S. and Podolskij, M. (2010) Pre-averaging estimators of the ex-post covariance matrix in noisy diffusion models with non-synchronous data, Journal of Econometrics, 159, 116–133.

Christensen, K., Podolskij, M. and Vetter, M. (2013) On covariation estimation for multivariate continuous Ito semimartingales with noise in non-synchronous observation schemes, Journal of Multivariate Analysis 120 59–84.

Hansen, P. R., Large, J. and Lunde, A. (2008) Moving average-based estimators of integrated variance, Econometric Reviews, 27, 79–111.

Hayashi, T. and Yoshida, N. (2005) On covariance estimation of non-synchronously observed diffusion processes, Bernoulli, 11, no. 2, 359–379.

Hayashi, T. and Yoshida, N. (2008) Asymptotic normality of a covariance estimator for nonsynchronously observed diffusion processes, Annals of the Institute of Statistical Mathematics, 60, no. 2, 367–406.

Koike, Y. (2016) Estimation of integrated covariances in the simultaneous presence of nonsynchronicity, microstructure noise and jumps, Econometric Theory, 32, 533–611.

Koike, Y. (2014) An estimator for the cumulative co-volatility of asynchronously observed semimartingales with jumps, Scandinavian Journal of Statistics, 41, 460–481.

Kunitomo, N. and Sato, S. (2013) Separating information maximum likelihood estimation of realized volatility and covariance with micro-market noise, North American Journal of Economics and Finance, 26, 282–309.

Mancini, C. and Gobbi, F. (2012) Identifying the Brownian covariation from the co-jumps given discrete observations, Econometric Theory, 28, 249–273.

Varneskov, R. T. (2016) Flat-top realized kernel estimation of quadratic covariation with non-synchronous and noisy asset prices, Journal of Business & Economic Statistics, 34, no.1, 1–22.

Zhang, L. (2006) Efficient estimation of stochastic volatility using noisy observations: a multi-scale approach, Bernoulli, 12, no.6, 1019–1043.

Zhang, L. (2011) Estimating covariation: Epps effect, microstructure noise, Journal of Econometrics, 160, 33–47.

Zhang, L., Mykland, P. A. and Ait-Sahalia, Y. (2005) A tale of two time scales: Determining integrated volatility with noisy high-frequency data, Journal of the American Statistical Association, 100, no. 472, 1394–1411.

Examples

## Not run: 
## Set a model
diff.coef.1 <- function(t, x1 = 0, x2 = 0) sqrt(1+t)
diff.coef.2 <- function(t, x1 = 0, x2 = 0) sqrt(1+t^2)
cor.rho <- function(t, x1 = 0, x2 = 0) sqrt(1/2)
diff.coef.matrix <- matrix(c("diff.coef.1(t,x1,x2)", 
"diff.coef.2(t,x1,x2) * cor.rho(t,x1,x2)", 
"", "diff.coef.2(t,x1,x2) * sqrt(1-cor.rho(t,x1,x2)^2)"), 2, 2) 
cor.mod <- setModel(drift = c("", ""), 
diffusion = diff.coef.matrix,solve.variable = c("x1", "x2")) 

set.seed(111) 

## We use a function poisson.random.sampling to get observation by Poisson sampling.
yuima.samp <- setSampling(Terminal = 1, n = 1200) 
yuima <- setYuima(model = cor.mod, sampling = yuima.samp) 
yuima <- simulate(yuima) 
psample<- poisson.random.sampling(yuima, rate = c(0.2,0.3), n = 1000) 

## cce takes the psample and returns an estimate of the quadratic covariation. 
cce(psample)$covmat[1, 2]
##cce(psample)[1, 2]

## True value of the quadratic covariation.
cc.theta <- function(T, sigma1, sigma2, rho) { 
  tmp <- function(t) return(sigma1(t) * sigma2(t) * rho(t)) 
	integrate(tmp, 0, T) 
}

theta <- cc.theta(T = 1, diff.coef.1, diff.coef.2, cor.rho)$value 
cat(sprintf("theta =%.5f\n", theta))

names(psample@zoo.data)






# Example. A stochastic differential equation with nonlinear feedback. 

## Set a model
drift.coef.1 <- function(x1,x2) x2
drift.coef.2 <- function(x1,x2) -x1
drift.coef.vector <- c("drift.coef.1","drift.coef.2")
diff.coef.1 <- function(t,x1,x2) sqrt(abs(x1))*sqrt(1+t)
diff.coef.2 <- function(t,x1,x2) sqrt(abs(x2))
cor.rho <- function(t,x1,x2) 1/(1+x1^2)
diff.coef.matrix <- matrix(c("diff.coef.1(t,x1,x2)", 
"diff.coef.2(t,x1,x2) * cor.rho(t,x1,x2)","", 
"diff.coef.2(t,x1,x2) * sqrt(1-cor.rho(t,x1,x2)^2)"), 2, 2) 
cor.mod <- setModel(drift = drift.coef.vector,
 diffusion = diff.coef.matrix,solve.variable = c("x1", "x2"))

## Generate a path of the process
set.seed(111) 
yuima.samp <- setSampling(Terminal = 1, n = 10000) 
yuima <- setYuima(model = cor.mod, sampling = yuima.samp) 
yuima <- simulate(yuima, xinit=c(2,3)) 
plot(yuima)


## The "true" value of the quadratic covariation.
cce(yuima)

## We use the function poisson.random.sampling to generate nonsynchronous 
## observations by Poisson sampling.
psample<- poisson.random.sampling(yuima, rate = c(0.2,0.3), n = 3000) 

## cce takes the psample to return an estimated value  of the quadratic covariation. 
## The off-diagonal elements are the value of the Hayashi-Yoshida estimator. 
cce(psample)




# Example. Epps effect for the realized covariance estimator

## Set a model
drift <- c(0,0)

sigma1 <- 1
sigma2 <- 1
rho <- 0.5

diffusion <- matrix(c(sigma1,sigma2*rho,0,sigma2*sqrt(1-rho^2)),2,2)

model <- setModel(drift=drift,diffusion=diffusion,
                  state.variable=c("x1","x2"),solve.variable=c("x1","x2"))
                  
## Generate a path of the latent process
set.seed(116)

## We regard the unit interval as 6.5 hours and generate the path on it 
## with the step size equal to 2 seconds

yuima.samp <- setSampling(Terminal = 1, n = 11700) 
yuima <- setYuima(model = model, sampling = yuima.samp) 
yuima <- simulate(yuima)

## We extract nonsynchronous observations from the path generated above 
## by Poisson random sampling with the average duration equal to 10 seconds

psample <- poisson.random.sampling(yuima, rate = c(1/5,1/5), n = 11700)

## Hayashi-Yoshida estimator consistetly estimates the true correlation
cce(psample)$cormat[1,2]

## If we synchronize the observation data on some regular grid 
## by previous-tick interpolations and compute the correlation 
## by therealized covariance based on such synchronized observations, 
## we underestimate the true correlation (known as the Epps effect). 
## This is illustrated by the following examples.

## Synchronization on the grid with 5 seconds steps
suppressWarnings(s1 <- cce(subsampling(psample, sampling = setSampling(n = 4680)))$cormat[1,2])
s1

## Synchronization on the grid with 10 seconds steps
suppressWarnings(s2 <- cce(subsampling(psample, sampling = setSampling(n = 2340)))$cormat[1,2])
s2

## Synchronization on the grid with 20 seconds steps
suppressWarnings(s3 <- cce(subsampling(psample, sampling = setSampling(n = 1170)))$cormat[1,2])
s3

## Synchronization on the grid with 30 seconds steps
suppressWarnings(s4 <- cce(subsampling(psample, sampling = setSampling(n = 780)))$cormat[1,2])
s4

## Synchronization on the grid with 1 minute steps
suppressWarnings(s5 <- cce(subsampling(psample, sampling = setSampling(n = 390)))$cormat[1,2])
s5

plot(zoo(c(s1,s2,s3,s4,s5),c(5,10,20,30,60)),type="b",xlab="seconds",ylab="correlation",
main = "Epps effect for the realized covariance")



# Example. Non-synchronous and noisy observations of a correlated bivariate Brownian motion

## Generate noisy observations from the model used in the previous example
Omega <- 0.005*matrix(c(1,rho,rho,1),2,2) # covariance matrix of noise
noisy.psample <- noisy.sampling(psample,var.adj=Omega)
plot(noisy.psample)

## Hayashi-Yoshida estimator: inconsistent
cce(noisy.psample)$covmat

## Pre-averaged Hayashi-Yoshida estimator: consistent
cce(noisy.psample,method="PHY")$covmat

## Generalized multiscale estimator: consistent
cce(noisy.psample,method="GME")$covmat

## Multivariate realized kernel: consistent
cce(noisy.psample,method="RK")$covmat

## Nonparametric QMLE: consistent
cce(noisy.psample,method="QMLE")$covmat

## End(Not run)

yuima documentation built on April 16, 2025, 5:12 p.m.