Install package

To install mhawkes package, first install devtools.

# install.packages("devtools")  #if devtools is not installed

Install mhawkes package from github.

# devtools::install_github("ksublee/mhawkes")

Load mhawkes.

library("mhawkes")

Let's start with simple example. For exemplary purposes, one can simulate a one dimensional Hawkes model by simply calling mhsim.

mhsim()

The model parameters are set to default with MU = 0.2, ALPHA = 1.5 and BETA = 2. In addition, mark distribution is a constant function and all mark sizes are 1.

One dimensional Hawkes process

This subsection explaines how to construct, simulate, and estimate a one dimensional Hawkes model. Basically the Hawkes model can be defined up to 9 dimension in this package but currently fully supported for one and two diemsional model. More precisely, the simulation works well for high dimension, but for estimation procedure, one or two dimesnional model is recommended. As the dimension increases, the parameter increases and the estimation result can not be guaranteed.

First, create a mhspec which defines the Hawkes model. S4 class mhspec contains slots of model parameters, MU, ALPHA, BETA, ETA and mark.

The parameters of the model, the slots of mhspec, is defined by matrices but setting as numeric values are also supported for one dimesional model. For more than one dimensional model, the parameters should be defined by matrices (not vectors).

The following is an example of one dimensional Hawkes model (without mark). Parameter inputs can be a numeric value or 1-by-1 matrix. The simulation by numeric values is little bit faster than the matrix-based simulation. In the following case, mark and ETA slots, which deteremine the mark size and impact of the mark, are ommited and set to be default values.

set.seed(1107)
MU1 <- 0.3; ALPHA1 <- 1.2; BETA1 <- 1.5
mhspec1 <- new("mhspec", MU=MU1, ALPHA=ALPHA1, BETA=BETA1)
show(mhspec1)

To simulate a path, use function mhsim, where n is the number of observations.

res1 <- mhsim(mhspec1,  n=5000)

The output res1 is an S3-object of mhreal and a list of inter_arrival, arrival, mark_type, mark,N, Ng, lambda, and lambda_component. Among those inter_arrival, arrival, mark_type, and mark are numeric vectors and N, Ng, lambda, and lambda_component are matrices.

slot | meaning -------------------|--------------------------------- inter_arrival| inter-arrival between events arrival | cumulative sum of inter_arrival mark_type | the dimension of realized mark mark | the size of mark N | the realization of Hawkes processs Ng | the ground process lambda | the intenisty process lambda_component | each component of the intensity process

lambda and lambda_component have the following relationship: lambda = mu + rowSum(lambda_component)

Print the result:

res1

Or, use summary function.

summary(res1)

Note that the inter_arrival, arrival, Ng and N start at zero. Thus, inter_arrival[2] and arrival[2] are first arrival times of event. Since the model is the Hawkes process without mark, Ng and N are equal. Ng is the ground process, a counting process without mark and hence only counts the number of events. In a one dimensional model, lambda = mu + lambda_component. About lambda_component in higher-order models are discussed in the next subsection.

Simle way to plot the realized processes:

plot(res1$arrival[1:20], res1$N[,'N1'][1:20], 's', xlab="t", ylab="N")

Intensity process can be plotted by plot_lambda function. Note that BETA should be provided as an argument to describe the exponential decaying.

plot_lambda(res1$arrival[1:20], res1$N[,'N1'][1:20], mhspec1@BETA)

The log-likelihood function is computed by logLik method. In this case, the inter-arrival times and mhspec are inputs of the function.

logLik(mhspec1, inter_arrival = res1$inter_arrival)

The likelihood estimation is performed using mhfit function. The specification of the initial values of the parameters, mhspec0 is needed. In the following example, mhspec0 is set to be mhspec1, which is defined previously, for simplicity, but any candidates for the starting value of the numerical procedure can be used.

Note that only arrival or inter_arrival is needed. (Indeed, for more precise simulation, LAMBDA0, the inital value of lambda compoment, should be specified. If not, internally determined initial values are set.)

mhspec0 <- mhspec1
mle <- mhfit(mhspec0, inter_arrival = res1$inter_arrival)
summary(mle)

One can omitt mhspec but it is recommended that you provide a starting values.

summary(mhfit(inter_arrival = res1$inter_arrival))

For the numerical procedure, maxLik function of maxLik package is used with BFGS method. Any optimization method supported by maxLik can be used. In addition,

summary(mhfit(mhspec0, inter_arrival = res1$inter_arrival, method = "NR"))

One dimensional Hawkes process with mark

Mark structure can be added with mark slot in mhspec. mark slot is a function that generates marks. Marks can be constants or random variables. In addition, linear impact parameter ETA can be added. The linear impact function means that when the realized jump size is k, then the impact is porpotional to 1 +(k-1)ETA. In the following, the mark follows geometric distribution.

ETA1 <- 0.15
mark_function <- function(n,...) rgeom(n, 0.65) + 1
mhspec1 <- new("mhspec", MU=MU1, ALPHA=ALPHA1, BETA=BETA1, ETA=ETA1, mark=mark_function)
res1 <- mhsim(mhspec1,  n=10)

Plot the realized processes.

plot(res1$arrival, res1$N[,'N1'], 's', xlab="t", ylab="N")

Two-dimensional Hawkes model

For a simple example, one can simulate a two-dimensional Hawkes process with default setting.

mhsim(dimens=2)

The default parameters are set to be MU = matrix(c(0.2), nrow = 2), ALPHA = matrix(c(0.7, 0.9, 0.9, 0.7), byrow=TRUE) and BETA = matrix(c(2, 2, 2, 2), nrow = 2).

In two dimensional model, the parameters, the slots of mhspec, are matrices. MU is 2-by-1, and ALPHA, BETA, ETA are 2-by-2 matrices. mark is a random number generating function. LAMBDA0, 2-by-2 matrix, represents the initial values of lambda_component, a set of lambda11, lambda12, lambda21, lambda22. The intensity processes are represented by

$$ \lambda_1(t) = \mu_1 + \lambda_{11}(t) + \lambda_{12}(t) $$

$$ \lambda_2(t) = \mu_2 + \lambda_{21}(t) + \lambda_{22}(t) $$

$\lambda_{ij}$ called lambda components and LAMBDA0 is the time zero values of $lambda_{ij}$, i.e., $\lambda_{ij}(0)$. LAMBDA0 can be omitted and then internally determined initial values are used.

MU2 <- matrix(c(0.2), nrow = 2)
ALPHA2 <- matrix(c(0.75, 0.92, 0.92, 0.75), nrow = 2, byrow = TRUE)
BETA2 <- matrix(c(2.25, 2.25, 2.25, 2.25), nrow = 2, byrow = TRUE)
ETA2 <- matrix(c(0.19, 0.19, 0.19, 0.19), nrow = 2, byrow = TRUE)
mark_fun <- function(n,...) rgeom(n, 0.65) + 1
LAMBDA0 <- matrix(c(0.1, 0.1, 0.1, 0.1), nrow = 2, byrow = TRUE)
mhspec2 <- new("mhspec", MU=MU2, ALPHA=ALPHA2, BETA=BETA2, ETA=ETA2, mark =mark_fun)
mhspec2

To simulate, use function mhsim. The result is `mhreal as in one dimensional case.

res2 <- mhsim(mhspec2,  n=5000)
summary(res2)

Plot the Hawkes process, N1, using plot function.

plot(res2$arrival[1:20], res2$N[1:20,1], 's', xlab = "t", ylab = "N1")

Plot the intensity process, lambda1, using plot_lambda function.

plot_lambda(res2$arrival[1:20], res2$lambda[1:20,1], beta = BETA2[1,1], xlab = "t", ylab = "lambda1")

Frome the result, we get a vector of realized inter_arrival, mark, and mark_type. In one-dimensional case, inter_arrival is only needed to represent the realized path. Unlike the one-dimensional case, the two-dimensional case requires inter_arrival, mark, and mark_type. mark is a vector of mark size and mark_type represensts the type of events, in this case, 1 or 2.

inter_arrival2 <- res2$inter_arrival
mark2 <- res2$mark
mark_type2 <- res2$mark_type

Log-likelihood is computed by a function logLik.

logLik(mhspec2, inter_arrival = inter_arrival2, mark_type = mark_type2, mark = mark2)

A log-likelihood estimation is performed using mhfit. In the following, the values of parameter slots in mhspec0, such as MU, ALPHA, BETA, are regarded as a starting point of the numerical optimization. For simplicity, we use mhspec0 <- mhspec2. Since the true parameter values are not known in the actual problem, the initial value should be guessed. The realized inter_arrival and mark_type are used.

mhspec0 <- mhspec2
mle <- mhfit(mhspec0, inter_arrival = inter_arrival2, mark_type = mark_type2, mark = mark2)
summary(mle)

One can estimate the Hawkes model using N and inter_arrival instead of inter_arrival, mark_type, and mark with the same result.

summary(mhfit(mhspec0, inter_arrival = inter_arrival2, N = res2$N))

Although the initial value, mhspec0 can be omitted, but it is recommended to set an appropriate value for mhspec0.

summary(mhfit(inter_arrival = inter_arrival2, mark_type = mark_type2, mark = mark2))

Parameter setting

This subsection explains about the relation between parameter setting and estimation procedure in two-dimensional Hawkes model. The number of parameters to be estimated in the model depends on how we set the parameter slots such as ALPHA and BETA in mhspec0, the sepcification for initial values.. Since the paremeter slot such as ALPHA is a matrix, and the element in the matrix can be the same or different. The number of parameters in the estimation varies depending on whether or not some of the elements in the initial setting are the same or different.

For example, if ALPHA[1,1] and ALPHA[1,2] in mhspec0 are different, the numerical procedure tries to estimate both parameters of ALPHA[1,1] and ALPHA[1,2]. If ALPHA[1,1] and ALPHA[1,2] are the same in the initial setting, then the estimation procedure considered two parameters are the same in the model and hence only one of them is estimated.

The following is an typical example of a symmetric Hawkes model. Simulate a path first to apply mhfit in the later.

MU2 <- matrix(c(0.2, 0.2), nrow = 2)
ALPHA2 <- matrix(c(0.75, 0.90, 0.90, 0.75), nrow = 2, byrow=TRUE)
BETA2 <- matrix(c(2.5, 2.5, 2.5, 2.5), nrow = 2, byrow=TRUE)
ETA2 <- matrix(c(0.19, 0.19, 0.19, 0.19), nrow = 2, byrow=TRUE)
mark_function <- function(n,...) rgeom(n, 0.65) + 1

mhspec2 <- new("mhspec", MU=MU2, ALPHA=ALPHA2, BETA=BETA2, ETA=ETA2, mark = mark_function)
res2 <- mhsim(mhspec2, n=1000)

Now res2 is the simulated path of a two dimensional Hawkes process and we want to estimate Hawkes models based on res2.

In the first example of estimation, ALPHA0 is a matrix where the all elements have the same value, 0.75. In this setting, mhfit considers that alpha11 == alpha12 == alpha21 == alpha22 in the model (even though the actual parameters have different values). Similarly for other parmater matrix MU0, BETA0 and ETA0. Therefore, only four parameters mu1, alpha11, beta11, eta11 will be estimated.

MU0 <- matrix(c(0.15, 0.15), nrow = 2)
ALPHA0 <- matrix(c(0.75, 0.75, 0.75, 0.75), nrow = 2, byrow=TRUE)
BETA0 <- matrix(c(2.6, 2.6, 2.6, 2.6), nrow = 2, byrow=TRUE)
ETA0 <- matrix(c(0.2, 0.2, 0.2, 0.2), nrow = 2, byrow=TRUE)
mark_function <- function(n,...) rgeom(n, 0.65) + 1

mhspec0 <- new("mhspec", MU=MU0, ALPHA=ALPHA0, BETA=BETA0, ETA=ETA0, mark = mark_function)
summary(mhfit(mhspec0, arrival = res2$arrival, N = res2$N))

In the second example, ALPHA0's elements are not same, but symmetric as in the original simulation. We have alpha11 == alpha22 and alpha11 == alpha22 in ALPHA0 and hence alpha11 and alpha12 will be estimated.

MU0 <- matrix(c(0.15, 0.15), nrow = 2)
ALPHA0 <- matrix(c(0.75, 0.751, 0.751, 0.75), nrow = 2, byrow=TRUE)
BETA0 <- matrix(c(2.6, 2.6, 2.6, 2.6), nrow = 2, byrow=TRUE)
ETA0 <- matrix(c(0.2, 0.2, 0.2, 0.2), nrow = 2, byrow=TRUE)
mark_function <- function(n,...) rgeom(n, 0.65) + 1

mhspec0 <- new("mhspec", MU=MU0, ALPHA=ALPHA0, BETA=BETA0, ETA=ETA0, mark = mark_function)
summary(mhfit(mhspec0, arrival = res2$arrival, N = res2$N))

In the third example, all ALPHA0 have different values and hence all alpha11, alpha12, alpha21, alpha22 will be estimated.

MU0 <- matrix(c(0.15, 0.15), nrow = 2)
ALPHA0 <- matrix(c(0.75, 0.751, 0.752, 0.753), nrow = 2, byrow=TRUE)
BETA0 <- matrix(c(2.6, 2.6, 2.6, 2.6), nrow = 2, byrow=TRUE)
ETA0 <- matrix(c(0.2, 0.2, 0.2, 0.2), nrow = 2, byrow=TRUE)
mark_function <- function(n,...) rgeom(n, 0.65) + 1

mhspec0 <- new("mhspec", MU=MU0, ALPHA=ALPHA0, BETA=BETA0, ETA=ETA0, mark = mark_function)
summary(mhfit(mhspec0, arrival = res2$arrival, N = res2$N))

User-defined likelihood function

Sometimes, it is better to use a user-defined likelihood function. For example, if one sure that the every element of ETA is equal to 0.2, and hence fix ETA during the estimation procedure, then the modified user defined likelihood function can be used as the following.

my_llh <- function(inter_arrival, mark_type, mark){

  function(param){
    MU0 <- matrix(rep(param[1], 2), nrow=2)
    ALPHA0 <- matrix(c(param[2], param[3], param[3], param[2]), nrow = 2, byrow=TRUE)
    BETA0 <- matrix(rep(param[4], 4), nrow = 2, byrow=TRUE)
    ETA0 <- matrix(rep(0.2, 4), nrow=2)

    mhspec0 <- new("mhspec", MU=MU0, ALPHA=ALPHA0, BETA=BETA0, ETA=ETA0)
    mhawkes::logLik(mhspec0, inter_arrival = inter_arrival, mark_type = mark_type, mark = mark)
  }

}

my_llh_fun <- my_llh(inter_arrival=res2$inter_arrival, mark_type=res2$mark_type, mark=res2$mark)
summary(maxLik::maxLik(logLik = my_llh_fun, 
                       start=c(mu1=0.15, alpha11=0.75, alpha12=0.8, beta=2.6), method="BFGS"))

Dependence structure for mark

The mark distribution can depend on other underlying processes. For example, consider a marked Hawkes model with conditional geometric distribution marks and the distribution depends on the current value of the intensity process. In the following, lambda is a matrix, and mark_type is a numeric vector, i denotes the time index.

dependent_mark <- function(n, i, lambda = lambda, mark_type = mark_type, ...){
  c <- 0.15
  d <- 1
  U <- 2
  p <- 1 / min(d + c*lambda[i, mark_type[i]], U)
  rgeom(n, p) + 1
}

MU2 <- matrix(c(0.2, 0.2), nrow = 2)
ALPHA2 <- matrix(c(0.75, 0.90, 0.90, 0.75), nrow = 2, byrow=TRUE)
BETA2 <- matrix(c(2.5, 2.5, 2.5, 2.5), nrow = 2, byrow=TRUE)
ETA2 <- matrix(c(0.19, 0.19, 0.19, 0.19), nrow = 2, byrow=TRUE)

mhspec2 <- new("mhspec", MU=MU2, ALPHA=ALPHA2, BETA=BETA2, ETA=ETA2, mark=dependent_mark)
summary(res2 <- mhsim(mhspec2, n=5000))
summary(mhfit(mhspec2, arrival = res2$arrival, N = res2$N))

Example with financial data

This section provides an example using financial tick data. The financial tick data is a typical example of the marked Hawkes model. The following data is a time series of stock prices traded on the New York Stock Exchange.

head(tick_price)
plot(tick_price[,"t"], tick_price[,"price"], 's', xlab = "t", ylab = "price")

The tick price process is represented by two dimenstional Hawkes process, up and down movements. The following is an estimation result of the symmetric Hawkes model.

mu <- 0.1; alpha_s <- 0.7; alpha_c = 0.5; beta <- 2

mhspec0 <- new("mhspec", MU=matrix(rep(mu,2), nrow=2), 
               ALPHA=matrix(c(alpha_s, alpha_c, alpha_c, alpha_s), nrow=2), 
               BETA=matrix(rep(beta, 4), nrow=2), 
               ETA=matrix(rep(0, 4), nrow=2))

arrival <- tick_price[, "t"]
price <- tick_price[, "price"]
mark <- c(0, abs((price[-1] - price[-length(price)])/0.005))  #0.005 is minimum tick size
mark_type <- c(0, ifelse((price[-1] - price[-length(price)]) > 0 , 1, 2))

mle <- mhfit(mhspec0, arrival = arrival, mark = mark, mark_type = mark_type)
summary(mle)

The (approximated) annualized volatility of the price process is computed by var_diff function. To do that, first we define a new mhspec from estimates by the above maximum likelihood estimation. Note that the mark distribution is defined by the empirical distribution with sample function.

mu <- coef(mle)["mu1"]; alpha_s <- coef(mle)["alpha11"]; alpha_c <- coef(mle)["alpha12"] 
beta <- coef(mle)["beta11"]; eta <- coef(mle)["eta11"];

mhspec_estimate <- new("mhspec", MU=matrix(rep(mu,2), nrow=2), 
               ALPHA=matrix(c(alpha_s, alpha_c, alpha_c, alpha_s), nrow=2), 
               BETA=matrix(rep(beta, 4), nrow=2), 
               ETA=matrix(rep(eta, 4), nrow=2),
               mark=function(n,empirical_mark = mark,...) sample(empirical_mark, n, replace=TRUE) )

In the following, time_length=60*60*5.5*252 by assuming trading time is 60*60*5.5 seconds in a day and the busyness days are 252 days in a year. Because the volatility of return is more common than the volatiliy of price, (0.005/mean(price))^2 is multiplied. The result 12.76% is annualised return volatility.

variance <- var_diff(mhspec_estimate, mean_jump=mean(mark), mean_jump_square=mean(mark^2), time_length=60*60*5.5*252)*(0.005/mean(price))^2
(annualized_volatility <- sqrt(variance))

Multi-dimensional Hawkes

In this package, one or two-dimensional model is recommended, but actually works adequately for models of more than three dimensions. The method for simulation and estimation is very similar to the previous one.

MU3 <- matrix(rep(0.2, 3), nrow = 3)
ALPHA3 <- matrix(c(0.7, 0.50, 0.60, 0.50, 0.70, 0.40, 0.6, 0.4, 0.7), nrow = 3, byrow=TRUE)
BETA3 <- matrix(rep(2.7, 9), nrow = 3, byrow=TRUE)
ETA3 <- matrix(rep(0.2, 9), nrow = 3, byrow=TRUE)
mark_function <- function(n,...) rgeom(n, 0.65) + 1

mhspec3 <- new("mhspec", MU=MU3, ALPHA=ALPHA3, BETA=BETA3, ETA=ETA3, mark = mark_function)
res3 <- mhsim(mhspec3, n=1000, LAMBDA0 = matrix(rep(0.3, 9), nrow=3))
summary(res3)
mhspec0 <- mhspec3
summary(mhfit(mhspec3, arrival = res3$arrival, N = res3$N))


ksublee/mhawkes documentation built on May 3, 2019, 9:40 p.m.