To install mhawkes
package, first install devtools
.
# install.packages("devtools") #if devtools is not installed
Install mhawkes
package from github.
# devtools::install_github("ksublee/mhawkes")
Load mhawkes
.
library("mhawkes")
Let's start with simple example.
For exemplary purposes, one can simulate a one dimensional Hawkes model by simply calling mhsim
.
mhsim()
The model parameters are set to default with MU = 0.2
, ALPHA = 1.5
and BETA = 2
.
In addition, mark distribution is a constant function and all mark sizes are 1.
This subsection explaines how to construct, simulate, and estimate a one dimensional Hawkes model. Basically the Hawkes model can be defined up to 9 dimension in this package but currently fully supported for one and two diemsional model. More precisely, the simulation works well for high dimension, but for estimation procedure, one or two dimesnional model is recommended. As the dimension increases, the parameter increases and the estimation result can not be guaranteed.
First, create a mhspec
which defines the Hawkes model.
S4 class mhspec
contains slots of model parameters, MU
, ALPHA
, BETA
, ETA
and mark
.
The parameters of the model, the slots of mhspec
, is defined by matrices
but setting as numeric values are also supported for one dimesional model.
For more than one dimensional model, the parameters should be defined by matrices (not vectors).
The following is an example of one dimensional Hawkes model (without mark).
Parameter inputs can be a numeric value or 1-by-1 matrix.
The simulation by numeric values is little bit faster than the matrix-based simulation.
In the following case, mark
and ETA
slots, which deteremine the mark size and impact of the mark,
are ommited and set to be default values.
set.seed(1107) MU1 <- 0.3; ALPHA1 <- 1.2; BETA1 <- 1.5 mhspec1 <- new("mhspec", MU=MU1, ALPHA=ALPHA1, BETA=BETA1) show(mhspec1)
To simulate a path, use function mhsim
, where n
is the number of observations.
res1 <- mhsim(mhspec1, n=5000)
The output res1
is an S3-object of mhreal
and a list of inter_arrival
, arrival
, mark_type
, mark
,N
, Ng
, lambda
, and lambda_component
.
Among those inter_arrival
, arrival
, mark_type
, and mark
are numeric vectors
and N
, Ng
, lambda
, and lambda_component
are matrices.
slot | meaning
-------------------|---------------------------------
inter_arrival
| inter-arrival between events
arrival
| cumulative sum of inter_arrival
mark_type
| the dimension of realized mark
mark
| the size of mark
N
| the realization of Hawkes processs
Ng
| the ground process
lambda
| the intenisty process
lambda_component
| each component of the intensity process
lambda
and lambda_component
have the following relationship:
lambda = mu + rowSum(lambda_component)
Print the result:
res1
Or, use summary function.
summary(res1)
Note that the inter_arrival
, arrival
, Ng
and N
start at zero.
Thus, inter_arrival[2]
and arrival[2]
are first arrival times of event.
Since the model is the Hawkes process without mark, Ng
and N
are equal.
Ng
is the ground process, a counting process without mark and hence only counts the number of events.
In a one dimensional model, lambda = mu + lambda_component
.
About lambda_component
in higher-order models are discussed in the next subsection.
Simle way to plot the realized processes:
plot(res1$arrival[1:20], res1$N[,'N1'][1:20], 's', xlab="t", ylab="N")
Intensity process can be plotted by plot_lambda
function.
Note that BETA
should be provided as an argument to describe the exponential decaying.
plot_lambda(res1$arrival[1:20], res1$N[,'N1'][1:20], mhspec1@BETA)
The log-likelihood function is computed by logLik
method.
In this case, the inter-arrival times and mhspec
are inputs of the function.
logLik(mhspec1, inter_arrival = res1$inter_arrival)
The likelihood estimation is performed using mhfit
function.
The specification of the initial values of the parameters, mhspec0
is needed.
In the following example, mhspec0
is set to be mhspec1
, which is defined previously, for simplicity, but any candidates for the starting value of the numerical procedure can be used.
Note that only arrival
or inter_arrival
is needed.
(Indeed, for more precise simulation, LAMBDA0
, the inital value of lambda compoment, should be specified. If not, internally determined initial values are set.)
mhspec0 <- mhspec1 mle <- mhfit(mhspec0, inter_arrival = res1$inter_arrival) summary(mle)
One can omitt mhspec
but it is recommended that you provide a starting values.
summary(mhfit(inter_arrival = res1$inter_arrival))
For the numerical procedure, maxLik
function of maxLik
package is used with BFGS
method.
Any optimization method supported by maxLik
can be used.
In addition,
summary(mhfit(mhspec0, inter_arrival = res1$inter_arrival, method = "NR"))
Mark structure can be added with mark
slot in mhspec
.
mark
slot is a function that generates marks.
Marks can be constants or random variables.
In addition, linear impact parameter ETA
can be added.
The linear impact function means that when the realized jump size is k
, then the impact is porpotional to 1 +(k-1)ETA
.
In the following, the mark follows geometric distribution.
ETA1 <- 0.15 mark_function <- function(n,...) rgeom(n, 0.65) + 1 mhspec1 <- new("mhspec", MU=MU1, ALPHA=ALPHA1, BETA=BETA1, ETA=ETA1, mark=mark_function)
res1 <- mhsim(mhspec1, n=10)
Plot the realized processes.
plot(res1$arrival, res1$N[,'N1'], 's', xlab="t", ylab="N")
For a simple example, one can simulate a two-dimensional Hawkes process with default setting.
mhsim(dimens=2)
The default parameters are set to be MU = matrix(c(0.2), nrow = 2)
, ALPHA = matrix(c(0.7, 0.9, 0.9, 0.7), byrow=TRUE)
and BETA = matrix(c(2, 2, 2, 2), nrow = 2)
.
In two dimensional model, the parameters, the slots of mhspec
, are matrices.
MU
is 2-by-1, and ALPHA, BETA, ETA
are 2-by-2 matrices.
mark
is a random number generating function.
LAMBDA0
, 2-by-2 matrix, represents the initial values of lambda_component
, a set of lambda11, lambda12, lambda21, lambda22
.
The intensity processes are represented by
$$ \lambda_1(t) = \mu_1 + \lambda_{11}(t) + \lambda_{12}(t) $$
$$ \lambda_2(t) = \mu_2 + \lambda_{21}(t) + \lambda_{22}(t) $$
$\lambda_{ij}$ called lambda components and LAMBDA0
is the time zero values of $lambda_{ij}$, i.e., $\lambda_{ij}(0)$.
LAMBDA0
can be omitted and then internally determined initial values are used.
MU2 <- matrix(c(0.2), nrow = 2) ALPHA2 <- matrix(c(0.75, 0.92, 0.92, 0.75), nrow = 2, byrow = TRUE) BETA2 <- matrix(c(2.25, 2.25, 2.25, 2.25), nrow = 2, byrow = TRUE) ETA2 <- matrix(c(0.19, 0.19, 0.19, 0.19), nrow = 2, byrow = TRUE) mark_fun <- function(n,...) rgeom(n, 0.65) + 1 LAMBDA0 <- matrix(c(0.1, 0.1, 0.1, 0.1), nrow = 2, byrow = TRUE) mhspec2 <- new("mhspec", MU=MU2, ALPHA=ALPHA2, BETA=BETA2, ETA=ETA2, mark =mark_fun)
mhspec2
To simulate, use function mhsim
.
The result is `mhreal
as in one dimensional case.
res2 <- mhsim(mhspec2, n=5000) summary(res2)
Plot the Hawkes process, N1, using plot
function.
plot(res2$arrival[1:20], res2$N[1:20,1], 's', xlab = "t", ylab = "N1")
Plot the intensity process, lambda1
, using plot_lambda
function.
plot_lambda(res2$arrival[1:20], res2$lambda[1:20,1], beta = BETA2[1,1], xlab = "t", ylab = "lambda1")
Frome the result, we get a vector of realized inter_arrival
, mark
, and mark_type
.
In one-dimensional case, inter_arrival
is only needed to represent the realized path.
Unlike the one-dimensional case, the two-dimensional case requires inter_arrival
, mark
, and mark_type
.
mark
is a vector of mark size and mark_type
represensts the type of events, in this case, 1 or 2.
inter_arrival2 <- res2$inter_arrival mark2 <- res2$mark mark_type2 <- res2$mark_type
Log-likelihood is computed by a function logLik
.
logLik(mhspec2, inter_arrival = inter_arrival2, mark_type = mark_type2, mark = mark2)
A log-likelihood estimation is performed using mhfit
.
In the following, the values of parameter slots in mhspec0
, such as MU, ALPHA, BETA
, are regarded as a starting point of the numerical optimization.
For simplicity, we use mhspec0 <- mhspec2
.
Since the true parameter values are not known in the actual problem, the initial value should be guessed.
The realized inter_arrival
and mark_type
are used.
mhspec0 <- mhspec2 mle <- mhfit(mhspec0, inter_arrival = inter_arrival2, mark_type = mark_type2, mark = mark2) summary(mle)
One can estimate the Hawkes model using N
and inter_arrival
instead of inter_arrival
, mark_type
, and mark
with the same result.
summary(mhfit(mhspec0, inter_arrival = inter_arrival2, N = res2$N))
Although the initial value, mhspec0
can be omitted, but it is recommended to set an appropriate value for mhspec0
.
summary(mhfit(inter_arrival = inter_arrival2, mark_type = mark_type2, mark = mark2))
This subsection explains about the relation between parameter setting and estimation procedure in two-dimensional Hawkes model.
The number of parameters to be estimated in the model depends on how we set the parameter slots such as ALPHA
and BETA
in mhspec0
, the sepcification for initial values..
Since the paremeter slot such as ALPHA
is a matrix, and the element in the matrix can be the same or different.
The number of parameters in the estimation varies depending on whether or not some of the elements in the initial setting are the same or different.
For example, if ALPHA[1,1]
and ALPHA[1,2]
in mhspec0
are different,
the numerical procedure tries to estimate both parameters of ALPHA[1,1]
and ALPHA[1,2]
.
If ALPHA[1,1]
and ALPHA[1,2]
are the same in the initial setting,
then the estimation procedure considered two parameters are the same in the model and hence only one of them is estimated.
The following is an typical example of a symmetric Hawkes model.
Simulate a path first to apply mhfit
in the later.
MU2 <- matrix(c(0.2, 0.2), nrow = 2) ALPHA2 <- matrix(c(0.75, 0.90, 0.90, 0.75), nrow = 2, byrow=TRUE) BETA2 <- matrix(c(2.5, 2.5, 2.5, 2.5), nrow = 2, byrow=TRUE) ETA2 <- matrix(c(0.19, 0.19, 0.19, 0.19), nrow = 2, byrow=TRUE) mark_function <- function(n,...) rgeom(n, 0.65) + 1 mhspec2 <- new("mhspec", MU=MU2, ALPHA=ALPHA2, BETA=BETA2, ETA=ETA2, mark = mark_function) res2 <- mhsim(mhspec2, n=1000)
Now res2
is the simulated path of a two dimensional Hawkes process and we want to estimate Hawkes models based on res2
.
In the first example of estimation, ALPHA0
is a matrix where the all elements have the same value, 0.75.
In this setting, mhfit
considers that alpha11 == alpha12 == alpha21 == alpha22
in the model
(even though the actual parameters have different values).
Similarly for other parmater matrix MU0
, BETA0
and ETA0
.
Therefore, only four parameters mu1, alpha11, beta11, eta11
will be estimated.
MU0 <- matrix(c(0.15, 0.15), nrow = 2) ALPHA0 <- matrix(c(0.75, 0.75, 0.75, 0.75), nrow = 2, byrow=TRUE) BETA0 <- matrix(c(2.6, 2.6, 2.6, 2.6), nrow = 2, byrow=TRUE) ETA0 <- matrix(c(0.2, 0.2, 0.2, 0.2), nrow = 2, byrow=TRUE) mark_function <- function(n,...) rgeom(n, 0.65) + 1 mhspec0 <- new("mhspec", MU=MU0, ALPHA=ALPHA0, BETA=BETA0, ETA=ETA0, mark = mark_function) summary(mhfit(mhspec0, arrival = res2$arrival, N = res2$N))
In the second example, ALPHA0
's elements are not same, but symmetric as in the original simulation.
We have alpha11 == alpha22
and alpha11 == alpha22
in ALPHA0
and hence
alpha11
and alpha12
will be estimated.
MU0 <- matrix(c(0.15, 0.15), nrow = 2) ALPHA0 <- matrix(c(0.75, 0.751, 0.751, 0.75), nrow = 2, byrow=TRUE) BETA0 <- matrix(c(2.6, 2.6, 2.6, 2.6), nrow = 2, byrow=TRUE) ETA0 <- matrix(c(0.2, 0.2, 0.2, 0.2), nrow = 2, byrow=TRUE) mark_function <- function(n,...) rgeom(n, 0.65) + 1 mhspec0 <- new("mhspec", MU=MU0, ALPHA=ALPHA0, BETA=BETA0, ETA=ETA0, mark = mark_function) summary(mhfit(mhspec0, arrival = res2$arrival, N = res2$N))
In the third example, all ALPHA0
have different values and hence all alpha11, alpha12, alpha21, alpha22
will be estimated.
MU0 <- matrix(c(0.15, 0.15), nrow = 2) ALPHA0 <- matrix(c(0.75, 0.751, 0.752, 0.753), nrow = 2, byrow=TRUE) BETA0 <- matrix(c(2.6, 2.6, 2.6, 2.6), nrow = 2, byrow=TRUE) ETA0 <- matrix(c(0.2, 0.2, 0.2, 0.2), nrow = 2, byrow=TRUE) mark_function <- function(n,...) rgeom(n, 0.65) + 1 mhspec0 <- new("mhspec", MU=MU0, ALPHA=ALPHA0, BETA=BETA0, ETA=ETA0, mark = mark_function) summary(mhfit(mhspec0, arrival = res2$arrival, N = res2$N))
Sometimes, it is better to use a user-defined likelihood function.
For example, if one sure that the every element of ETA
is equal to 0.2, and hence fix ETA
during the estimation procedure,
then the modified user defined likelihood function can be used as the following.
my_llh <- function(inter_arrival, mark_type, mark){ function(param){ MU0 <- matrix(rep(param[1], 2), nrow=2) ALPHA0 <- matrix(c(param[2], param[3], param[3], param[2]), nrow = 2, byrow=TRUE) BETA0 <- matrix(rep(param[4], 4), nrow = 2, byrow=TRUE) ETA0 <- matrix(rep(0.2, 4), nrow=2) mhspec0 <- new("mhspec", MU=MU0, ALPHA=ALPHA0, BETA=BETA0, ETA=ETA0) mhawkes::logLik(mhspec0, inter_arrival = inter_arrival, mark_type = mark_type, mark = mark) } } my_llh_fun <- my_llh(inter_arrival=res2$inter_arrival, mark_type=res2$mark_type, mark=res2$mark) summary(maxLik::maxLik(logLik = my_llh_fun, start=c(mu1=0.15, alpha11=0.75, alpha12=0.8, beta=2.6), method="BFGS"))
The mark distribution can depend on other underlying processes.
For example, consider a marked Hawkes model with conditional geometric distribution marks and
the distribution depends on the current value of the intensity process.
In the following, lambda
is a matrix, and mark_type
is a numeric vector, i
denotes the time index.
dependent_mark <- function(n, i, lambda = lambda, mark_type = mark_type, ...){ c <- 0.15 d <- 1 U <- 2 p <- 1 / min(d + c*lambda[i, mark_type[i]], U) rgeom(n, p) + 1 } MU2 <- matrix(c(0.2, 0.2), nrow = 2) ALPHA2 <- matrix(c(0.75, 0.90, 0.90, 0.75), nrow = 2, byrow=TRUE) BETA2 <- matrix(c(2.5, 2.5, 2.5, 2.5), nrow = 2, byrow=TRUE) ETA2 <- matrix(c(0.19, 0.19, 0.19, 0.19), nrow = 2, byrow=TRUE) mhspec2 <- new("mhspec", MU=MU2, ALPHA=ALPHA2, BETA=BETA2, ETA=ETA2, mark=dependent_mark) summary(res2 <- mhsim(mhspec2, n=5000))
summary(mhfit(mhspec2, arrival = res2$arrival, N = res2$N))
This section provides an example using financial tick data. The financial tick data is a typical example of the marked Hawkes model. The following data is a time series of stock prices traded on the New York Stock Exchange.
head(tick_price)
plot(tick_price[,"t"], tick_price[,"price"], 's', xlab = "t", ylab = "price")
The tick price process is represented by two dimenstional Hawkes process, up and down movements. The following is an estimation result of the symmetric Hawkes model.
mu <- 0.1; alpha_s <- 0.7; alpha_c = 0.5; beta <- 2 mhspec0 <- new("mhspec", MU=matrix(rep(mu,2), nrow=2), ALPHA=matrix(c(alpha_s, alpha_c, alpha_c, alpha_s), nrow=2), BETA=matrix(rep(beta, 4), nrow=2), ETA=matrix(rep(0, 4), nrow=2)) arrival <- tick_price[, "t"] price <- tick_price[, "price"] mark <- c(0, abs((price[-1] - price[-length(price)])/0.005)) #0.005 is minimum tick size mark_type <- c(0, ifelse((price[-1] - price[-length(price)]) > 0 , 1, 2)) mle <- mhfit(mhspec0, arrival = arrival, mark = mark, mark_type = mark_type) summary(mle)
The (approximated) annualized volatility of the price process is computed by var_diff
function.
To do that, first we define a new mhspec
from estimates by the above maximum likelihood estimation.
Note that the mark distribution is defined by the empirical distribution with sample
function.
mu <- coef(mle)["mu1"]; alpha_s <- coef(mle)["alpha11"]; alpha_c <- coef(mle)["alpha12"] beta <- coef(mle)["beta11"]; eta <- coef(mle)["eta11"]; mhspec_estimate <- new("mhspec", MU=matrix(rep(mu,2), nrow=2), ALPHA=matrix(c(alpha_s, alpha_c, alpha_c, alpha_s), nrow=2), BETA=matrix(rep(beta, 4), nrow=2), ETA=matrix(rep(eta, 4), nrow=2), mark=function(n,empirical_mark = mark,...) sample(empirical_mark, n, replace=TRUE) )
In the following, time_length=60*60*5.5*252
by assuming trading time is 60*60*5.5
seconds in a day and the busyness days are 252 days in a year.
Because the volatility of return is more common than the volatiliy of price, (0.005/mean(price))^2
is multiplied.
The result 12.76% is annualised return volatility.
variance <- var_diff(mhspec_estimate, mean_jump=mean(mark), mean_jump_square=mean(mark^2), time_length=60*60*5.5*252)*(0.005/mean(price))^2 (annualized_volatility <- sqrt(variance))
In this package, one or two-dimensional model is recommended, but actually works adequately for models of more than three dimensions. The method for simulation and estimation is very similar to the previous one.
MU3 <- matrix(rep(0.2, 3), nrow = 3) ALPHA3 <- matrix(c(0.7, 0.50, 0.60, 0.50, 0.70, 0.40, 0.6, 0.4, 0.7), nrow = 3, byrow=TRUE) BETA3 <- matrix(rep(2.7, 9), nrow = 3, byrow=TRUE) ETA3 <- matrix(rep(0.2, 9), nrow = 3, byrow=TRUE) mark_function <- function(n,...) rgeom(n, 0.65) + 1 mhspec3 <- new("mhspec", MU=MU3, ALPHA=ALPHA3, BETA=BETA3, ETA=ETA3, mark = mark_function) res3 <- mhsim(mhspec3, n=1000, LAMBDA0 = matrix(rep(0.3, 9), nrow=3)) summary(res3)
mhspec0 <- mhspec3 summary(mhfit(mhspec3, arrival = res3$arrival, N = res3$N))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.