This document analyses the use of simulated data series to find Dynamic Hedge alternatives.
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
options(digits = 5)
We start by loading the package, which provides some data and useful functions.
library(DynHedge)
library(compiler)

symbol <- load_data_idx()
symbol$Symbol <- paste0(symbol$Symbol, ".MC")
data(market_data)
We inspect data availability and format before starting to model.
head(symbol, n = 4)
ls(envir = market_data)
tail(market_data$ACS.MC, n = 4)
We then need to extract the closing prices from the series. We can also compute the proxy index.
closing <- get_close_data(market_data)
tail(closing, n = 4)

index <- compute_index(closing, symbol$Weight)
tail(index, n = 4)
Index Weights are not current!
They represent the actual weights as of the beginning of the project, but do not take into account at least the last 2 rebalances due to corporate actions.
We can now set up the simulation environment to simulate the different data series. It is important to take into account that the series are not independent; they are all correlated.
We'll start by computing the daily performance and getting the correlation matrix between all the series.
We can assume the daily price changes for a given stock follow a stochastic process with independent increments that can be described as a Lévy process. A Geometric Brownian Motion (a process driven by a Wiener process) will be used.
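Concretely, under Geometric Brownian Motion the price follows $dS_t = \mu S_t\,dt + \sigma S_t\,dW_t$, whose exact discretization is $S_{t+\Delta t} = S_t \exp\!\left((\mu - \sigma^2/2)\,\Delta t + \sigma\sqrt{\Delta t}\,Z\right)$ with $Z \sim N(0, 1)$.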
Since we need to simulate several correlated series, a technique based on the Cholesky decomposition of the covariance matrix can be applied.
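For intuition, the core of this technique can be sketched in a few lines. This is an illustrative toy example, not the internals of simulate_series(): independent standard normal draws are turned into correlated increments by multiplying them by the Cholesky factor of the covariance matrix.

# Toy sketch of Cholesky-based correlated draws (not simulate_series()).
toy_covar <- matrix(c(0.040, 0.018,
                      0.018, 0.090), nrow = 2)   # hypothetical 2x2 covariance
R <- chol(toy_covar)                     # upper triangular: t(R) %*% R == toy_covar
z <- matrix(rnorm(10000 * 2), ncol = 2)  # independent standard normals
eps <- z %*% R                           # rows now have covariance toy_covar
cov(eps)                                 # approximately recovers toy_covar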
# computing log variation of daily prices
closing_var <- na.exclude(log(closing / lag(closing)))

# computing annualized average returns and std. dev.
closing_var_avg <- colMeans(closing_var) * 252
closing_var_std <- sapply(closing_var, sd) * sqrt(252)

correl <- cor(closing_var)
covar <- cov(closing_var)
closing_last <- closing[nrow(closing), ]

set.seed(888)
num_steps <- 5000
sim_1 <- simulate_series(num_steps, closing_last, closing_var_avg,
                         closing_var_std, covar)

sim_1_var <- log(sim_1[2:num_steps, ] / sim_1[1:(num_steps - 1), ])
correl_1 <- cor(sim_1_var)
correl_diff_1 <- correl_1 - correl
When a large number of steps is simulated, the results maintain the correlation level in the returns of the different series in line with the sample data:
- Actual data correlations range between `r min(correl)` and `r max(correl)`;
- Simulated data correlations range between `r min(correl_1)` and `r max(correl_1)`;
- Differences between simulated and actual correlations across the entire matrix are between `r min(correl_diff_1)` and `r max(correl_diff_1)`. The average difference is `r mean(correl_diff_1)`.
It is also possible to assume the daily price changes follow a stochastic process known as Jump Diffusion. In this case the price is modelled by two different processes: a Gaussian one that drives the diffusion path (the same GBM used before) and a Poisson process that generates discrete jumps at random points in time.
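For intuition, before the package call below, a single-asset sketch of such a process might look as follows. The parameters here are hypothetical and this is not the internals of simulate_series(); for a small time step the Poisson jump arrivals are approximated by a Bernoulli draw.

# Illustrative single-asset jump-diffusion path (hypothetical parameters).
set.seed(1)
n       <- 252           # number of daily steps
dt      <- 1 / 252       # step size in years
mu      <- 0.05          # annualized drift
sigma   <- 0.20          # annualized volatility
lambda  <- 5             # expected number of jumps per year
jump_sd <- 0.02          # std. dev. of the log jump size

diffusion <- (mu - sigma^2 / 2) * dt + sigma * sqrt(dt) * rnorm(n)
# Bernoulli approximation of the Poisson arrivals for small dt:
jumps <- ifelse(runif(n) < lambda * dt, rnorm(n, mean = 0, sd = jump_sd), 0)
path <- 100 * exp(cumsum(diffusion + jumps))  # price path starting at 100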
sim_2 <- simulate_series(num_steps, closing_last, closing_var_avg,
                         closing_var_std, covar, TRUE, 0.01, 1/5)

sim_2_var <- log(sim_2[2:num_steps, ] / sim_2[1:(num_steps - 1), ])
correl_2 <- cor(sim_2_var)
correl_diff_2 <- correl_2 - correl
This gives correlations between `r min(correl_2)` and `r max(correl_2)`, with average, minimum and maximum differences across the matrix of `r mean(correl_diff_2)`, `r min(correl_diff_2)` and `r max(correl_diff_2)`, respectively.
As expected, due to the noise introduced by the jumps, the correlations in this scenario are lower than the ones in the actual data.
After computing a given number of steps for each stock, we can also compute the proxy index with the simulated data paths. For illustration purposes, only the first 252 steps of the index computed from the shares' paths of each simulation type above are computed and plotted.
sim_idx1 <- rbind(index[nrow(index)], compute_index(sim_1[1:252, ], symbol$Weight))
sim_idx2 <- rbind(index[nrow(index)], compute_index(sim_2[1:252, ], symbol$Weight))

par(mar = c(6.75, 3, 3, 3))
plot(sim_idx1, type = "l", col = "red", axes = FALSE,
     main = "Simulated Indexes", ylab = "Index Points", xlab = "Step #",
     ylim = c(min(sim_idx1, sim_idx2), max(sim_idx1, sim_idx2)))
lines(sim_idx2, col = "blue")
legend(x = "bottom", inset = c(0, -0.5),
       legend = c("Simulation 1", "Simulation 2"),
       col = c("red", "blue"), lwd = 3, xpd = TRUE, horiz = TRUE)
axis(1)
axis(2)
We start with a simple optimization process, without any major restrictions.
set.seed(888, "L'Ecuyer-CMRG")
num_scenarios <- 10000
num_steps <- 25  # approx. 1 month

scenarios <- list()
for (s in 1:num_scenarios) {
  scn <- simulate_series(num_steps, closing_last, closing_var_avg,
                         closing_var_std, covar, TRUE, 0.01, 1/5)
  scenarios[[s]] <- as.data.frame(scn)
}
fit_target <- function(x01, x02, x03, x04, x05, x06, x07, x08, x09, x10,
                       x11, x12, x13, x14, x15, x16, x17, x18, x19, x20,
                       x21, x22, x23, x24, x25, x26, x27, x28, x29, x30,
                       x31, x32, x33, x34, x35, S0, I0, St, It) {
  # candidate weights proposed by the genetic algorithm
  ga_w <- as.matrix(c(x01, x02, x03, x04, x05, x06, x07, x08, x09, x10,
                      x11, x12, x13, x14, x15, x16, x17, x18, x19, x20,
                      x21, x22, x23, x24, x25, x26, x27, x28, x29, x30,
                      x31, x32, x33, x34, x35))
  S0 <- as.matrix(S0)
  I0 <- as.matrix(I0)
  St <- as.matrix(St)
  It <- as.matrix(It)

  # P&L of the replicating basket vs. the index over each scenario
  pnl <- (St %*% ga_w - It) - as.numeric(S0 %*% ga_w - I0)
  invisible(mean(abs(pnl)))
}
fit_pnl <- cmpfun(fit_target)

hed1 <- simulate_hedge(fit_pnl, scenarios, closing_last, index[nrow(index)],
                       c(20000, 250), c(300, 10000))
summary(hed1[[2]])
Not unexpectedly, this unrestricted model converged close to the actual number of shares present in each unit of the index.
solution1 <- as.data.frame(cbind(symbol[, 5], t(hed1[[2]]@solution)))
colnames(solution1) <- c("Actual", "Model")
solution1$DiffShares <- solution1$Model - solution1$Actual
solution1$DiffPerc <- solution1$DiffShares / solution1$Actual
print(solution1)
summary(solution1)
In this scenario several dimensions are taken into account in the cost function we are trying to minimize. In particular, the number of shares in the replication portfolio is penalized quadratically: for example, increasing the number of shares from 10 to 11 would add `r 11 ** 2 - 10 ** 2` points to the cost function, but increasing from 30 to 31 would increase the cost function by `r 31 ** 2 - 30 ** 2` points.

set.seed(888, "L'Ecuyer-CMRG")
fit_target <- function(x01, x02, x03, x04, x05, x06, x07, x08, x09, x10,
                       x11, x12, x13, x14, x15, x16, x17, x18, x19, x20,
                       x21, x22, x23, x24, x25, x26, x27, x28, x29, x30,
                       x31, x32, x33, x34, x35, S0, I0, St, It) {
  ga_w <- as.matrix(c(x01, x02, x03, x04, x05, x06, x07, x08, x09, x10,
                      x11, x12, x13, x14, x15, x16, x17, x18, x19, x20,
                      x21, x22, x23, x24, x25, x26, x27, x28, x29, x30,
                      x31, x32, x33, x34, x35))
  S0 <- as.matrix(S0)
  I0 <- as.matrix(I0)
  St <- as.matrix(St)
  It <- as.matrix(It)

  # several dimensions are taken into account for the cost function:
  pnl <- (St %*% ga_w - It) - as.numeric(S0 %*% ga_w - I0)  # tracking error
  starting_delta <- as.numeric(S0 %*% ga_w - I0)            # initial mismatch
  num_shares <- sum(ifelse(abs(ga_w) < 0.001, 0, 1))        # portfolio size
  invisible(mean(abs(pnl)) + abs(starting_delta) + num_shares ** 2)
}
fit_complex <- cmpfun(fit_target)

hed2 <- simulate_hedge(fit_complex, scenarios, closing_last, index[nrow(index)],
                       c(100000, 300), c(50, 1000))
summary(hed2[[2]])
Under this restricted scenario the model converged to different weights and, more notably, some less relevant shares were excluded from the replication portfolio.
ifelse(abs(hed2[[2]]@solution) < 0.001, 0, hed2[[2]]@solution)
set.seed(888, "L'Ecuyer-CMRG")
num_scenarios <- 1500
num_steps <- 25  # approx. 1 month

pl <- NULL
idx_t <- 1:num_scenarios
for (s in 1:num_scenarios) {
  scn <- simulate_series(num_steps, closing_last, closing_var_avg,
                         closing_var_std, covar, TRUE, 0.01, 1/5)
  idx <- compute_index(scn, symbol$Weight)
  idx_t[s] <- idx[num_steps]
  bsk <- compute_index(scn, t(hed2[[2]]@solution))
  pl <- cbind(pl, bsk - idx)
}

hist(idx_t, main = "Distribution of final Index", col = "cyan")
boxplot(pl, main = "Simulation errors")
Analysis run on the following environment:
sessionInfo()
Genetic Algorithms are known for providing accurate yet not precise answers to maximization/minimization problems, even when several local maxima/minima are present in the solution space.
As a further enhancement, once a GA gets us close to the solution, other optimization methods (such as gradient/derivative-based ones) may be used to keep refining the solution and converge to a more precise result.
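A minimal sketch of this refinement step, using a toy stand-in objective; `objective` and `ga_solution` are hypothetical placeholders for a vector-argument version of the fitness function and the GA output (e.g. hed2[[2]]@solution):

# Hypothetical local refinement of a GA solution with optim() (BFGS).
objective <- function(w) sum((w - 1)^2)   # toy stand-in cost function
ga_solution <- rep(0.9, 35)               # pretend GA output near the optimum
refined <- optim(par = ga_solution, fn = objective, method = "BFGS")
head(refined$par)                         # refined weights (toy example)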