This document analyses the use of simulated data series to find Dynamic Hedge alternatives.
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
options(digits = 5)
We start by loading the package, which provides some data and useful functions.
library(DynHedge)
library(compiler)

symbol <- load_data_idx()
symbol$Symbol <- paste0(symbol$Symbol, ".MC")
data(market_data)
We inspect data availability and format before starting to model.
head(symbol, n = 4)
ls(envir = market_data)
tail(market_data$ACS.MC, n = 4)
We then need to extract the closing prices from the series. We can also compute the proxy index.
closing <- get_close_data(market_data)
tail(closing, n = 4)

index <- compute_index(closing, symbol$Weight)
tail(index, n = 4)
Index Weights are not current!
They represent the actual weights as of the beginning of the project, but do not take into account at least the last 2 rebalances due to corporate actions.
We can now set up the simulation environment to simulate the different data series. It is important to take into account that the series are not independent; they are all correlated.
We'll start by computing the daily performance and getting the correlation matrix between all the series.
We can assume the daily price changes for a given stock follow a stochastic process with independent increments that can be described as a Lévy process. A Geometric Brownian Motion (a process driven by a Wiener process) will be used.
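Concretely, under Geometric Brownian Motion the price follows $dS_t = \mu S_t\,dt + \sigma S_t\,dW_t$, whose exact discretization is $S_{t+\Delta t} = S_t \exp\!\left((\mu - \sigma^2/2)\,\Delta t + \sigma\sqrt{\Delta t}\,Z\right)$ with $Z \sim N(0, 1)$.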
Since we need to simulate several correlated series, a technique based on the Cholesky decomposition of the covariance matrix can be applied.
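For intuition, the core of this technique can be sketched in a few lines. This is an illustrative toy example, not the internals of simulate_series(): independent standard normal draws are turned into correlated increments by multiplying them by the Cholesky factor of the covariance matrix.

# Toy sketch of Cholesky-based correlated draws (not simulate_series()).
toy_covar <- matrix(c(0.040, 0.018,
                      0.018, 0.090), nrow = 2)   # hypothetical 2x2 covariance
R <- chol(toy_covar)                     # upper triangular: t(R) %*% R == toy_covar
z <- matrix(rnorm(10000 * 2), ncol = 2)  # independent standard normals
eps <- z %*% R                           # rows now have covariance toy_covar
cov(eps)                                 # approximately recovers toy_covar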
# computing log variation of daily prices
closing_var <- na.exclude(log(closing / lag(closing)))

# computing annualized average returns and std. dev.
closing_var_avg <- colMeans(closing_var) * 252
closing_var_std <- sapply(closing_var, sd) * sqrt(252)

correl <- cor(closing_var)
covar <- cov(closing_var)
closing_last <- closing[nrow(closing), ]

set.seed(888)
num_steps <- 5000
sim_1 <- simulate_series(num_steps, closing_last, closing_var_avg,
                         closing_var_std, covar)

sim_1_var <- log(sim_1[2:num_steps, ] / sim_1[1:(num_steps - 1), ])
correl_1 <- cor(sim_1_var)
correl_diff_1 <- correl_1 - correl
When a large number of steps is simulated, the results maintain the correlation level in the returns of the different series in line with the sample data:
- Actual data correlations range between `r min(correl)` and `r max(correl)`;
- Simulated data correlations range between `r min(correl_1)` and `r max(correl_1)`;
- Differences between simulated and actual correlations across the entire matrix are between `r min(correl_diff_1)` and `r max(correl_diff_1)`. The average difference is `r mean(correl_diff_1)`.
It is also possible to assume the daily price changes follow a stochastic process known as Jump Diffusion. In this case the price is modelled by two different processes: a Gaussian one that drives the diffusion path (the same GBM used before) and a Poisson process that generates discrete jumps at random points in time.
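For intuition, before the package call below, a single-asset sketch of such a process might look as follows. The parameters here are hypothetical and this is not the internals of simulate_series(); for a small time step the Poisson jump arrivals are approximated by a Bernoulli draw.

# Illustrative single-asset jump-diffusion path (hypothetical parameters).
set.seed(1)
n       <- 252           # number of daily steps
dt      <- 1 / 252       # step size in years
mu      <- 0.05          # annualized drift
sigma   <- 0.20          # annualized volatility
lambda  <- 5             # expected number of jumps per year
jump_sd <- 0.02          # std. dev. of the log jump size

diffusion <- (mu - sigma^2 / 2) * dt + sigma * sqrt(dt) * rnorm(n)
# Bernoulli approximation of the Poisson arrivals for small dt:
jumps <- ifelse(runif(n) < lambda * dt, rnorm(n, mean = 0, sd = jump_sd), 0)
path <- 100 * exp(cumsum(diffusion + jumps))  # price path starting at 100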
sim_2 <- simulate_series(num_steps, closing_last, closing_var_avg,
                         closing_var_std, covar, TRUE, 0.01, 1/5)

sim_2_var <- log(sim_2[2:num_steps, ] / sim_2[1:(num_steps - 1), ])
correl_2 <- cor(sim_2_var)
correl_diff_2 <- correl_2 - correl
This gives correlations between `r min(correl_2)` and `r max(correl_2)`, with average, minimum and maximum differences across the matrix of `r mean(correl_diff_2)`, `r min(correl_diff_2)` and `r max(correl_diff_2)`, respectively.
As expected, due to the noise introduced by the jumps, the correlations in this scenario are lower than the ones in the actual data.
After computing a given number of steps for each stock, we can also compute the proxy index with the simulated data paths. For illustration purposes, only the first 252 steps of the index computed from the shares' paths of each simulation type above are computed and plotted.
sim_idx1 <- rbind(index[nrow(index)], compute_index(sim_1[1:252, ], symbol$Weight))
sim_idx2 <- rbind(index[nrow(index)], compute_index(sim_2[1:252, ], symbol$Weight))

par(mar = c(6.75, 3, 3, 3))
plot(sim_idx1, type = "l", col = "red", axes = FALSE,
     main = "Simulated Indexes", ylab = "Index Points", xlab = "Step #",
     ylim = c(min(sim_idx1, sim_idx2), max(sim_idx1, sim_idx2)))
lines(sim_idx2, col = "blue")
legend(x = "bottom", inset = c(0, -0.5),
       legend = c("Simulation 1", "Simulation 2"),
       col = c("red", "blue"), lwd = 3, xpd = TRUE, horiz = TRUE)
axis(1)
axis(2)
We start with a simple optimization process, without any major restrictions.
set.seed(888, "L'Ecuyer-CMRG")
num_scenarios <- 10000
num_steps <- 25  # approx. 1 month

scenarios <- list()
for (s in 1:num_scenarios) {
  scn <- simulate_series(num_steps, closing_last, closing_var_avg,
                         closing_var_std, covar, TRUE, 0.01, 1/5)
  scenarios[[s]] <- as.data.frame(scn)
}
fit_target <- function(x01, x02, x03, x04, x05, x06, x07, x08, x09, x10,
                       x11, x12, x13, x14, x15, x16, x17, x18, x19, x20,
                       x21, x22, x23, x24, x25, x26, x27, x28, x29, x30,
                       x31, x32, x33, x34, x35, S0, I0, St, It) {
  # candidate weights proposed by the genetic algorithm
  ga_w <- as.matrix(c(x01, x02, x03, x04, x05, x06, x07, x08, x09, x10,
                      x11, x12, x13, x14, x15, x16, x17, x18, x19, x20,
                      x21, x22, x23, x24, x25, x26, x27, x28, x29, x30,
                      x31, x32, x33, x34, x35))
  S0 <- as.matrix(S0)
  I0 <- as.matrix(I0)
  St <- as.matrix(St)
  It <- as.matrix(It)

  # P&L of the replicating basket vs. the index over each scenario
  pnl <- (St %*% ga_w - It) - as.numeric(S0 %*% ga_w - I0)
  invisible(mean(abs(pnl)))
}
fit_pnl <- cmpfun(fit_target)

hed1 <- simulate_hedge(fit_pnl, scenarios, closing_last, index[nrow(index)],
                       c(20000, 250), c(300, 10000))
summary(hed1[[2]])
Not unexpectedly, this unrestricted model converged close to the actual number of shares present in each unit of the index.
solution1 <- as.data.frame(cbind(symbol[, 5], t(hed1[[2]]@solution)))
colnames(solution1) <- c("Actual", "Model")
solution1$DiffShares <- solution1$Model - solution1$Actual
solution1$DiffPerc <- solution1$DiffShares / solution1$Actual
print(solution1)
summary(solution1)
In this scenario several dimensions are taken into account in the cost function we are trying to minimize. In particular, the number of shares in the replication portfolio is penalized quadratically: for example, increasing the number of shares from 10 to 11 would add `r 11 ** 2 - 10 ** 2` points to the cost function, but increasing from 30 to 31 would increase the cost function by `r 31 ** 2 - 30 ** 2` points.

set.seed(888, "L'Ecuyer-CMRG")
fit_target <- function(x01, x02, x03, x04, x05, x06, x07, x08, x09, x10,
                       x11, x12, x13, x14, x15, x16, x17, x18, x19, x20,
                       x21, x22, x23, x24, x25, x26, x27, x28, x29, x30,
                       x31, x32, x33, x34, x35, S0, I0, St, It) {
  ga_w <- as.matrix(c(x01, x02, x03, x04, x05, x06, x07, x08, x09, x10,
                      x11, x12, x13, x14, x15, x16, x17, x18, x19, x20,
                      x21, x22, x23, x24, x25, x26, x27, x28, x29, x30,
                      x31, x32, x33, x34, x35))
  S0 <- as.matrix(S0)
  I0 <- as.matrix(I0)
  St <- as.matrix(St)
  It <- as.matrix(It)

  # several dimensions are taken into account for the cost function:
  pnl <- (St %*% ga_w - It) - as.numeric(S0 %*% ga_w - I0)  # tracking error
  starting_delta <- as.numeric(S0 %*% ga_w - I0)            # initial mismatch
  num_shares <- sum(ifelse(abs(ga_w) < 0.001, 0, 1))        # portfolio size
  invisible(mean(abs(pnl)) + abs(starting_delta) + num_shares ** 2)
}
fit_complex <- cmpfun(fit_target)

hed2 <- simulate_hedge(fit_complex, scenarios, closing_last, index[nrow(index)],
                       c(100000, 300), c(50, 1000))
summary(hed2[[2]])
Under this restricted scenario the model converged to different weights and, more notably, some less relevant shares were excluded from the replication portfolio.
ifelse(abs(hed2[[2]]@solution) < 0.001, 0, hed2[[2]]@solution)
set.seed(888, "L'Ecuyer-CMRG")
num_scenarios <- 1500
num_steps <- 25  # approx. 1 month

pl <- NULL
idx_t <- 1:num_scenarios
for (s in 1:num_scenarios) {
  scn <- simulate_series(num_steps, closing_last, closing_var_avg,
                         closing_var_std, covar, TRUE, 0.01, 1/5)
  idx <- compute_index(scn, symbol$Weight)
  idx_t[s] <- idx[num_steps]
  bsk <- compute_index(scn, t(hed2[[2]]@solution))
  pl <- cbind(pl, bsk - idx)
}

hist(idx_t, main = "Distribution of final Index", col = "cyan")
boxplot(pl, main = "Simulation errors")
Analysis run on the following environment:
sessionInfo()
Genetic Algorithms are known for providing accurate yet not precise answers to maximization/minimization problems, even when several local maxima/minima are present in the solution space.
As a further enhancement, once a GA gets us close to the solution, other optimization methods (such as gradient/derivative-based ones) may be used to keep refining the solution and converge to a more precise result.
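A minimal sketch of this refinement step, using a toy stand-in objective; `objective` and `ga_solution` are hypothetical placeholders for a vector-argument version of the fitness function and the GA output (e.g. hed2[[2]]@solution):

# Hypothetical local refinement of a GA solution with optim() (BFGS).
objective <- function(w) sum((w - 1)^2)   # toy stand-in cost function
ga_solution <- rep(0.9, 35)               # pretend GA output near the optimum
refined <- optim(par = ga_solution, fn = objective, method = "BFGS")
head(refined$par)                         # refined weights (toy example)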