The PSO was developed by J. Kennedy as a global optimization method based on swarm intelligence and presented to the public in 1995 by Eberhart and Kennedy [@KeEb1995]. The original PSO was intended to resemble a flock of birds flying through the sky without collisions. Therefore, its first applications were found in particle physics, analyzing moving particles in high-dimensional spaces, which the name particle recalls. Later, it was adapted in Evolutionary Computation to exploit a set of potential solutions in high dimensions and to find optima through cooperation among the particles of the swarm [@PaVr2002]. Since it does not require gradient information, it is easier to apply than many other global optimization methods: it can find an optimum by considering only the output of the function to be optimized. This means that the function can be arbitrarily complex and it is still possible for the swarm to approach the global optimum. Other advantages are extensibility, simplicity, and low computational cost, since only basic mathematical operators are used.
Each particle $d$ with position $x_d$ moves through the search space $\mathbb{R}^N$ with its own velocity $v_d$ and remembers its previous best position $p_{p,d}$. After each iteration, the velocity is updated as a combination of the previous velocity, the direction to the particle's best previous position, and the direction to the global best position $p_g$ of all particles. A position update from iteration $i$ to $i+1$ can be calculated by the following two equations [@PaVr2002]:
$$
\begin{aligned}
v_d^{i+1} &= wv_d^{i} + c_p r_1^{i(d)} (p_{p,d}^i - x_d^i) + c_g r_2^{i(d)} (p_g^i - x_d^i) \\
x_d^{i+1} &= x_d^i + v_d^{i+1}.
\end{aligned}
$$
Here $r_1^{i(d)}$ and $r_2^{i(d)}$ are uniformly distributed random numbers in $[0, 1]$, drawn independently for each particle $d$ and iteration $i$. The cognitive parameter $c_p$ weights the direction to the particle's own previous best position. This contrasts with the social parameter $c_g$, which weights the direction to the global best position. The inertia weight $w$ is crucial for the convergence behavior, since it determines how much of the previous velocity is retained. A study reviewed in [@PaVr2002] showed that these parameters can be set to $c_p=c_g=0.5$, and that $w$ should decrease from $1.2$ to $0$. However, some problems benefit from a more precise tuning of these parameters. To allow effortless translation to code, the above formula for $d = 1, 2, \cdots, D$ particles can be given in the following matrix notation:
$$
\begin{aligned}
V^{i+1} &= w \cdot V^{i} + c_p \cdot (\vec{r}_1^{\,i} \cdot (P^i-X^i)^T)^T + c_g \cdot (\vec{r}_2^{\,i} \cdot (p_g^i - X^i)^T)^T \\
X^{i+1} &= X^i + V^{i+1}
\end{aligned}
$$
with current positions $X \in \mathbb{R}^{N \times D}$, current velocities $V \in \mathbb{R}^{N \times D}$, previous best positions $P \in \mathbb{R}^{N \times D}$, and global best position $p_g \in \mathbb{R}^{N}$. The parameters $w$, $c_p$ and $c_g$ are still scalars. The random numbers $r_1$ and $r_2$ are replaced by the vectors $\vec{r}_1$ and $\vec{r}_2$, in which each element is a uniformly distributed random number generated in $[0, 1]$. The first transpose is needed to multiply each random number element-wise with the corresponding column, and the second transpose transforms the result back to the format of $V$ and $X$.
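To see the effect of the double transpose, a minimal sketch can be run in R (hypothetical values, $N = 2$ dimensions and $D = 3$ particles; the same pattern appears in the loop implemented later in this section):

```r
# Each particle d (one column) gets its own random number r1[d],
# which scales all N coordinates of that particle's direction vector.
N <- 2; D <- 3
X <- matrix(0, nrow = N, ncol = D)    # current positions
P <- matrix(1:6, nrow = N, ncol = D)  # personal best positions
r1 <- runif(D)                        # one random number per particle
t(r1 * t(P - X))                      # same N x D format as X and V
```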
The algorithm mentioned above is the first published variant of PSO and is therefore referred to as the standard PSO in this thesis. It is also known under other names, such as global PSO, because information is shared globally across all particles in the swarm; some authors also call it the basic or original PSO.
## The pso() Function

In this section, a general PSO function is created that follows the structure of other optimization heuristics in R, in particular the existing PSO implementation from the R package `pso`. The key component of the problem is an objective function called `fn()`, which takes a vector describing the position of one particle (e.g. weights) and returns a scalar to be minimized. The other main parameter of the PSO function is `par`, a particle position that is used to derive the dimension of the problem and serves as the initial position of one particle. This vector may also consist entirely of `NA`'s, resulting in completely random starting positions. The last two arguments are the `lower` and `upper` bounds (e.g. weights greater than 0 and less than 1). All other parameters have default values that can be overridden by passing a list called `control`. The resulting structure is:
```r
pso <- function(par, fn, lower, upper, control = list()){}
```
Before the main data structure can be initialized, some sample inputs must be created for the `pso()` function as described below:
```r
par <- rep(NA, 2)
fn <- function(x){ return(sum(abs(x))) }
lower <- -10
upper <- 10
control <- list(
  s = 10,           # swarm size
  c.p = 0.5,        # cognitive parameter: weight of personal best
  c.g = 0.5,        # social parameter: weight of global best
  maxiter = 100,    # iterations
  w0 = 1.2,         # starting inertia weight
  wN = 0,           # ending inertia weight
  save_traces = F   # save more information
)
```
Now it is time to initialize the random positions `X`, their fitness `X_fit` and the random velocities `V` with the helper function `mrunif()`, which produces a matrix of uniformly distributed random numbers between `lower` and `upper`:
```r
set.seed(0)
X <- mrunif(
  nr = length(par), nc = control$s,
  lower = lower, upper = upper
)
if(all(!is.na(par))){
  X[, 1] <- par
}
X_fit <- apply(X, 2, fn)
V <- mrunif(
  nr = length(par), nc = control$s,
  lower = -(upper-lower), upper = (upper-lower)
)/10
```
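The helper `mrunif()` is presumably defined in the accompanying code (`R/PSO_functions.R`, which is sourced later); a minimal sketch of what it is assumed to do:

```r
# Assumed sketch: an nr x nc matrix of uniformly distributed
# random numbers between lower and upper
mrunif <- function(nr, nc, lower, upper){
  matrix(runif(nr*nc, min = lower, max = upper), nrow = nr, ncol = nc)
}
```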
The velocities are scaled down by a factor of 10 so that the initial movement is at most one tenth of the space along each axis. The personal best positions `P` start out identical to `X`, and the global best position is the position with the smallest fitness:
```r
P <- X
P_fit <- X_fit
p_g <- P[, which.min(P_fit)]
p_g_fit <- min(P_fit)
```
The required data structure is now available, and the optimization can start by calculating the new velocities and moving the old positions. When particles have left the valid space along an axis, they are pushed back to the edge and the velocity on this axis is set to zero. Then the fitness is calculated, and the personal best and global best positions are updated if they have improved.
```r
trace_data <- NULL
for(i in 1:control$maxiter){
  # move particles
  V <- (control$w0 - (control$w0 - control$wN) * i / control$maxiter) * V +
    control$c.p * t(runif(ncol(X)) * t(P - X)) +
    control$c.g * t(runif(ncol(X)) * t(p_g - X))
  X <- X + V
  # set velocity to zero if not in valid space
  V[X > upper] <- 0
  V[X < lower] <- 0
  # move back into valid space
  X[X > upper] <- upper
  X[X < lower] <- lower
  # evaluate objective function
  X_fit <- apply(X, 2, fn)
  # save new personal best
  P[, P_fit > X_fit] <- X[, P_fit > X_fit]
  P_fit[P_fit > X_fit] <- X_fit[P_fit > X_fit]
  # save new global best
  if(any(P_fit < p_g_fit)){
    p_g <- P[, which.min(P_fit)]
    p_g_fit <- min(P_fit)
  }
}
```
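When `control$save_traces` is set, the loop additionally records the particle positions of each iteration in `trace_data`. A minimal sketch of this bookkeeping (an assumption, but consistent with the columns `X1`, `X2` and `iter` used in the animation code later on) could be placed at the end of the loop body:

```r
# Hedged sketch: append the current 2-dimensional particle positions
# so that the swarm can be animated afterwards
if(isTRUE(control$save_traces)){
  trace_data <- rbind(
    trace_data,
    data.frame(X1 = X[1, ], X2 = X[2, ], iter = i)
  )
}
```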
The global minimum is located at $[0, 0]$ with a fitness of $0$, and the best position found by the PSO after $100$ iterations is located at `r paste0("[", paste0(round(p_g, 12), collapse=", "), "]")` and has a fitness of `r round(p_g_fit, 12)`.
## Animation 2-Dimensional GIF

```r
# load default PSO
source("R/PSO_functions.R")
```
This section provides insights into the behavior of the PSO by visualizing multiple iterations in a GIF. The GIF works in Adobe Acrobat DC or in the Markdown/HTML version of this thesis. The animation template and the objective function are inspired by [@Rtic2021]. The PSO core from the section above was used to complete the `pso()` function, which is tested here with seed 0. The objective is to minimize the following function ($f:\mathbb{R}^2 \rightarrow \mathbb{R}$):
$$
f(x, y) = -20\cdot e^{-0.2 \cdot \sqrt{0.5 \cdot ((x-1)^2 + (y-1)^2)}} - e^{0.5 \cdot (\cos(2 \pi x) + \cos(2 \pi y))} + e + 20.
$$
The following code runs the PSO and tries to minimize the objective function:
```r
set.seed(0)
f <- function(pos){
  -20 * exp(-0.2 * sqrt(0.5 * ((pos[1]-1)^2 + (pos[2]-1)^2))) -
    exp(0.5 * (cos(2*pi*pos[1]) + cos(2*pi*pos[2]))) + exp(1) + 20
}
res <- pso(
  par = rep(NA, 2),
  fn = f,
  lower = -10,
  upper = 10,
  control = list(
    s = 10,
    maxiter = 30,
    w0 = 1.2,
    save_traces = T
  )
)
```
The function `f` has many local minima and a global minimum at $(1,1)$ with the value $0$. The background color scale ranges from 0 (purple) to 20 (yellow). The PSO has 10 particles and iterates 30 times with an inertia weight decreasing from 1.2 to 0. The iterations are visualized in the following GIF:
```r
grid <- setNames(expand.grid(seq(-10, 10, length.out = 100),
                             seq(-10, 10, length.out = 100)), c("x1", "x2"))
grid$fitness <- apply(grid, 1, f)
grid <- grid %>%
  spread(., key = x1, value = fitness) %>%
  column_to_rownames("x2") %>%
  as.matrix()
plot_ly(z = grid, type = "contour", y = as.numeric(rownames(grid)),
        x = as.numeric(colnames(grid)), showscale = F) %>%
  layout(
    yaxis = list(showticklabels = FALSE, tickvals = "", zeroline = FALSE),
    xaxis = list(showticklabels = FALSE, tickvals = "", zeroline = FALSE),
    margin = list(r = 0, b = 0, l = 0, t = 0, pad = 0)
  ) %>%
  config(displayModeBar = FALSE) %>%
  html_save(., force = T, zoom = 1, vheight = NULL, vwidth = NULL,
            selector = ".cartesianlayer")
image_file <- "docs/p.png"
txt <- RCurl::base64Encode(readBin(image_file, "raw",
                                   file.info(image_file)[1, "size"]), "txt")
fig <- plot_ly() %>%
  add_trace(
    data = res$trace_data, x = ~X1, y = ~X2, frame = ~iter, name = "",
    mode = 'markers', type = 'scatter', showlegend = F,
    marker = list(color = 'red', size = 9, opacity = 1, showscale = F)
  ) %>%
  animation_opts(redraw = F, easing = "cubic-in-out",
                 transition = 500, frame = 700) %>%
  layout(
    xaxis = list(title = "axis_1", gridcolor = '#0000', zerolinewidth = 0,
                 zeroline = FALSE, zerolinecolor = '#0000',
                 range = c(min(as.numeric(colnames(grid))),
                           max(as.numeric(colnames(grid))))),
    yaxis = list(title = "axis_2", gridcolor = '#0000', zerolinewidth = 0,
                 zeroline = FALSE, zerolinecolor = '#0000',
                 range = c(min(as.numeric(rownames(grid))),
                           max(as.numeric(rownames(grid))))),
    images = list(
      list( # add the contour plot as background image
        source = paste('data:image/png;base64', txt, sep = ','),
        xref = "x", yref = "y",
        y = min(as.numeric(rownames(grid))),
        x = min(as.numeric(colnames(grid))),
        sizey = max(as.numeric(rownames(grid))) - min(as.numeric(rownames(grid))),
        sizex = max(as.numeric(colnames(grid))) - min(as.numeric(colnames(grid))),
        sizing = "stretch", xanchor = "left", yanchor = "bottom",
        opacity = 0.8, layer = "below"
      )
    )
  ) %>%
  config(displayModeBar = FALSE)
fig
```
```r
grid <- setNames(expand.grid(seq(-10, 10, length.out = 100),
                             seq(-10, 10, length.out = 100)), c("x1", "x2"))
grid$fitness <- apply(grid, 1, f)
grid <- grid %>%
  spread(., key = x1, value = fitness) %>%
  column_to_rownames("x2") %>%
  as.matrix()
plot_ly(z = grid, type = "contour", y = as.numeric(rownames(grid)),
        x = as.numeric(colnames(grid)), showscale = F) %>%
  layout(
    yaxis = list(showticklabels = FALSE, tickvals = "", zeroline = FALSE),
    xaxis = list(showticklabels = FALSE, tickvals = "", zeroline = FALSE),
    margin = list(r = 0, b = 0, l = 0, t = 0, pad = 0)
  ) %>%
  config(displayModeBar = FALSE) %>%
  html_save(., force = T, zoom = 1, vheight = NULL, vwidth = NULL,
            selector = ".cartesianlayer")
image_file <- "docs/p.png"
txt <- RCurl::base64Encode(readBin(image_file, "raw",
                                   file.info(image_file)[1, "size"]), "txt")
trace_data <- res$trace_data %>% mutate(iter = iter + 10)
for(i in 1:10){
  trace_data <- rbind(trace_data,
                      trace_data %>% filter(iter == 11) %>% mutate(iter = i))
}
trace_data <- trace_data %>% arrange(iter)
temp <- sapply(list.files('gifs/pso_2dim_v2/', full.names = T), file.remove)
for(i in 1:length(unique(trace_data$iter))){
  plot_ly() %>%
    add_trace(
      data = trace_data[trace_data$iter == i, ], x = ~X1, y = ~X2, name = "",
      mode = 'markers', type = 'scatter', showlegend = F,
      marker = list(color = 'red', size = 9, opacity = 1, showscale = F)
    ) %>%
    layout(
      xaxis = list(title = "axis_1", gridcolor = '#0000', zerolinewidth = 0,
                   tickfont = list(size = 18), titlefont = list(size = 18),
                   zeroline = FALSE, zerolinecolor = '#0000',
                   range = c(min(as.numeric(colnames(grid))),
                             max(as.numeric(colnames(grid))))),
      yaxis = list(title = "axis_2", gridcolor = '#0000', zerolinewidth = 0,
                   tickfont = list(size = 18), titlefont = list(size = 18),
                   zeroline = FALSE, zerolinecolor = '#0000',
                   range = c(min(as.numeric(rownames(grid))),
                             max(as.numeric(rownames(grid))))),
      images = list(
        list( # add the contour plot as background image
          source = paste('data:image/png;base64', txt, sep = ','),
          xref = "x", yref = "y",
          y = min(as.numeric(rownames(grid))),
          x = min(as.numeric(colnames(grid))),
          sizey = max(as.numeric(rownames(grid))) - min(as.numeric(rownames(grid))),
          sizex = max(as.numeric(colnames(grid))) - min(as.numeric(colnames(grid))),
          sizing = "stretch", xanchor = "left", yanchor = "bottom",
          opacity = 0.8, layer = "below"
        )
      )
    ) %>%
    config(displayModeBar = FALSE) %>%
    html_save(., force = T, vheight = 600, vwidth = 800, delay = 1)
  file.copy(from = "docs/p.png",
            to = paste0("gifs/pso_2dim_v2/gif_2dim_", i, ".png"))
}
```
```{=latex}
\begin{center}
\animategraphics[loop, width=10cm]{5}{./gifs/pso_2dim_v2/gif_2dim_}{1}{40}
\begin{figure}[!h]
\caption{Visualization of the behavior of the standard PSO in a GIF.}
\end{figure}
\end{center}
```
## Animation 2-Dimensional App

To gain an even better understanding of the behavior of a PSO, a WebApp was developed that allows the user to minimize any two-dimensional function with constraints. Three variants of the PSO were implemented, which will be analyzed in detail in later chapters of this thesis. The app can also be used to study the effect of hyperparameters on the behavior of the PSO in more detail. Within the app it is possible to display each step of the iterations, even with all the direction vectors that generate the resulting motion of each particle. The app is hosted at [@PSOApp] and its code is freely available at [@GitPSO]. The hosted app may be used for educational purposes, and using, copying, modifying, and sharing the code is permitted without restrictions.

## Simple Constraint Handling

The simplest method for dealing with constraints is the penalty method, which takes the intensity of constraint breaks into account by increasing the objective value of a minimization problem. The two common problems studied in the last two chapters are quadratic problems with the same structure of constraints. This can be used to create a generic constraint handling function for these particular QPs. Both problems have to satisfy the following inequality:

$$ A^T x \geq b_0. $$

To calculate a value for the intensity of constraint breaks, $b_0$ is subtracted from both sides of the above inequality and the left-hand side is defined as follows:

$$ c := A^T x - b_0. $$

All negative elements of the vector $c$ represent constraint breaks, which are squared and summed to extract a value that describes the intensity of the constraint breaks:

$$ c_{break} = \sum_i p(c_i)^2 $$

with

$$ p(x) = \begin{cases} 0 &\text{ if } x \geq 0\\ x &\text{ if } x < 0. \end{cases} $$

Following the naming conventions of `solve.QP`, a list named `mat` is created in the parent environment that contains the necessary inputs. The generic R function to calculate the constraint breaks can be defined as follows:

```r
calc_const <- function(x){
  const <- t(mat$Amat) %*% x - mat$bvec
  sum(pmin(0, const)^2)
}
```
In contrast to `solve.QP`, it is difficult for the PSO to find a feasible point if equality constraints are used, which is why the equality constraint $\textstyle\sum w_i = 1$ is relaxed to $0.99 \leq \textstyle\sum w_i \leq 1$.
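As a small illustration (a hypothetical three-asset example, not from the benchmark problems), the relaxed budget constraint corresponds to two rows of the constraint system, and `calc_const()` penalizes any violation:

```r
# Hypothetical 3-asset example: encode 0.99 <= sum(w) <= 1
# as two rows of A^T x >= b0
mat <- list(
  Amat = t(rbind(
    -rep(1, 3),   # -sum(w) >= -1   <=>  sum(w) <= 1
     rep(1, 3)    #  sum(w) >= 0.99
  )),
  bvec = c(-1, 0.99)
)
calc_const(c(0.5, 0.3, 0.1))  # sum(w) = 0.9 breaks the lower bound: 0.09^2
```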
The new objective function `fn()` consists of two parts. The first part evaluates the unconstrained objective of the QP with the following function:
```r
calc_fit <- function(x){
  0.5 * t(x) %*% mat$Dmat %*% x - t(mat$dvec) %*% x
}
```
The second part is the function `calc_const()`. Since breaking constraints is much worse than losing fitness, the constraint term is multiplied by a higher penalty factor (e.g. 10), which must be fine-tuned. This results in the final `fn()` function composition:
```r
fn <- function(x){
  fitness <- calc_fit(x)
  constraints <- calc_const(x)
  return(fitness + 10 * constraints)
}
```
This approach is called the penalization method and is the most straightforward way to handle constraints. Its disadvantage is that the PSO has to find a balance between violating constraints and minimizing the objective. As explained in [@InSi2008], there are three other constraint handling methods, but the results show that none of them is superior. The treatment of constraints should therefore be chosen appropriately for the given problem. For example, it may be useful to use the feasibility preservation technique to obtain a solution that is guaranteed not to break any constraints. The disadvantages here are longer computation time and less exploration by the particles, since only feasible solutions can be stored as personal or global best solutions.
## Convergence Analysis

This section summarizes the convergence analysis of the standard PSO with constant inertia weight $w$ performed in [@FbEn2010] and [@FbEn2007], whose main goal is to prove whether the PSO converges to a local or even a global minimum. The results from these sources are presented in a simplified form so that the full scope can be understood. No mathematical foundations are proven, as this would be beyond the scope of this thesis. It is recommended to read the detailed proofs in the referenced sources to gain a deeper understanding. The structure of this section begins by gathering conditions that have to be met for a general stochastic search algorithm to be classified as a local or global search algorithm. Then, these conditions are used to classify the standard PSO with constant inertia weight $w$. It follows that certain conditions have to be imposed on the hyperparameters and the PSO has to be modified in order to be classified as a global or local search algorithm. Finally, these findings are applied to the practical implementation of the PSO to derive relevant conclusions.
We start by defining a general problem where a measurable function $f: \mathbb{R}^n\rightarrow\mathbb{R}$ has to be minimized on the measurable search space $S \subseteq \mathbb{R}^n$. To do so, we use a procedure that starts with a position generated from the search space $S$ and in each iteration compares its old position with a new one and keeps one of the two. More precisely, we assume that in each iteration of the procedure a random vector $\xi$ is selected according to a probability measure $\mu_i$ (for iteration $i$) and then compared to the current position. This comparison of positions and the return of one of them is denoted by the mapping $D$. For the procedure to be classified as an algorithm, it is sufficient to show that the following algorithm condition is satisfied:
Algorithm condition: The mapping $D: S \times \mathbb{R}^n\rightarrow S$ should satisfy $f(D(z, \xi)) \leq f(z)$ and if $\xi \in S$, then $f(D(z, \xi)) \leq f(\xi)$.
Thus, the procedure is an algorithm if after each iteration the position is returned that has the lower fitness value and is also in the search space $S$.
Next, we need to adapt the definition of the global minimum to a set-based perspective. Since the probability of generating an exact point in a search space $S$ with a stochastic algorithm is zero, we define an optimality region $R_{\epsilon}$ that has a non-zero volume:

$$ R_{\epsilon} = \{z \in S \mid f(z) < \psi + \epsilon\} $$
where $\psi$ denotes the essential infimum of $f$ on $S$, with $\epsilon > 0$. Here the essential infimum of $f$ on $S$ is defined as
$$ \psi := \operatorname{ess\,inf}(f) := \inf \{ b \in \mathbb{R} \mid L(\{ x \in S \mid f(x) < b \}) > 0 \} $$
and $L$ denotes the Lebesgue measure.
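As a brief illustration (a hypothetical example, not taken from the referenced sources): for $f(x) = x^2$ on $S = [-1, 1]$,

$$ \psi = 0, \qquad R_{\epsilon} = \{ z \in [-1,1] \mid z^2 < \psi + \epsilon \} = (-\sqrt{\epsilon}, \sqrt{\epsilon}), $$

so the optimality region has the positive Lebesgue measure $L(R_{\epsilon}) = 2\sqrt{\epsilon}$ for every $\epsilon > 0$, while the exact minimizer $\{0\}$ has measure zero.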
We are now interested in the conditions under which the algorithm is guaranteed to converge to the optimality region. For this purpose, we first consider the simpler case of global convergence, for which the following convergence condition has to be satisfied:
Convergence condition (for global search): Sufficient condition for convergence to a global minimum: For any (Borel) subset $A$ of $S$ with a positive Lebesgue measure $L(A)>0$,
$$ \prod_{i=0}^\infty (1-\mu_i(A)) = 0, $$ where $\mu_i(A)$ is the probability of the algorithm, in iteration $i$, to generate a new point in $A$.
This means that it is sufficient to show that the generation of new positions in the algorithm has a probability of zero to miss any subset of $S$ with a positive volume infinitely often. For example, an algorithm that generates new positions uniformly distributed on $S$ satisfies this condition.
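For uniform sampling this is easy to verify (a short calculation, not taken from the sources): each iteration hits $A$ with the constant probability $\mu_i(A) = L(A)/L(S) > 0$, so

$$ \prod_{i=0}^\infty (1-\mu_i(A)) = \lim_{k\rightarrow\infty} \left(1-\frac{L(A)}{L(S)}\right)^{k} = 0. $$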
In summary, an algorithm that satisfies the algorithm condition and the convergence condition (for global search) is guaranteed to converge to the optimality region, which is stated in the following theorem:
::: {.theorem name="Global search algorithm"} Assume that $f$ is a measurable function, $S$ is a measurable subset of $\mathbb{R}^n$ and both the algorithm condition and the convergence condition (for global search) are satisfied. If ${x_i}^{\infty}_{i=0}$ is a sequence generated by the algorithm, D, then
$$ \lim_{i\rightarrow \infty} \ P(x_i \in R_{\epsilon}) = 1 $$
:::
Now we address the local convergence of the algorithm, which in addition to the algorithm condition has to satisfy the following condition:
Convergence condition (for local search): Sufficient condition for convergence to at least a local minimum: The function $f$ is unimodal on a compact set $S$, and for any $x_i\in S$ there exist $\gamma > 0$ and $\eta$ with $0<\eta\leq1$ such that
$$ \mu_i(\text{dist}(x_{i+1}, R_{\epsilon}) \leq \text{dist}(x_{i}, R_{\epsilon})-\gamma \ \text{ or } \ x_i \in R_{\epsilon} ) \geq \eta $$
for all probability measures $\mu_i$ and a distance measure $\text{dist}()$ where $x_{i+1} = D(x_i,\xi)$.
This means that the algorithm has to be able to get closer to the optimality region by at least $\gamma$ in each iteration with a non-zero probability, and the fitness function $f$ on a compact set $S$ has to be unimodal, i.e. it has exactly one minimum, which is then also the global minimum. Later, we will explain why this condition, extended to a larger set of functions, leads to convergence to a local minimum for the PSO.
Thus, for a stochastic search algorithm that satisfies the algorithm condition and the convergence condition (for local search), the following theorem applies:
::: {.theorem name="Local search algorithm"} Assume that $f$ is a measurable function, $S$ is a measurable subset of $\mathbb{R}^n$ and both the algorithm condition and the convergence condition (for local search) are satisfied. Let ${x_i}^{\infty}_{i=0}$ be a sequence generated by the algorithm, D. Then,
$$ \lim_{i\rightarrow \infty} \ P(x_i \in R_{\epsilon}) = 1 $$
:::
It is difficult to prove that an algorithm satisfies the convergence condition (for local search), so one may ask why this matters at all. The biggest advantage of local search algorithms is that they usually converge much faster than global search algorithms. Even if such an algorithm ends up in a local minimum, restarting can often increase the chance of finding the global minimum at some point. If it is sufficient to find a good local minimum, a local search algorithm is therefore preferable in most cases.
First of all, it has to be ensured that the PSO satisfies the algorithm condition, i.e. that the algorithm has a mapping $D$ that compares two positions and returns the position with the lower objective value that also lies in the search space. In terms of the PSO, we are concerned with the behavior when the global best position is changed. If a new personal best position $\xi$ is found, it is guaranteed to be in the search space, since all calculated positions are pushed back to the edge if they leave the search space. Then, the PSO compares the objective value of $\xi$ with the objective value of the previous global best position $z$ and stores the position with the lower objective value as the global best position. Thus, the PSO satisfies the algorithm condition.
Before we can begin to analyze whether the PSO converges to a local or global minimum, conditions have to be placed on the hyperparameters to ensure that the entire swarm converges to a fixed point. In [@FbEn2010], results were referenced from [@RiPo2007], which analyzed the convergence property of a deterministic PSO, i.e., one where the uniformly distributed random numbers are replaced by their upper bounds. Subsequently, the results were verified by simulations of the stochastic standard PSO. This work was later extended by [@Mjly2006] to deal directly with the stochastic version of the standard PSO.
In [@Mjly2006], the trajectory of each particle $d$ is described as a sequence of random variables $\{\pmb{X}^d_i\}$, with particle positions $\pmb{X}^d_i$ in iteration $i$. This is used to determine the following three properties that govern the convergence behavior of the PSO: Property 1 is satisfied if the expected value of the particle positions $\pmb{X}^d_i$ converges to a fixed point for each particle, which can be summarized as $\lim_{i\rightarrow \infty} E[\pmb{X}^d_i] = p^d \ \forall d$. Property 2 is satisfied if the variance of the positions of each particle tends to zero, which can be summarized as $\lim_{i\rightarrow \infty} Var[\pmb{X}^d_i] = 0 \ \forall d$. Property 3 is satisfied if the entire particle swarm converges to a fixed point in the mean square, i.e., all particle positions converge to a single point $\hat{p}$, which can be summarized as $\lim_{i\rightarrow \infty} E[(\pmb{X}_i^d-\hat{p})^2]=0 \ \forall d$.
Properties 1-3 depend on the choices of $c_p$, $c_g$ and $w$; the exact conditions can be found in [@Mjly2006]. The proofs and deeper explanations of the constraints on the parameters are beyond the scope of this thesis. In the following, the regions for each property are visualized, and the blue region is denoted as the convergent parameter region with bounds $c_p=c_g=c>0$ and $0<w<1$:
```r
df <- expand.grid(w = seq(from = 0, to = 1, length.out = 1000),
                  c = seq(from = 0, to = 4, length.out = 1000))
df <- df %>% mutate(
  black = c > 0 & w > 0 & 2*c < 4*(1+w),
  blue = w > 0 & w < 1 & c > 0 & (2*c) < (4*(1+w)) &
    (-2*c*w^2 + (2/6*c^2 + 1/2*c^2)*w + 2*c - 2/3*c^2 - 1/2*c^2) > 0,
  dark_blue = w > 0 & w < 1 & c > 0 & (2*c) < (4*(1+w)) &
    (-2*c*w^2 + (2/6*c^2 + 1/2*c^2)*w + 2*c - 2/3*c^2 - 1/2*c^2) > 0 &
    (-2*c*w^2 + (2/6*c^2 + 1/2*c^2)*w + 2*c - 2/3*c^2 - 1/2*c^2) < (c^2*(1+w)/6)
)
grid <- df %>% select(black, c, w) %>% spread(., key = w, value = black) %>%
    column_to_rownames("c") %>% as.matrix() +
  df %>% select(blue, c, w) %>% spread(., key = w, value = blue) %>%
    column_to_rownames("c") %>% as.matrix() +
  df %>% select(dark_blue, c, w) %>% spread(., key = w, value = dark_blue) %>%
    column_to_rownames("c") %>% as.matrix()
plot_ly(z = grid*1, type = "heatmap", x = colnames(grid), y = rownames(grid),
        colors = colorRamp(c("white", "black", "cyan", "blue")),
        showscale = FALSE) %>%
  layout(xaxis = list(title = "w"), yaxis = list(title = "c"),
         margin = list(r = 60)) %>%
  config(displayModeBar = FALSE) %>%
  html_save()
```
In the HTML version of this thesis, the diagram can be displayed with additional information as a mouse-hover, which facilitates the precise classification of a parameter pair.
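For quick experimentation, the inequalities used for the blue region above can be wrapped into a small helper (a convenience sketch derived directly from the plotted conditions, not part of the referenced sources):

```r
# Check whether a parameter pair (w, c) with c_p = c_g = c lies in the
# convergent parameter region plotted above
in_convergent_region <- function(w, c){
  w > 0 & w < 1 & c > 0 & (2*c) < (4*(1+w)) &
    (-2*c*w^2 + (2/6*c^2 + 1/2*c^2)*w + 2*c - 2/3*c^2 - 1/2*c^2) > 0
}
in_convergent_region(w = 0.7, c = 1.4)  # TRUE
```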
Furthermore, it is assumed that the choice of parameters $w$, $c_p$, and $c_g$ lies within the convergent parameter region, so that the PSO converges with all particles to a point $\hat{p}$. The point $\hat{p}$ is not necessarily a local minimum. To see this, we examine the convergence condition (for local search) for the PSO. This would mean that the PSO has to be able to reduce the distance to the optimality region in each iteration with a probability bounded away from zero. Unfortunately, this is not the case, as a simple example shows. Suppose the PSO is to minimize a 2-dimensional objective function $f$ with only two particles, the initial positions are $x_1=(k_1, k_2)$ and $x_2=(k_1, k_2) + a_1 \cdot (1, 0)$, and the initial velocities are $v_1=a_2 \cdot (1, 0)$ and $v_2=a_3 \cdot (1, 0)$ with constants $k_1, k_2, a_1, a_2$, and $a_3$. It follows that all particles and saved positions have $k_2$ in the second coordinate, resulting in a zero in the second coordinate of the velocity update in all iterations. Therefore, the second coordinate $k_2$ is not guaranteed to lead to the optimality region and the PSO cannot improve in this coordinate, which contradicts the convergence condition (for local search); it follows that the PSO is not a local search algorithm.
The probability of this premature stagnation in a coordinate decreases significantly with increasing particle number, which is why it is very unlikely in practice. In order for the PSO to still be classified as a local search algorithm, a variant of the PSO was developed in [@FbEn2010], called the guaranteed convergence PSO (GCPSO). In this variant, it is guaranteed that the position of the global best particle can at least be shifted randomly in each coordinate by a uniformly generated value from the interval $[-\rho_i, \rho_i]$ in iteration $i$. Therefore, the GCPSO does not stagnate in the above example, and it is possible to prove that the convergence condition (for local search) is satisfied. The exact implementation of the value $\rho_i$ and how it changes per iteration is quite technical and is not reproduced here. Roughly, one can imagine that $\rho_0$ is chosen large enough, doubles if the global best position is successfully improved in an iteration, and halves if it is not. This behavior of the GCPSO is necessary to explain why the GCPSO can converge to a local minimum for multimodal functions.
Assume that the function $f$ is multimodal and therefore has several local minima on $S$. But it is possible to divide the search space $S$ and all $k$ minima into compact subsets $\hat{S}_k \subseteq S$ with $\hat{S}_j \cap \hat{S}_t = \emptyset \ \forall j\neq t$ on which $f$ is unimodal and therefore has exactly one minimum. It follows that if the GCPSO with hyperparameters chosen from the convergent parameter region is applied to this problem, it will initially jump around between the subsets $\hat{S}_k$. However, due to the choice of parameters, the swarm will eventually contract and converge towards the global best particle. From this point on, the GCPSO stagnates, which is why $\rho_i$ becomes smaller and smaller. After a certain iteration, it is no longer possible for the GCPSO to leave the current subset $\hat{S}_{k^*}$. Subsequently, all requirements for the convergence condition (for local search) hold in the respective subset $\hat{S}_{k^*}$, which is why the swarm converges with probability 1 towards the minimum in this subset. It cannot be guaranteed in which subset $\hat{S}_k$ this happens, so it can only be said that the GCPSO converges to at least a local minimum if the function $f$ can be divided into compact subsets on which $f$ is unimodal.
It is clear that the standard PSO and also the GCPSO do not satisfy the convergence condition (for global search), since it cannot be guaranteed that every region in $S$ is searched. However, this can be enforced by rather trivial changes to the algorithm. For example, by introducing a particle that is randomly generated every $k$ iterations. Another approach is to restart the PSO when the majority of particles are in a small region. Other approaches and their advantages are discussed in [@FbEn2010].
In this theoretical approach, the behavior of the PSO is evaluated as the number of iterations approaches infinity. In practical applications, PSO is frequently utilized to tackle complex problems and a local minimum with an acceptable objective value is deemed sufficient as speed is often prioritized. To expedite convergence to minima, the decreasing inertia weight $w$ is employed. To enhance the likelihood of finding the global optimum or an acceptable local minimum, the PSO is restarted after a predetermined maximum number of iterations. This process is repeatedly carried out until a specified time or number of restarts has been reached.
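A minimal sketch of such a restart strategy (a hypothetical wrapper around the `pso()` function from above; the number of restarts and the reuse of the test function `f` are illustrative):

```r
# Restart the PSO several times and keep the best run
best <- NULL
for(run in 1:5){  # number of restarts chosen for illustration
  res <- pso(par = rep(NA, 2), fn = f, lower = -10, upper = 10,
             control = list(s = 10, maxiter = 100))
  if(is.null(best) || f(res$solution) < f(best$solution)){
    best <- res
  }
}
f(best$solution)  # best objective value over all restarts
```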
This example uses the `solve.QP` approach from section \@ref(exampleanalyticalmvp) with ten assets as the benchmark. Briefly, the goal is to create an MVP from ten of the largest U.S. stocks between 2018-01-01 and 2019-12-31 for each possible $\lambda$. The constraints are long only, and the weights should satisfy $0.99 \leq \textstyle\sum w_i \leq 1$. The PSO has 300 particles and 200 iterations for each $\lambda$. The starting position is the equally weighted vector $v$ with $\textstyle\sum v_i=1$, the inertia weight starts at $1.2$ and decreases to zero, and the coefficients are $c_p=c_g=0.5$. The main characteristics of all portfolios created with `solve.QP` compared to the PSO are shown below:
\vspace{-0.1cm}
```r
set.seed(0)
returns_raw <- buffer(
  get_yf(
    tickers = c("IBM", "GOOG", "AAPL", "MSFT", "AMZN",
                "NVDA", "JPM", "META", "V", "WMT"),
    from = "2018-01-01", to = "2019-12-31"
  )$returns,
  "AS_10_assets"
)
# re-arrange: low var first
vars <- sapply(returns_raw, var)
returns_raw <- returns_raw[, order(vars, decreasing = F)]

mvp_QP <- function(returns, lambda){
  tc <- tryCatch({
    mu <- ret_to_geomeanret(returns)
    cov <- as.matrix(nearPD(cov(returns))$mat)
    mat <- list(
      Dmat = lambda * cov,
      dvec = (1-lambda) * mu,
      Amat = t(rbind(
        -rep(1, ncol(returns)),                          # sum w <= 1
        rep(1, ncol(returns)),                           # sum w >= 0.99
        diag(1, nrow = ncol(returns), ncol = ncol(returns)) # long only
      )),
      bvec = c(
        -1,                    # sum w <= 1
        0.99,                  # sum w >= 0.99
        rep(0, ncol(returns))  # long only
      ),
      meq = 0
    )
    qp <- solve.QP(
      Dmat = mat$Dmat, dvec = mat$dvec,
      Amat = mat$Amat, bvec = mat$bvec, meq = mat$meq
    )
    res <- list(
      "mu" = mu %*% qp$solution,
      "var" = t(qp$solution) %*% cov %*% qp$solution,
      "composition" = setNames(qp$solution, colnames(returns))
    )
    TRUE
  }, error = function(e){FALSE})
  if(tc){
    return(res)
  }else{
    return(list("mu" = NA, "var" = NA, "composition" = NA))
  }
}

mvp_PSO <- function(returns, lambda, silent = T){
  tc <- tryCatch({
    mu <- ret_to_geomeanret(returns)
    cov <- cov(returns)
    mat <- list(
      Dmat = lambda * cov,
      dvec = (1-lambda) * mu,
      Amat = t(rbind(
        -rep(1, ncol(returns)),                          # sum w <= 1
        rep(1, ncol(returns)),                           # sum w >= 0.99
        diag(1, nrow = ncol(returns), ncol = ncol(returns)) # long only
      )),
      bvec = c(
        -1,                    # sum w <= 1
        0.99,                  # sum w >= 0.99
        rep(0, ncol(returns))  # long only
      ),
      meq = 0
    )
    calc_fit <- function(x){
      0.5 * t(x) %*% mat$Dmat %*% x - t(mat$dvec) %*% x
    }
    calc_const <- function(x){
      const <- t(mat$Amat) %*% x - mat$bvec
      sum(pmin(0, const)^2)
    }
    pso_res <- pso(
      par = rep(1/ncol(returns), ncol(returns)),
      fn = function(x){
        fitness <- calc_fit(x)
        constraints <- calc_const(x)
        return(fitness + 10 * constraints)
      },
      lower = 0,
      upper = 1,
      control = list(
        s = 300,          # swarm size
        c.p = 0.5,        # cognitive parameter
        c.g = 0.5,        # social parameter
        maxiter = 200,    # iterations
        w0 = 1.2,         # starting inertia weight
        wN = 0,           # ending inertia weight
        save_traces = F,  # save more information
        save_fit = F
      )
    )
    if(!silent){
      p0("constraint: ", sum(calc_const(pso_res$solution)))
      p0("fitness: ", calc_fit(pso_res$solution))
    }
    res <- list(
      "mu" = mu %*% pso_res$solution,
      "var" = t(pso_res$solution) %*% cov %*% pso_res$solution,
      "composition" = setNames(pso_res$solution, colnames(returns)),
      "fit" = calc_fit(pso_res$solution),
      "constraint" = sum(calc_const(pso_res$solution))
    )
    TRUE
  }, error = function(e){FALSE})
  if(tc){
    return(res)
  }else{
    return(list("mu" = NA, "var" = NA, "composition" = NA))
  }
}

df <- NULL
runs <- 100
for(i in 0:round((runs-1)*0.4)){
  lambda <- 1 - i/runs
  temp <- mvp_QP(returns = returns_raw, lambda = lambda)
  df <- rbind(df, data.frame("type"="QP_MVP", "lambda"=lambda, "mu"=temp$mu,
                             "var"=temp$var, "constraint"=0))
  temp <- mvp_PSO(returns = returns_raw, lambda = lambda)
  df <- rbind(df, data.frame("type"="PSO_MVP", "lambda"=lambda, "mu"=temp$mu,
                             "var"=temp$var, "constraint"=temp$constraint))
}
df$sd <- sqrt(df$var)
df_qp <- df[df$type=="QP_MVP" & df$lambda %in% seq(1, 0.8, -0.1), ]
df_diff <- data.frame(
  "mu_pso" = df[df$type=="PSO_MVP",]$mu,
  "sd_pso" = df[df$type=="PSO_MVP",]$sd,
  "mu_qp"  = df[df$type=="QP_MVP",]$mu,
  "sd_qp"  = df[df$type=="QP_MVP",]$sd
)
shapes <- list()
for(i in 1:nrow(df_diff)){
  shapes[[i]] <- list(
    type = "line",
    y0 = df_diff[i,]$mu_pso, y1 = df_diff[i,]$mu_qp, yref = "y",
    x0 = df_diff[i,]$sd_pso, x1 = df_diff[i,]$sd_qp, xref = "x",
    line = list(color = "lightgrey"), layer = 'below'
  )
}
plot_ly(
  data = df, x = ~sd, y = ~mu, name = ~type, mode = "markers",
  type = 'scatter', color = ~type, colors = c("green", "red")
) %>%
  add_annotations(
    data = df_qp, x = ~sd, y = ~mu, text = ~paste0("lambda: ", lambda),
    ay = -30, ax = -30
  ) %>%
  layout(
    xaxis = list(title = "sigma",
                 range = c(0.5*min(df$sd), 1.2*max(df[df$type=="QP_MVP",]$sd)),
                 showgrid = FALSE),
    yaxis = list(range = c(0.5*min(df$mu), 1.2*max(df[df$type=="QP_MVP",]$mu)),
                 showgrid = FALSE),
    shapes = shapes,
    legend = list(x = 0.1, y = 0.9)
  ) %>%
  html_save(., expand = c(-25, 0, 20, 0))
```
The corresponding portfolios for each $\lambda$ are connected with a grey line to visualize the error of the PSO. It turns out that it is possible to solve MVP problems with a PSO approach. It is noticeable that some PSO runs were not able to reach the global minimum and thus show a deviation from the `solve.QP` approach, which can often be fixed by repeated runs.
A similar ITP-MSTE solved with `solve.QP` in section \@ref(exampleitpsolveqp) is used as a benchmark for the PSO. In summary, the goal is to create a portfolio that minimizes the mean square error between its returns and those of the SP500TR between 2018-01-01 and 2019-12-31. The pool of assets includes all assets that are present on 2019-12-31 and have no missing values. The constraints are long only, and the weights should satisfy $0.97 \leq \textstyle\sum w_i \leq 1$. The parameters for the PSO are a swarm size of 200, 300 iterations, an inertia weight starting at $1.2$ and decreasing to zero, coefficients $c_p=c_g=0.5$, a maximum weight of $5\%$ per asset, and the equally weighted vector $v$ with $\textstyle\sum v_i=1$ as starting position. The PSO was run ten times, and the aggregated best and mean runs are compared to the `solve.QP` approach for seed 0 in the table below:
```r
set.seed(0)
from <- "2018-01-01"
to <- "2019-12-31"
spx_composition <- buffer(get_spx_composition(), "AS_spx_composition")
pool_returns_raw <- buffer(
  get_yf(
    tickers = spx_composition %>% filter(Date<=to) %>%
      filter(Date==max(Date)) %>% pull(Ticker),
    from = from, to = to
  )$returns,
  "AS_sp500_assets"
)
pool_returns_raw <- pool_returns_raw[, colSums(is.na(pool_returns_raw))==0]
bm_returns <- buffer(
  get_yf(tickers = "^SP500TR", from = from, to = to)$returns,
  "AS_sp500tr"
) %>% setNames(., "SP500TR")

itp_QP <- function(pool_returns, bm_returns){
  mat <- list(
    Dmat = t(pool_returns) %*% pool_returns,
    dvec = t(pool_returns) %*% bm_returns,
    Amat = t(rbind(
      -rep(1, ncol(pool_returns)),  # sum w <= 1
      rep(1, ncol(pool_returns)),   # sum w >= 0.97
      diag(1, nrow = ncol(pool_returns), ncol = ncol(pool_returns)),  # long only
      diag(-1, nrow = ncol(pool_returns), ncol = ncol(pool_returns))  # w <= 0.05
    )),
    bvec = c(
      -1,                             # sum w <= 1
      0.97,                           # sum w >= 0.97
      rep(0, ncol(pool_returns)),     # long only
      rep(-0.05, ncol(pool_returns))  # w <= 0.05
    ),
    meq = 0
  )
  qp <- solve.QP(
    Dmat = mat$Dmat, dvec = mat$dvec,
    Amat = mat$Amat, bvec = mat$bvec, meq = mat$meq
  )
  res <- list(
    "value" = as.numeric(0.5 * t(qp$solution) %*% mat$Dmat %*% qp$solution -
                           t(mat$dvec) %*% qp$solution),
    "var" = as.numeric(var(pool_returns %*% qp$solution - bm_returns)),
    "solution" = setNames(qp$solution, colnames(pool_returns))
  )
}

itp_PSO <- function(pool_returns, bm_returns, silent = T){
  mat <- list(
    Dmat = t(pool_returns) %*% pool_returns,
    dvec = t(pool_returns) %*% bm_returns,
    Amat = t(rbind(
      -rep(1, ncol(pool_returns)),  # sum w <= 1
      rep(1, ncol(pool_returns)),   # sum w >= 0.97
      diag(1, nrow = ncol(pool_returns), ncol = ncol(pool_returns))  # long only
    )),
    bvec = c(
      -1,                          # sum w <= 1
      0.97,                        # sum w >= 0.97
      rep(0, ncol(pool_returns))   # long only
    ),
    meq = 0
  )
  calc_fit <- function(x){
    as.numeric(0.5 * t(x) %*% mat$Dmat %*% x - t(mat$dvec) %*% x)
  }
  calc_const <- function(x){
    const <- t(mat$Amat) %*% x - mat$bvec
    sum(pmin(0, const)^2)
  }
  time_it <- system.time({
    pso_res <- pso(
      par = rep(1/ncol(pool_returns), ncol(pool_returns)),
      fn = function(x){
        fitness <- calc_fit(x)
        constraints <- calc_const(x)
        return(fitness + 1000*constraints)
      },
      lower = 0,
      upper = 0.05,
      control = list(
        s = 200,         # swarm size
        c.p = 0.5,       # cognitive parameter
        c.g = 0.5,       # social parameter
        maxiter = 300,   # iterations
        w0 = 1.2,        # starting inertia weight
        wN = 0,          # ending inertia weight
        save_traces = F  # save more information
      )
    )
  })
  if(!silent){
    p0("constraint: ", sum(calc_const(pso_res$solution)))
    p0("fitness: ", calc_fit(pso_res$solution))
  }
  res <- list(
    "composition" = setNames(pso_res$solution, colnames(pool_returns)),
    "fit" = calc_fit(pso_res$solution),
    "constraint" = calc_const(pso_res$solution),
    "var" = as.numeric(var(pool_returns %*% pso_res$solution - bm_returns)),
    "time" = time_it[3]
  )
  return(res)
}

df <- NULL
time <- system.time(temp_qp <- itp_QP(pool_returns_raw, bm_returns))
df <- data.frame("type"="ITP-MSTE_QP", "var"=temp_qp$var,
                 "fitness"=temp_qp$value, "constraint"=0, "time"=time[3])
for(i in 1:10){
  time <- system.time(
    temp_pso <- itp_PSO(pool_returns = pool_returns_raw, bm_returns)
  )
  df <- rbind(
    df,
    data.frame(
      "type"="ITP-MSTE_PSO", "var"=temp_pso$var, "fitness"=temp_pso$fit,
      "constraint"=temp_pso$constraint, "time"=temp_pso$time
    )
  )
}
df$sd <- sqrt(df$var)
df_summary <- rbind(
  df %>% filter(type == "ITP-MSTE_QP"),
  df %>% filter(type == "ITP-MSTE_PSO") %>%
    filter(fitness == min(fitness)) %>% mutate(type = "ITP-MSTE_PSO_best"),
  df %>% filter(type == "ITP-MSTE_PSO") %>%
    group_by(type) %>% summarise_all(., mean) %>%
    mutate(type = "ITP-MSTE_PSO_mean")
)
rownames(df_summary) <- NULL
df_summary$var <- round(df_summary$var, 12)
df_summary$sd <- round(df_summary$sd, 6)
df_summary$fitness <- round(df_summary$fitness, 7)
df_summary$constraint <- round(df_summary$constraint, 18)
df_summary$time <- round(df_summary$time, 1)
reactable(
  df_summary %>% select(type, sd, var, fitness, constraint, time),
  wrap = FALSE, compact = T,
  columns = list(
    type = colDef(width = 150),
    var = colDef(format = colFormat(digits = 9), show = F),
    sd = colDef(format = colFormat(digits = 5), width = 75),
    constraint = colDef(name = "constraint break",
                        format = colFormat(digits = 9), width = 120),
    time = colDef(width = 65)
  ),
  style = list(fontFamily = "Computer Modern, sans-serif", fontSize = 13),
  defaultColDef = colDef(align = "left")
) %>% html_save(., expand = c(-40, -20, -35, -20))
```
It can be seen that in all PSO runs, sufficient fitness was achieved with negligible constraint breaks, but much more computation time was required.
A PSO approach has advantages and disadvantages: on the one hand, theoretically any problem can be solved, but on the other hand, it cannot be guaranteed that the solution is optimal. In addition, the calculations take much longer than with the `solve.QP` approach, which raises the question of why a PSO approach should have any benefit at all. This is precisely the case when the problem can no longer be solved by `solve.QP` alone, as for example with mixed-integer quadratic problems. In this type of problem, the variable of interest $x$ has to be an integer vector, which `solve.QP` can only solve continuously, followed by rounding. However, this rounding error can become arbitrarily large, which is why the chances of the PSO approach achieving a better solution are greater than with the `solve.QP` approach.
A continuous solution for a portfolio is not sufficient for practical purposes, since usually only integer amounts of assets can be purchased. It is even worse if lot sizes are needed, because these can only be bought in minimum denominations of e.g. ten thousand. Lot sizes are often used in fixed income products. The biggest drawbacks of rounding a continuous solution are the disregard of constraints and the difference in the objective value, which often cannot reach the new optimum. A solution with broken constraints is not acceptable in practice, and a `solve.QP` approach only produces one solution, which is why it is risky to hope for a sufficient solution after rounding. The PSO does not have these drawbacks and can easily be used for discrete problems by rounding the input of the objective function `fn()`. In a portfolio with net asset value `nav` consisting only of American stocks, the weights $w_i$ with closing prices $p_i$ can be discretized to $w_i^d$ by the following formula:
$$ w_i^d =\text{round}(w_i \cdot \frac{\text{nav}}{p_i})\cdot \frac{p_i}{\text{nav}}. $$
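The formula translates directly into R (a small helper sketch; the name `discretize_weights` is illustrative, while the same rounding expression is used inline in the code below):

```r
# Discretize continuous weights to tradable integer share amounts
discretize_weights <- function(w, prices, nav){
  as.vector(round(w * nav / prices) * prices / nav)
}
```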
This example analyzes the error of rounding a solution from the `solve.QP` approach and compares it to a discrete PSO. A second discrete PSO is added that takes the continuous `solve.QP` solution as the starting position of one particle. The ITP-MSTE focuses on replicating the SP500TR with its top 100 assets, derived from the example with discarding in section \@ref(exampleitpsolveqp), and tries to construct a tracking portfolio with the constraints long only, $0.99 \leq \textstyle\sum w_i \leq 1$ and $\text{nav} = 10000$ in the time frame from 2018-01-01 to 2019-12-31. The prices used to discretize are the closing prices on 2019-12-31. The discrete PSO has 100 particles and 300 iterations, the PSO initialized with the `solve.QP` solution has 200 particles and 200 iterations; both use coefficients $c_p=c_g=0.5$ and an inertia weight starting at $1.2$ and decreasing to zero. The results can be observed in the table below:
```r
set.seed(0)
nav <- 10000
from <- "2018-01-01"
to <- "2019-12-31"
spx_composition <- buffer(get_spx_composition(), "AS_spx_composition")
pool_data <- buffer(
  get_yf(
    tickers = spx_composition %>% filter(Date<=to) %>%
      filter(Date==max(Date)) %>% pull(Ticker),
    from = from, to = to
  ),
  "AS_sp500_asset_data"
)
load("data/assets_pool_100.rdata")
pool_data$returns <- pool_data$returns[, assets_pool_100]
pool_data$prices <- pool_data$prices[, assets_pool_100]
bm_returns <- buffer(
  get_yf(tickers = "^SP500TR", from = from, to = to)$returns,
  "AS_sp500tr"
) %>% setNames(., "SP500TR")
pool_returns <- pool_data$returns
mat <- list(
  Dmat = t(pool_returns) %*% pool_returns,
  dvec = t(pool_returns) %*% bm_returns,
  Amat = t(rbind(
    -rep(1, ncol(pool_returns)),  # sum w <= 1
    rep(1, ncol(pool_returns)),   # sum w >= 0.99
    diag(1, nrow = ncol(pool_returns), ncol = ncol(pool_returns))  # long only
  )),
  bvec = c(
    -1,                          # sum w <= 1
    0.99,                        # sum w >= 0.99
    rep(0, ncol(pool_returns))   # long only
  ),
  meq = 0
)
calc_fit <- function(x){
  as.numeric(0.5 * t(x) %*% mat$Dmat %*% x - t(mat$dvec) %*% x)
}
calc_const <- function(x){
  const <- t(mat$Amat) %*% x - mat$bvec
  sum(pmin(0, const)^2)
}

# Default solve.QP
time_qp <- system.time({
  res_qp <- itp_QP(pool_data$returns, bm_returns)
  prices <- last(pool_data$prices)
  res_qp_discrete <- setNames(
    as.vector(round(res_qp$solution*nav/prices)*prices/nav),
    names(res_qp$solution)
  )
})
res_qp_discrete_fit <- calc_fit(res_qp_discrete)
res_qp_discrete_const <- calc_const(res_qp_discrete)
res_qp_discrete_sum_wgt <- sum(res_qp_discrete)

# Default PSO
time_pso <- system.time({
  pso_res <- pso(
    par = rep(0, ncol(pool_returns)),
    fn = function(x){
      x <- as.vector(round(x*nav/prices)*prices/nav)
      fitness <- calc_fit(x)
      constraints <- calc_const(x)
      return(fitness + 5*constraints)
    },
    lower = 0,
    upper = 1,
    control = list(
      s = 100,         # swarm size
      c.p = 0.5,       # cognitive parameter
      c.g = 0.5,       # social parameter
      maxiter = 300,   # iterations
      w0 = 1.2,        # starting inertia weight
      wN = 0,          # ending inertia weight
      save_traces = F  # save more information
    )
  )
})
pso_res$solution <- setNames(
  as.vector(round(pso_res$solution*nav/prices)*prices/nav),
  names(res_qp$solution)
)
res_pso_fit <- calc_fit(pso_res$solution)
res_pso_const <- calc_const(pso_res$solution)

# PSO with solve.QP starting position
time_pso_2 <- system.time({
  pso_2_res <- pso(
    par = res_qp$solution,
    fn = function(x){
      x <- as.vector(round(x*nav/prices)*prices/nav)
      fitness <- calc_fit(x)
      constraints <- calc_const(x)
      return(fitness + 5*constraints)
    },
    lower = 0,
    upper = 1,
    control = list(
      s = 200,         # swarm size
      c.p = 0.5,       # cognitive parameter
      c.g = 0.5,       # social parameter
      maxiter = 200,   # iterations
      w0 = 1.2,        # starting inertia weight
      wN = 0,          # ending inertia weight
      save_traces = F  # save more information
    )
  )
})
pso_2_res$solution <- setNames(
  as.vector(round(pso_2_res$solution*nav/prices)*prices/nav),
  names(res_qp$solution)
)
res_pso_2_fit <- calc_fit(pso_2_res$solution)
res_pso_2_const <- calc_const(pso_2_res$solution)

reactable(
  data.frame(
    "type" = c("solve.QP discrete", "PSO", "PSO with solve.QP as init solution"),
    "fitness" = c(res_qp_discrete_fit, res_pso_fit, res_pso_2_fit),
    "const_break" = c(res_qp_discrete_const, res_pso_const, res_pso_2_const),
    "sum_wgt" = c(res_qp_discrete_sum_wgt, sum(pso_res$solution),
                  sum(pso_2_res$solution)),
    "time" = c(time_qp[3], time_pso[3], time_pso_2[3])
  ),
  columns = list(
    type = colDef(width = 120),
    fitness = colDef(format = colFormat(digits = 5)),
    const_break = colDef(format = colFormat(digits = 9)),
    sum_wgt = colDef(format = colFormat(digits = 3)),
    time = colDef(format = colFormat(digits = 3))
  ),
  style = list(fontFamily = "Computer Modern, sans-serif", fontSize = 13)
) %>% html_save(., expand = c(-40, -20, -60, -20))
```
It can be seen that the rounded `solve.QP` solution still has a good fitness, but the constraints are not satisfied. The PSO has no constraint breaks and a better fitness than the rounded `solve.QP` solution. The PSO with the `solve.QP` solution as starting position beats both approaches. This indicates that a hybrid approach, consisting of `solve.QP` followed by the PSO for intelligent rounding that respects the constraints, would be a good heuristic for such problems in practice.