In this chapter, we discuss the process to solve \@ref(eq:changepointsestimation) and explore a solution involving dynamic programming. Because the exact solution to this problem has quadratic complexity, we also discuss some alternatives that simplify the search path and provide an approximate solution.
The first attempt to solve the equation would be to search for all the possible alternatives to find the correct minimal set of intervals that minimize the total cost of the system. It's done iteratively by:
Finally, after computing all of the possible combinations of segment costs, the procedure to find the optimal solution is:
With the execution of the procedure described above, the estimated set of change points is stored in $C$. The solution will be exact because it analyses all the possible combinations. However, that has a time complexity of $O(m^2)$ in the Big-O notation, for the number of columns $m$ in the data set. Because the number of columns in data segmentation problems can be very large, computation time can be very prohibitive. For those situations, approximate solutions are described.
To simplify the search path of the algorithm, [@base-paper] also proposes a technique that relies on the assumption of the data hierarchical, i.e. a segment can be sub-divided by reapplying the segmentation function recursively. The algorithm is described as:
Implementing the algorithm described above will estimate the set of change points $C$ with time complexity of $O(m\log(m))$, for the number of columns $m$ in the data set. The reduction in time complexity is only possible because of the strong assumption the data set has a hierarchical cost behavior. However, that is not true for all types of segmentation problems, as it will be seen in more detail in the next chapters. So, the usage of this algorithm should be considered with care.
The hybrid algorithm is a modified version of the hierarchical algorithm, in which it will start searching the segments using the hierarchical algorithm, and if the segments become smaller than a certain number of columns threshold, it will switch to the exact algorithm. The procedure would be as follows:
The only difference between this and the hierarchical procedure is the presence of the conditional step is the beginning, that tests whether it should switch over to the exact algorithm. As will be shown in later chapters, the accuracy of this algorithm is shown not to be more advantageous than using the hierarchical or the exact algorithms directly, depending on the situation.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.