Introduction

[@base-paper] describes a method of segmenting a data set of a finite alphabet into blocks of independent variables. This work expands on that by minimizing a cost function rather than maximizing a likelihood function, the two being equivalend when the cost function output is the opposite of the likelihood function output. We show this generalization can be used for other use-cases, e.g. segments with homogeneous values (see Chapter \@ref(segments-with-similar-averages)), segments that have the same linear regression trend (see Chapter )\@ref(real-data-examples)).

This work produced an R Programming Language [@R-lang] package. The packaging of the code into redistributable software was based on instructions provided by [@r-package-tutorial], and can be installed through CRAN [@segmentr-cran], with the goal of making it as easy as possible for researchers and R programmers to use this software. All the source code is open-source and available on GitHub [@segmentr-github]. Finally, a specialized version of the multivariate likelihood function is implemented in native code using the Rcpp package [@rcpp], which allows a faster execution for the use-case described in [@base-paper].

Related Work

[@fpop] also talks about optimally segmenting a data set with the minimization of a given cost function. In their paper, they discuss search path algorithms different than the ones shown in this work. In their case, if the cost function satisfies certain conditions, the fpop and snip search algorithms can prune the search path in a mathematically optimal way. They claim the algorithms presented in their work provide better results than previous work.

Notice that, because the approach used by [@fpop] is different than the one used by [@base-paper], the fpop and snip algorithms couldn't be investigated and validated in time to be implemented in the segmentr package. However, if in future work those algorithms provide good results estimation and performance compared to the current algorithms, they can be included in the segmentr package. That will allow users to use the newly implemented algorithms with minimal code modification, by changing a single parameter.



thalesmello/segmentr documentation built on March 4, 2020, 1 a.m.