```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
This vignette contains an introduction to destim: its main purpose, syntax and some technical details of its internal behaviour. Some basic knowledge about Hidden Markov Models would be useful to understand how the package works, but not essential to follow this vignette.
This section contains a brief explanation about the intended use of the package and some remarks on the methodology.
The purpose of this package is to estimate the spatial distribution of mobile devices, which corresponds to an intermediate step in the general methodological framework described in @WP5Deliverable1.3.
The network events are the result of the constant interaction between a mobile device and a telecommunication network. While the information comprised in these events can be quite complex (and rich), since we are mostly interested in geolocation, a spatial likelihood that summarizes the event is all we need. The transformation from the network events to this likelihood is quite technical and clearly falls to the mobile network operators (MNOs); see @WP5Deliverable1.3 for more details.
By spatial likelihood, we mean the likelihood of the event conditioned on the position of the device. It is natural to identify this likelihood with the emission probabilities of a Hidden Markov model.
So, we propose a Hidden Markov model to describe the movement of the devices. These models are quite general and flexible, and can be made as simple or as complex as desired. Unlike the usual setting, we are not going to estimate the emission probabilities, which, as stated above, we consider known.
As we are mainly interested in estimating location, and observation events are expected to be informative mostly about it, location itself is a natural choice for the state of the model. Since we are considering a tessellation of the map, each possible state would be a tile.
While this is the simplest approach, it is important to note that more complexity in the state space might be useful. For example, we might want to represent a car moving north on a highway and a car moving south in the same tile (the tile contains both lanes) by different states, as they are expected to move to different tiles next.
So, if we denote by $n$ the number of tiles, we are going to have no fewer than $n$ states, and possibly more, so let us say we have $O(n)$ states. In a Hidden Markov model, this means that we have $O(n^2)$ transition probabilities to estimate. Of course, this is not viable, so we fix to zero all transition probabilities to non-contiguous tiles.
Note that in practice we can do this without losing generality: given an upper bound $V$ for speed and a lower bound $E$ for the distance between non-contiguous tiles, we can choose the time step $\Delta t \leq \frac{E}{V}$ so that the jump is no longer possible. For instance, with $V = 120$ km/h and $E = 100$ m, any $\Delta t \leq 3$ s rules out jumps between non-contiguous tiles.
Now we have $O(n)$ non-zero transitions, which is more affordable, but still very expensive. If we want to reduce this complexity, one option is to classify the states into a certain number of classes, and constrain the transition probabilities to be equal for tiles of the same kind. This only makes sense for periodic tessellations, which is a strong argument for preferring periodic tessellations over other possible choices (Voronoi, BSA, etc.). This is not a limitation of the package though: it is still possible to estimate models based on non-periodic tilings, but $O(n)$ parameters would have to be estimated or more complex constraints would have to be specified. In practice, the package allows any linear constraint between the transition probabilities. There is also some support for non-linear constraints.
Thus, we can use constraints to reduce the number of parameters as much as desired. It is a good idea to keep the number of (free) parameters small: on the one hand, the likelihood optimization becomes less computationally expensive; on the other, we get a more parsimonious model.
Once we have defined an appropriate model for our problem, the next step is to estimate the (hopefully few) free parameters. As already stated, emissions are known, so there are no emission parameters to fit. The initial state is either fixed or set to the steady state, so the only parameters to fit in practice are the transition probabilities.
The method used to estimate the parameters is maximum likelihood, and the forward algorithm computes the (minus) log-likelihood. A constrained optimization is then carried out. Note that the EM algorithm is generally not a good choice for constrained problems, so it is not used in the package.
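For reference, the following is a minimal, dense sketch of the scaled forward recursion for the minus log-likelihood. It is illustrative only, not the package's implementation (which is sparse and written in C++), and the matrix orientation (rows of $T$ summing to one) is an assumption of the sketch:

```r
# Minimal dense forward algorithm returning minus the log-likelihood.
# Illustrative sketch only; destim's internal version is sparse (RcppEigen).
# T:   n x n transition matrix, T[i, j] = P(j | i), rows sum to one (assumed)
# E:   n x m emissions matrix, E[i, k] = P(event k | tile i)
# obs: vector of observed event indices, NA for missing observations
# p0:  initial state distribution
forward_minus_loglik <- function(T, E, obs, p0) {
  alpha <- p0
  mll <- 0
  for (t in seq_along(obs)) {
    if (t > 1) alpha <- as.vector(alpha %*% T)  # propagate one time step
    if (!is.na(obs[t])) {
      alpha <- alpha * E[, obs[t]]              # weight by emission likelihood
      s <- sum(alpha)
      mll <- mll - log(s)                       # accumulate -log P(obs)
      alpha <- alpha / s                        # rescale to avoid underflow
    }
  }
  mll
}
```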
To get the objective function and the constraints, some preliminary steps are required to reduce the dimension of the search space. Let $P$ be the column vector of transition probabilities. The linear constraints can be represented as the augmented matrix $(A \vert B)$ so that $AP = B$. After a pivoted QR decomposition, we have $R \tilde{P} = Q' B$, where $\tilde{P}$ is a permutation of $P$ and $R$ is an upper triangular matrix whose diagonal elements are non-increasing in absolute value. Moreover, we can express $R$ in blocks as: $$ R = \left( \begin{array}{c c} R_{11} & R_{12} \\ 0 & 0 \end{array} \right) $$ where $R_{11}$ is a full-rank square upper triangular matrix. Accordingly, we can define blocks for $Q$ and $\tilde{P}$: \begin{align} Q & = \left( \begin{array}{c c} Q_1 & Q_2 \end{array} \right) \\ \tilde{P} & = \left( \begin{array}{c} \tilde{P}_1 \\ \tilde{P}_2 \end{array} \right) \end{align}
Note that $Q_2' B = 0$; otherwise the constraints cannot be fulfilled. The variables in $\tilde{P}_2$ are taken as the free parameters because, $R_{11}$ being full-rank, we have $\tilde{P}_1 = R_{11}^{-1}(Q_1'B - R_{12} \tilde{P}_2)$, so $\tilde{P}_2$ determines $\tilde{P}_1$. These free parameters are transition probabilities and fully determine the transition matrix and thus the likelihood. Moreover, the equality constraints have vanished, and now we only have to impose that the transition probabilities are between zero and one. Obviously, those are linear constraints on $\tilde{P}_2$, so all we have to do is a linearly constrained non-linear optimization in the same space. The linear constraints may seem numerous, but in practice a well-specified model makes most of them identical, as most of the transition probabilities are going to be equal.
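As an illustration of this reduction (a sketch with a hypothetical toy constraint system, not the package's internal code), R's built-in pivoted QR can be used directly:

```r
# Toy constraint system A P = B on four transition probabilities:
# p1 + p2 = 1, p3 + p4 = 1, p2 = p4 (hypothetical example).
A <- matrix(c(1, 1, 0,  0,
              0, 0, 1,  1,
              0, 1, 0, -1), nrow = 3, byrow = TRUE)
B <- c(1, 1, 0)

qrA <- qr(A)                     # pivoted QR: A[, qrA$pivot] = Q R
r   <- qrA$rank                  # rank of the constraint system
Q   <- qr.Q(qrA)
R   <- qr.R(qrA)

free       <- qrA$pivot[-(1:r)]  # indices of the free parameters (P2)
determined <- qrA$pivot[1:r]     # indices determined by the constraints (P1)

P2 <- 0.3                        # pick any value for the free parameter
P1 <- backsolve(R[1:r, 1:r, drop = FALSE],
                crossprod(Q[, 1:r], B) - R[1:r, -(1:r), drop = FALSE] %*% P2)

P <- numeric(ncol(A))
P[determined] <- P1
P[free]       <- P2
A %*% P                          # recovers B, so the constraints hold
```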
Usually, algorithms for constrained optimization require an initial value in the interior of the feasible region. To get such an initial value, random parameters satisfying the constraints are drawn; this is what the function initparams, introduced below, does.
As already stated, the initial state is set to the steady state if not fixed. The steady state is calculated as the solution (normalized to sum to one) of $(T - I)x = 0$, where $T$ is the transition matrix, $I$ the identity, and the last component of $x$ is set to one divided by its dimension. This should be enough because these Markov chains are expected to be both irreducible and aperiodic; otherwise we would have strange movement restrictions.
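A minimal sketch of this computation follows, assuming a column-stochastic transition matrix $T$ (so that $(T - I)x = 0$ characterizes the stationary distribution); the toy matrix is hypothetical:

```r
# Steady state of a small column-stochastic transition matrix (toy example).
T <- matrix(c(0.9, 0.1, 0.0,
              0.2, 0.6, 0.2,
              0.0, 0.3, 0.7), nrow = 3)  # each column sums to one
n <- nrow(T)
M <- T - diag(n)
M[n, ] <- c(rep(0, n - 1), 1)  # replace the last equation by x_n = 1/n
b <- c(rep(0, n - 1), 1 / n)
x <- solve(M, b)               # solve the modified system
x / sum(x)                     # normalize to sum to one: the steady state
```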
Once the model has been fitted, we can estimate the smooth states by means of the forward-backward algorithm. The smooth states are the probability mass function of the states given the observed data (and the model), so they summarize all the information available for a given time $t$. Thus, one of the outputs of the package is the smooth states, which can be aggregated to get a spatial distribution of the number of devices as explained in @WP5Deliverable1.3, section 4.2.
The other main output of the package is the posterior joint probability mass function of the states for two consecutive instants $t, t + \Delta t$. As a posterior probability, it is once again conditioned on all the information available, but it is more dynamic because its time reference is now an interval. A possible analogy is the smooth position and velocity of a particle: the smooth states would correspond to position and the pairwise probabilities to velocity.
Both outputs are needed to estimate the target population.
This section explains briefly the main functions of the package.
Obviously, the first step is to create a model. In destim, the main model constructor is the function HMM.
```r
library(destim, warn.conflicts = FALSE)
model <- HMM(5)
```
When the first argument is a number, it specifies the number of states, so we have five states. Since we have not specified a list of transitions, each state transitions only to itself, so there are just five transitions as well.
```r
nstates(model)
ntransitions(model)
transitions(model)
```
The transitions with non-zero probability are represented by an integer matrix with two rows, where each column is a transition: the first row contains the initial state and the second row the final state. Of course, the states are represented by integer numbers. The columns of the matrix are ordered first by initial state and then by final state.
Now, let us look at the constraints.
```r
constraints(model)
```
As we have not specified any constraints, one constraint per state is introduced: the transition probabilities from each initial state have to sum to one. Otherwise, the transition matrix would not be stochastic. In general, the package handles this specific kind of constraint automatically.
The constraints are represented as the augmented matrix of a linear system of equations. The transition probabilities must fulfill the equations, in the same order as shown by the transitions function. So the first coefficient in each row corresponds to the transition probability of the transition shown in the first column of the matrix of transitions, and so on.
Both transitions and constraints can be specified as parameters when creating the model. It is also possible to add transitions and constraints later.
```r
model <- addtransition(model, c(1, 2))
model <- addtransition(model, c(2, 3))
model <- addconstraint(model, c(2, 4))
transitions(model)
constraints(model)
```
Now it is possible to transition from state 1 to state 2, and from state 2 to state 3. Moreover, we have added an equality constraint: the second transition probability is equal to the fourth one. In the matrix of transitions we can see that those are the transitions from state 1 to state 2 and from state 2 to state 3, respectively, which we have just added.
The constraints matrix is a row-major sparse matrix. When the second parameter of addconstraint is a vector, it is expected to reference two transitions whose probabilities are equal. The still transitions (from one state to itself) should not be referenced, as they are handled automatically so that the transition probabilities from each state sum to one. The purpose of these equality constraints is to improve performance, as there may be many of them and they can be treated easily.
When the second parameter of addconstraint is a matrix, it is simply row-bound to the matrix of constraints. This is intended for more general linear constraints, and it is not recommended to add equality constraints this way, as performance might decrease notably. However, (non-automatic) equalities that involve still transitions have to be introduced this way.
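For instance, an equality involving still transitions, such as $P(1 \to 1) = P(2 \to 2)$, could be added as a matrix row (a hypothetical sketch relying on the augmented-matrix layout described above, with the right-hand side as the last element of the row):

```r
# Hypothetical example: impose P(1 -> 1) = P(2 -> 2), an equality that
# involves still transitions and so must be given as a matrix row.
# One coefficient per transition (same order as transitions(model)),
# with the right-hand side as the last element of the row.
tr  <- transitions(model)
s1  <- which(tr[1, ] == 1 & tr[2, ] == 1)   # still transition 1 -> 1
s2  <- which(tr[1, ] == 2 & tr[2, ] == 2)   # still transition 2 -> 2
ctr <- matrix(0, nrow = 1, ncol = ntransitions(model) + 1)
ctr[1, s1] <-  1
ctr[1, s2] <- -1                            # P(1->1) - P(2->2) = 0
model <- addconstraint(model, ctr)
```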
The emission probabilities are also stored in a matrix, where the element $e_{ij}$ is the probability of event $j$ happening given that the device is located in tile $i$. Of course, the number of possible events is expected to be much smaller than the number of actually observed events. It is possible, though, that the number of columns of the emissions matrix matches the number of actually observed events. In that case the matrix saves no memory, but it allows us to do the estimation even for a continuous space of observable events.
Note that, in particular, if every possible event corresponds to a column of the matrix of emissions, each row sums to one. This does not happen in general, as the columns do not need to be exhaustive, and in the continuous case they cannot be.
As we have not specified the emission probabilities, they are set to NULL by default.
```r
emissions(model)
emissions(model) <- matrix(c(0.3, 0.3, 0.7, 0.9, 0.9,
                             0.7, 0.7, 0.3, 0.1, 0.1),
                           nrow = 5, ncol = 2)
emissions(model)
```
Emission probabilities are expected to be computed separately, so the model is ready to have the emissions matrix inserted directly.
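In this toy example the two events happen to be exhaustive, so each row of the emissions matrix sums to one, as discussed above:

```r
rowSums(emissions(model))  # all ones: the two events are exhaustive here
```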
Of course, in practice models will have many states and will be created automatically. While the purpose of this package is estimation and not automatic modeling (at least for the moment), some functions have been added to ease the construction of example models.
The function HMMrectangle creates a rectangle model. This model represents a rectangular grid with square tiles, where the device can only stay in the same tile or move to a contiguous tile. This means that there are up to nine non-zero transition probabilities per tile. Moreover, horizontal and vertical transitions have the same probability, and diagonal transitions also share a common probability (different from the horizontal and vertical one).
Consequently, the rectangle model has only two free parameters to fit, but the number of transitions can be very high even for small rectangles.
```r
model <- HMMrectangle(10, 10)
ntransitions(model)
nconstraints(model)
```
A small rectangle of 10x10 tiles has 784 transitions! Fortunately, we only need to fit two parameters. Note that the number of constraints plus the number of free parameters equals the number of transitions.
A very simple function to create emissions matrices is also provided. It is called createEM, and the observation events are just connections to a specific tower. The input parameters are the dimensions of the rectangle, the locations of the towers (in grid units) and the distance decay function of the signal strength (Martijn Tennekes, 2018). In this case, each tile is required to be able to connect to at least one antenna, so no out-of-coverage tiles are allowed. Note that this is not a requirement of the model, just a limitation of createEM.
```r
tws <- matrix(c(3.2, 6.1, 2.2, 5.7, 5.9, 9.3, 5.4,
                4.0, 2.9, 8.6, 6.9, 6.2, 9.7, 1.3),
              nrow = 2, ncol = 7)
S <- function(x) if (x > 5) return(0) else return(20 * log(5 / x))
emissions(model) <- createEM(c(10, 10), tws, S)
dim(emissions(model))
```
Once the model is defined, the next step is to fit its parameters by constrained maximum likelihood. As already said, the optimizer usually requires an initial guess, so the function initparams generates a random initial set of parameters.
```r
model <- initparams(model)
all(model$parameters$transitions < 1)
all(model$parameters$transitions > 0)
range(constraints(model) %*% c(model$parameters$transitions, -1))
```
All transition probabilities are between zero and one and the constraints hold, but no observed data has been used yet.
The function minparams reduces the number of parameters of the model to the number of free parameters, as already explained.
```r
ntransitions(model)
model <- minparams(model)
rparams(model)
```
So, only two parameters are really needed! It is possible to assign values with rparams, as the optimizer does, but some transition probability might then fall outside the interval $[0,1]$. The optimization process avoids this problem by imposing linear inequality constraints.
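For illustration, we can assign arbitrary values to the two free parameters on a copy of the model and check the implied transition probabilities (a sketch; the values are hypothetical, and we assume the transition probabilities remain accessible as model$parameters$transitions, as used earlier):

```r
# Arbitrary values for the two free parameters (illustrative only).
m2 <- model
rparams(m2) <- c(0.1, 0.05)
range(m2$parameters$transitions)  # should stay within [0, 1] for valid values
```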
Now the model is ready to be fitted. Of course, the observed events are needed. Since the emissions matrix is stored in the model, the observed events are just an integer vector whose entries refer to the appropriate columns of the emissions matrix.
```r
obs <- c(1, 2, NA, NA, NA, NA, 7, 7)
logLik(model, obs)
model <- fit(model, obs)
rparams(model)
logLik(model, obs)
```
Despite its name, logLik returns minus the log-likelihood, so the smaller the better. As in the example, missing observations can be introduced as NA.
Finally, the model is ready to produce some estimates. The main outputs of this package are the smooth states and the smooth consecutive pairwise states (sometimes called $\xi_{ij}$ in the literature).
The function sstates returns the smooth states as a matrix with dimensions number of states $\times$ number of observations (missing values included). So each column represents the spatial distribution in its corresponding time slot.
```r
dim(sstates(model, obs))
image(matrix(sstates(model, obs)[, 4], ncol = 10))
```
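Each column is a probability distribution over the tiles, so it sums to one; aggregating such columns over several devices would yield the spatial distribution of the number of devices mentioned above:

```r
colSums(sstates(model, obs))  # every time slot sums to one
```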
The function scpstates returns the (smooth) joint probability mass function of consecutive states, analogous to the $\xi_{ij}$ probabilities of the Baum-Welch algorithm (once convergence is achieved). This time, each column represents the bivariate spatial distribution, flattened as follows: the probability of the consecutive pair of states $(i,j)$ can be found in row $100(i-1) + j$.
```r
dim(scpstates(model, obs))
image(matrix(scpstates(model, obs)[, 4], ncol = 100))
```
Both the rows and the columns of the previous image index a bidimensional space, so it is difficult to visualize: we would need four dimensions instead of two! Even so, it is easy to see a diagonal pattern, which is coherent with the transition matrix. The transition matrix only allows transitions to contiguous tiles, so it is almost diagonal. In consequence, so are the $\xi_{ij}$ probabilities.
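As a small usage example, the smooth probability of a particular pair of consecutive tiles can be read directly from the rows, following the layout described above (the tile indices below are hypothetical, and we assume column $t$ corresponds to the pair of instants $t, t + \Delta t$):

```r
xi <- scpstates(model, obs)
i <- 45; j <- 46            # hypothetical pair of contiguous tiles
xi[100 * (i - 1) + j, 4]    # P(state = i at t4, state = j at t5 | obs)
```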
The package is optimized to some degree, as it uses Rcpp and RcppEigen (for sparse linear algebra) in some critical functions. It can handle fairly well models with around $10^7$ states on a desktop computer (if enough RAM is provided). A faster sparse linear algebra library (for example Intel MKL) might improve the performance of such operations a little. Below, we also comment on the possibility of improving performance through parallel computing.
As has been already stated, the package is not really intended for model construction, except for the function HMMrectangle, which creates very basic models. While it is a rather fast function, its performance could easily be improved, as it is an embarrassingly parallel algorithm. This would also allow a cluster to be used to generate the model.
The algorithm used to find an initial value for the optimizer solves a linear system of equations at each step. While the order of the system grows with the number of transitions and constraints, in practice most constraints state the equality between two transition probabilities. It is not difficult to get rid of one of the transitions and the constraint in those cases, so in practice the process scales very well for parsimonious models.
Most of what has been said about initialization also applies to parametrization, where the QR decomposition is the slow step.
Computing the likelihood mostly amounts to multiplying sparse matrices, which relies on another library (currently Eigen). Eigen is not the fastest library for this task, even on a desktop computer, but its performance is not bad. In the future, Intel MKL might be a good choice, and it can also work in distributed mode.
The purpose of all the previous steps is to ease the task of the optimizer, so it should not be a problem if everything else is fine. Transition probabilities are required to be between zero and one, which in general means $O(n)$ inequality constraints. A well-specified parsimonious model will often have far fewer constraints, as most of them are duplicated. As a reference, HMMrectangle(20,20) has 3364 transitions, yet only 5 constraints are needed in practice.
Equalities between transition probabilities would generate duplicated rows in the matrix of inequality constraints, which are eliminated before calling the optimizer.