knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

Mobile phone network data is data generated by the network of cells owned by a mobile network operator (MNO). The MNO facilitates mobile communication and charges the corresponding costs to its customers. Most countries have more than one MNO, each of which owns and maintains its own network of cells.

The data stored by the network are called signalling data. For each mobile device connected to the network, they contain records about active mobile phone use and about movement across the network. Network data containing records about calls, SMS messages, and mobile data use are called Call Detail Records (CDR) or Data Detail Records (DDR) and are used for billing. Both signalling data and CDR/DDR are internationally standardised.

Signalling data do not contain the exact geographic location of the logged events. Instead, only the id numbers of the cells are included. In mobile phone network technology, a cell enables mobile communication for a specific area. Often, people confuse cells with antennas, but these are not the same: an antenna may contain multiple cells.

The mobloc package contains a set of tools to approximate the location of mobile phone devices. This approximation is not a pinpoint location, but a spatial distribution of locations.

The methods used in mobloc follow a Bayesian framework. We consider three modules: the prior, the likelihood, and the posterior.

Best Server Maps (BSM) cannot be used as a source, but they can be used for validation. The mobloc package models BSM, which can be compared to the BSM of the MNO.

The methods are described in much more detail in [1].

The graphical user interfaces and visualization functions are implemented in the package mobvis [2]. Installation of this package is required to run this vignette.

library(mobloc)
library(mobvis)

Setup signal strength model parameters

The first step in approximating the geographic locations is to determine the parameters for the signal propagation model. The default parameters can be loaded with the function mobloc_param. The result is a standard list:

ZL_param <- mobloc_param()

A short description of the parameters is provided in the documentation of mobloc_param. These parameters are used to model the signal strength. Most of them are default values for cell properties that are often unknown. For imputing unknown properties, we consider two types of cells: normal cells, which are typically placed in cell towers or on rooftops, and small cells, which are often placed indoors or in dense urban areas (e.g. on street lights). Since these two types of cells have very different characteristics, they also have different default values. For example, the default power is 10 Watt for normal cells and 5 Watt for small cells.

Most of the parameters are used for the signal strength model, which is used to create the signal strength likelihood. The mobvis package contains a tool in which these parameters can be tuned. This tool is started as follows:

setup_sig_strength_model()

This tool shows the signal strength model for one cell. The left-hand panel shows the settings, which by default are set to the values of the list we just created with mobloc_param. When small cell is ticked, the _small arguments are used as default values. The plots on the right-hand side show the propagation results. The heatmap at the top right shows a top view of the signal strength of the cell. For directed cells, the direction in this plot is east.

The four plots below the heatmap describe:

Loading artificial cellplan data

Once the parameters have been set, the model can be applied to cellplan data. To illustrate the model, we have included artificial data in this package. These data can be loaded as follows:

data("ZL_cellplan", "ZL_muni", "ZL_elevation", "ZL_landuse")

This is artificial cellplan data for the NUTS3 region Zuid-Limburg, the southernmost part of the Netherlands, which measures roughly 30 by 30 kilometres.

The object ZL_cellplan is an sf object (see package sf) that contains the geographic locations of the cells and their metadata.

ZL_cellplan

The object ZL_muni contains the polygons that define the area. The object ZL_elevation is a raster object that contains the elevation heights at a resolution of 100 by 100 metres.

These example data can be plotted with the tmap package:

library(tmap)
tmap_mode("view")
qtm(ZL_elevation) + qtm(ZL_muni, fill=NULL) + qtm(ZL_cellplan)

The object ZL_landuse is a raster object that contains, per tile, the fraction of land used for each of several categories. We will use this information later on to determine the level of urbanization per cell and to define prior information.

ZL_envir <- combine_raster_layers(ZL_landuse, weights = c(1, 1, 1, 0, 0)) 

The function validate_cellplan should be used to validate the cellplan. It checks if required variables are present. Variables that are not required as input, but needed for further analysis are imputed. For instance, if tilt is missing, it is imputed by the default value ZL_param$tilt.

ZL_cellplan <- validate_cellplan(ZL_cellplan, param = ZL_param, region = ZL_muni, envir = ZL_envir, elevation = ZL_elevation)
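The imputation step described above can be illustrated with a minimal base R sketch. It does not call validate_cellplan; the parameter list, the toy cellplan, and all values are hypothetical and only show the idea of filling a missing tilt with a default.

```r
# Hypothetical parameter list and cellplan; the values are made up
param <- list(tilt = 5)
cellplan <- data.frame(cell = c("A1", "A2", "B1"),
                       tilt = c(4, NA, NA))

# Replace missing tilt values with the default from the parameter list
missing_tilt <- is.na(cellplan$tilt)
cellplan$tilt[missing_tilt] <- param$tilt

cellplan
```

Cells that already have a tilt keep their own value; only the missing entries receive the default.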

The corresponding bounding box of Zuid-Limburg is created as follows.

library(sf)
ZL_bbox <- st_bbox(c(xmin = 4012000, ymin = 3077000, xmax = 4048000, ymax = 3117000), crs = st_crs(3035))

Modelling the signal strength

In the next stage, the signal strength propagation is modeled. This is done with rasterization, i.e. all spatial objects are transformed into small tiles, e.g. of 100 by 100 meters. This is mainly done for computational reasons. The raster is created as follows:

# create a raster of 100 by 100 meter cells
ZL_raster <- create_raster(ZL_bbox)

This object is a raster object that contains the raster id values, which are numbered from top left to bottom right row-wise.
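To make the numbering concrete, the following base R sketch reproduces the row-wise scheme on a hypothetical grid of 3 rows and 4 columns. It does not call create_raster, and the helper tile_id is an illustrative name that is not part of mobloc.

```r
# Hypothetical helper (not part of mobloc): id of the tile in a given row and
# column, with ids running row-wise from the top left to the bottom right
tile_id <- function(row, col, ncol) {
  (row - 1L) * ncol + col
}

nrow_grid <- 3L
ncol_grid <- 4L

# Fill a matrix row by row so that it mirrors the layout of the raster
ids <- matrix(seq_len(nrow_grid * ncol_grid), nrow = nrow_grid,
              ncol = ncol_grid, byrow = TRUE)
print(ids)

# Ids of the top-left and the bottom-right tile
tile_id(1L, 1L, ncol_grid)
tile_id(nrow_grid, ncol_grid, ncol_grid)
```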

For each tile $g$ and cell $a$ where $g$ intersects with the polygon of $a$, the following variables are computed: the distance, the modelled signal strength, the relative signal strength, and the likelihood value. The function for these calculations, called compute_sig_strength, supports parallel computing. A parallel cluster can be created as follows:

# create a parallel cluster (call stopCluster(cl) when it is no longer needed)
library(parallel)
library(doParallel)
ncores <- detectCores()
cl <- makeCluster(ncores)
registerDoParallel(cl)

Note that a parallel cluster is not required to process the cellplan. It is recommended for large cellplans, since it reduces the computation time significantly.

# compute the signal strength for each tile and cell
ZL_strength <- compute_sig_strength(cp = ZL_cellplan, raster = ZL_raster,
    elevation = ZL_elevation, param = ZL_param)

The result is a data.frame that contains information about the modelled signal propagation (which is what the abbreviation prop stands for). It has the following columns:

Mobile location approximation

Prior

In this example, we propose three priors: a uniform prior, a network prior, and a land use prior.

ZL_uniform_prior <- create_uniform_prior(ZL_raster)
ZL_network_prior <- create_network_prior(ZL_strength, ZL_raster)
ZL_landuse_prior <- create_prior(ZL_landuse, weights = c(1, 1, .1, 0, .5))

Since each of these priors has pros and cons, it may be worthwhile to create a composite prior in order to find a good balance. In this example, the network prior is weighted at 25% and the land use prior at 75%.

ZL_comp_prior <- create_prior(ZL_network_prior, ZL_landuse_prior, weights = c(.25, .75))
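The idea of a composite prior can be sketched in base R on a toy example with four tiles. The sketch assumes that the composite is a weighted average of priors that each sum to 1; this is an illustrative assumption and not necessarily what create_prior does internally. All values are made up.

```r
# Two toy priors over four tiles; each sums to 1 (values are made up)
network_prior <- c(0.10, 0.20, 0.30, 0.40)
landuse_prior <- c(0.40, 0.40, 0.10, 0.10)

# Weights as in the text: 25% network, 75% land use
w <- c(network = 0.25, landuse = 0.75)

# Assumed combination rule: weighted average, renormalised to sum to 1
comp_prior <- w[["network"]] * network_prior + w[["landuse"]] * landuse_prior
comp_prior <- comp_prior / sum(comp_prior)

comp_prior
sum(comp_prior)
```

Tiles that score high on the land use prior dominate the result, while the network prior still shifts some probability mass towards tiles with good coverage.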

The result can be visualized using tmap:

qtm(ZL_comp_prior)

Likelihood

The following functions create likelihoods based on the signal strength model above, and, for reference, on the Voronoi tessellation:

ZL_strength_llh <- create_strength_llh(ZL_strength, param = ZL_param)
ZL_voronoi_llh <- create_voronoi_llh(ZL_cellplan, ZL_raster)

The variable pag in these data is the probability that a certain cell is used, given that the device is located in a certain tile.
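As a rough illustration of such a likelihood, the base R sketch below turns toy relative signal strengths into probabilities over cells. The normalisation per tile is an assumption made for this example, not a description of create_strength_llh, and the matrix s and its values are hypothetical.

```r
# Toy relative signal strengths: rows are tiles g, columns are cells a
s <- matrix(c(0.8, 0.2,
              0.5, 0.5,
              0.1, 0.3),
            nrow = 3, byrow = TRUE,
            dimnames = list(tile = c("g1", "g2", "g3"), cell = c("a1", "a2")))

# Assumed rule: normalise per tile so the probabilities over cells sum to 1
pag <- s / rowSums(s)

pag
rowSums(pag)
```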

Posterior

The posterior distribution P(g|a) can be calculated as follows:

ZL_post <- calculate_posterior(prior = ZL_comp_prior, llh = ZL_strength_llh, raster = ZL_raster)
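In a Bayesian framework, the posterior follows from the prior and the likelihood through Bayes' rule, $P(g \mid a) \propto P(g) \, P(a \mid g)$. The base R sketch below applies this rule to a toy example with three tiles and two cells. It is a minimal illustration with made-up values and does not reproduce the internals of calculate_posterior.

```r
# Toy prior P(g) over three tiles and likelihood P(a | g) for two cells
prior <- c(g1 = 0.5, g2 = 0.3, g3 = 0.2)
pag <- matrix(c(0.80, 0.20,
                0.50, 0.50,
                0.25, 0.75),
              nrow = 3, byrow = TRUE,
              dimnames = list(tile = names(prior), cell = c("a1", "a2")))

# Bayes' rule: multiply prior and likelihood, then normalise per cell a
joint <- pag * prior                         # P(g) * P(a | g), row i scaled by prior[i]
pga <- sweep(joint, 2, colSums(joint), "/")  # P(g | a), each column sums to 1

pga
colSums(pga)
```

Each column of pga is a probability distribution over tiles for a device connected to that cell.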

When timing advance (TA) is available, the posterior can be updated using TA bands. The parameters, such as the width of the TA bands, are configured in the parameters of mobloc. The posterior distribution can be updated (through a Bayesian update) with TA as follows:

ZL_post_TA <- update_posterior_TA(ZL_post, ZL_raster, ZL_cellplan, ZL_param, ZL_elevation)

Exploration tool

The following tool from the mobvis package is used to explore the results:

explore_mobloc(ZL_cellplan, ZL_raster, ZL_strength,
               list(landuse = ZL_landuse_prior, network = ZL_network_prior, uniform = ZL_uniform_prior),
               list(Strength = ZL_strength_llh, Voronoi = ZL_voronoi_llh),
               param = ZL_param)

For an area the size of this dataset (Zuid-Limburg, Netherlands), this interactive tool will work on most computers. For larger areas, however, we recommend filtering on the area of interest. The code to apply a filter for Maastricht is:

Maastricht_bbox <- st_bbox(c(xmin = 4012000, ymin = 3085000, xmax = 4024000, ymax = 3098000), crs = st_crs(3035))

explore_mobloc(ZL_cellplan, ZL_raster, ZL_strength,
               list(landuse = ZL_landuse_prior, network = ZL_network_prior, uniform = ZL_uniform_prior),
               list(Strength = ZL_strength_llh, Voronoi = ZL_voronoi_llh),
               param = ZL_param,
               filter = Maastricht_bbox)

The columns of the posterior data.frame are the cell id cell, the raster id rid, and the posterior probability pga, that is, the probability that a mobile phone is located in tile g, given that it is connected to cell a.

Reference

[1] Tennekes, M., Gootzen, Y.A.P.M., Shah, S.H., 2020. A Bayesian approach to location estimation of mobile devices from mobile network operator data. Statistics Netherlands, Working paper.

[2] Tennekes, M. mobvis: visualization of mobile phone location algorithm results, R package version 0.1.0. https://github.com/MobilePhoneESSnetBigData/mobvis



MobilePhoneESSnetBigData/mobloc documentation built on Feb. 18, 2024, 3:41 a.m.