tutorial/mobloc_explanation.md

Explanation of mobloc

Introduction

This document explains the implementation of the methods provided in mobloc.

Installing mobloc and mobvis

To reimplement the methods, it can be useful to be able to run mobloc and mobvis. Both can be installed by running these two lines of code in R:

devtools::install_github("MobilePhoneESSnetBigData/mobloc")
devtools::install_github("MobilePhoneESSnetBigData/mobvis")

The other document, https://github.com/MobilePhoneESSnetBigData/mobloc/blob/master/tutorial/mobloc_vignette.md, explains mobloc and mobvis from a user perspective.

Model parameters

In the table below, the model parameters are listed.

| name | default | description |
|---|---|---|
| W | 10 | default power in Watt of a normal cell (placed in a cell tower or rooftop site) |
| W_small | 5 | default power in Watt of a small cell (omnidirectional) |
| ple | 3.75 | default path loss exponent |
| ple_small | 6 | path loss exponent for small cells |
| ple_0 | 3.5 | path loss exponent for free space |
| ple_1 | 4 | path loss exponent for dense environments |
| midpoint | -92.5 | midpoint of the logistic function used to map signal strength to signal dominance |
| steepness | 0.2 | steepness of the logistic function used to map signal strength to signal dominance |
| range | 10000 | maximum range of normal cells |
| range_small | 1000 | maximum range of small cells |
| height | 30 | default height of normal cells |
| height_small | 8 | default height of small cells |
| tilt | 5 | default (horizontal) tilt. Only applicable for directional cells |
| beam_v | 9 | default vertical beam width. Only applicable for directional cells |
| beam_h | 65 | default horizontal beam width. Only applicable for directional cells |
| azim_dB_back | -30 | difference in signal strength between front and back in the azimuth (horizontal) plane |
| elev_dB_back | -30 | difference in signal strength between front and back in the elevation (vertical) plane |
| sig_d_th | 0.005 | signal dominance threshold |
| max_overlapping_cells | 100 | maximum number of cells that may overlap per raster tile. If the actual number exceeds this parameter, the max_overlapping_cells cells with the highest signal strength are selected |
| TA_step | 78.12 | meters that correspond to one Timing Advance (TA) step. This parameter depends on the network technology and physical properties such as air pressure. In GSM networks it is approximately 554 meters, and in LTE (4G) networks 78.12 meters |
| TA_max | 1282 | maximum Timing Advance (TA) value (integer). In other words, TA can have a value between 0 and TA_max. In GSM it is 63, and in LTE 1282 |
| TA_buffer | 1 | buffer to prevent artifacts in the TA to grid tile conversion. These artifacts occur when TA_step is similar to or smaller than the width of a grid tile. TA_buffer is an integer that determines the number of TA steps that are added in front of and behind the actual TA band |

We explain how we use them below.

Imputation of input data

Most of the parameters are used to impute missing physical properties.

An example of the complete input dataset of the ‘cell plan’ (cell locations with physical properties) is the following:

| cell | small | height | direction | tilt | beam_h | beam_v | geometry | x | y | z | W | range | ple |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BEE_150_N1 | FALSE | 20.36 | 110 | 3 | 65 | 7.5 | POINT (4028177 3100291) | 4028177 | 3100291 | 106.8767 | 10 | 10000 | 3.75 |
| BEE_150_N2 | FALSE | 20.36 | 230 | 3 | 65 | 4.0 | POINT (4028177 3100291) | 4028177 | 3100291 | 106.8767 | 10 | 10000 | 3.75 |
| BEE_150_N3 | FALSE | 20.36 | 350 | 3 | 65 | 4.0 | POINT (4028177 3100291) | 4028177 | 3100291 | 106.8767 | 10 | 10000 | 3.75 |
| BEE_264_N1 | FALSE | 24.44 | 325 | 2 | 65 | 7.5 | POINT (4025802 3100291) | 4025802 | 3100291 | 103.8727 | 10 | 10000 | 3.75 |
| BEE_264_N2 | FALSE | 24.44 | 55 | 2 | 65 | 7.5 | POINT (4025802 3100291) | 4025802 | 3100291 | 103.8727 | 10 | 10000 | 3.75 |
| BEE_264_N3 | FALSE | 24.44 | 190 | 2 | 65 | 7.5 | POINT (4025802 3100291) | 4025802 | 3100291 | 103.8727 | 10 | 10000 | 3.75 |

By ‘complete’ we mean that the dataset contains all variables that mobloc uses to compute the signal strength. Variables that are relevant for the MNO data processing but not used by mobloc, in particular date/time, can of course be contained in this data, but will be ignored by mobloc.

The x and y variables are the coordinates in the CRS that is used. Latitude/longitude (WGS84) can be used, but for the calculation of distances it is recommended to use a CRS in which distances can be derived directly from the coordinates. (Distances can also be calculated directly from lat/lon coordinates, but this may be computationally expensive.)

For the z variable, we use the following formula: z = elevation + height. Elevation is the height of the ground above sea level in meters, and height is the height of the cell above the ground. Therefore, z is the height of the cell above sea level in meters. For the input data, either z or height is required. Elevation is taken from an additional data source, which contains the elevation values for the whole area of interest.

The only mandatory fields are “cell” (identifier) and “geometry” (the location). The default parameters above are used to impute missing variables and values. For instance, if the variable “direction” is missing, all cells are considered omnidirectional. The variable “small” is only used to select a different set of default values, namely the parameters with the suffix “_small”.

To illustrate the imputation of missing values, consider this input data of 3 cells:

| cell | small | geometry |
|---|---|---|
| A | FALSE | POINT (4028177 3100291) |
| B | FALSE | POINT (4028177 3100291) |
| C | TRUE | POINT (4028177 3100291) |

The imputed data would be:

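| cell | small | geometry | x | y | height | z | direction | W | tilt | beam_h | beam_v | range | ple |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | FALSE | POINT (4028177 3100291) | 4028177 | 3100291 | 30 | 116.51673 | NA | 10 | NA | NA | NA | 10000 | 3.75 |
| B | FALSE | POINT (4028177 3100291) | 4028177 | 3100291 | 30 | 116.51673 | NA | 10 | NA | NA | NA | 10000 | 3.75 |
| C | TRUE | POINT (4028177 3100291) | 4028177 | 3100291 | 8 | 94.51673 | NA | 5 | NA | NA | NA | 1000 | 6.00 |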

Note that the imputed height of A and B is 30 meters, but for C (labeled the small cell), it is 8.
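The imputation rule can be sketched in a few lines of R. This is only an illustration of the logic described above, not mobloc's own code: the helper `impute_cellplan` is hypothetical, only a subset of the parameters is handled, and the elevation value is an assumption chosen so that the result reproduces the z values of the example.

```r
# Default values copied from the parameter table (subset only)
param <- list(W = 10, W_small = 5, ple = 3.75, ple_small = 6,
              range = 10000, range_small = 1000,
              height = 30, height_small = 8)

# Hypothetical helper, not part of mobloc: fill missing columns and values
# with the normal or the "_small" default, depending on the variable small.
impute_cellplan <- function(cp, param, elevation) {
  for (v in c("height", "W", "range", "ple")) {
    default <- ifelse(cp$small, param[[paste0(v, "_small")]], param[[v]])
    if (is.null(cp[[v]])) cp[[v]] <- NA_real_   # column absent: create it
    is_na <- is.na(cp[[v]])
    cp[[v]][is_na] <- default[is_na]
  }
  cp$z <- elevation + cp$height                 # z = elevation + height
  cp
}

cp <- data.frame(cell = c("A", "B", "C"), small = c(FALSE, FALSE, TRUE))
out <- impute_cellplan(cp, param, elevation = 86.51673)  # assumed elevation
print(out)
```

In a real setting the elevation would be looked up per cell from the elevation data source, and a z value supplied in the input would take precedence over the computed one.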

It is important to note that all default values listed in the parameter table above were determined during a single past collaboration project with an MNO. Other than face validity checks, these values have not been validated. Therefore, we strongly recommend checking the parameter values carefully and, if needed, adjusting them using state-of-the-art knowledge of MNO data.

The mobvis package contains an interactive tool (R-Shiny app) that can be used to experiment with parameter settings. In R, it can be started with:

 mobvis::setup_sig_strength_model()

[Figure: the interactive signal strength tool]

Signal strength computation

The signal strength is computed via the function compute_sig_strength. This function does the administrative part (e.g. checking input datasets and setting up parallel processes) around the core function signal_strength. This administrative part is R-specific so is less relevant when implementing in another language. Therefore, we focus on the function signal_strength.

The source code should be easy to understand for people with an IT background: https://github.com/MobilePhoneESSnetBigData/mobloc/blob/master/R/signal_strength.R . Some R-specific knowledge helpful to understand R scripts:

The signal_strength function computes the signal strength for a set of grid tiles (whose centroids are stored in the input argument co) for one specific cell. The cell is specified by its coordinates cx, cy, cz and by the physical properties direction, tilt, beam_h, beam_v, and W. As mentioned before, when a projected CRS is used instead of lat/lon coordinates, the coordinates are in meters, which makes distance calculations much easier and faster.

The signal strength consists of three components (which can be turned on and off via the input argument enabled):

"d" Distance

Signal strength decreases with distance. The path loss exponent (ple) determines to what extent. It is mainly determined by the environment of the cell: 2 can be used for free space, 4 for urban areas, and 6 for buildings. The function that computes this component is called at https://github.com/MobilePhoneESSnetBigData/mobloc/blob/master/R/signal_strength.R#L229

For omnidirectional cells only the "d" component is needed. For directional cells all three components.
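As a rough illustration of the "d" component, the sketch below assumes the standard log-distance path loss model: the transmit power is converted from Watt to dBm, and a loss of 10 * ple * log10(r) dB is subtracted, where r is the distance in meters between cell and grid tile. The function names are hypothetical and the linked source remains the reference for the exact formula.

```r
# Transmit power in Watt converted to dBm
watt_to_dbm <- function(W) 10 * log10(W) + 30

# Log-distance path loss in dB at distance r (in meters); assumed model
path_loss_db <- function(r, ple) 10 * ple * log10(r)

# Distance component of the signal strength in dBm
signal_strength_d <- function(r, W = 10, ple = 3.75) {
  watt_to_dbm(W) - path_loss_db(r, ple)
}

r <- c(100, 1000, 10000)
strength <- signal_strength_d(r)
print(data.frame(r = r, dBm = strength))
```

With the default W = 10 and ple = 3.75 this gives -35, -72.5 and -110 dBm at 100 m, 1 km and 10 km. A larger ple makes the signal drop faster with distance.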

"h" and "v" Radiation pattern

Signal strength is also reduced if the direction to the target location (where the device is) differs from the propagation direction of the cell. This direction is composed of the azimuth angle (the horizontal plane, "h") and the elevation angle (the vertical plane, "v"). We model the radiation patterns as Gaussian distributions:

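[Figure: radiation plots for the azimuth/horizontal plane (left) and the elevation/vertical plane (right)]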

There are two input parameters that are relevant here: azim_dB_back and elev_dB_back. These contain the dB difference between the propagation direction and the opposite direction (the ‘back’) in both planes. By default both are -30dB, which means that the signal strength is 30dB weaker in the opposite ‘back’ direction. This can be seen in the radiation plots. On the left-hand side (horizontal/azimuth plane), the main propagation direction is upwards, where the black line crosses the radial axis at 0dB. In the downward direction, the black line crosses the radial axis at the -30dB gridline. On the right-hand side, the main propagation direction is to the right (where the black line crosses 0dB) and the opposite direction is to the left (-30dB).

The azimuth and elevation angles correspond to the angles in which the signal strength is reduced by 3dB. These angles are depicted above by the red lines.

The implementation is here: https://github.com/MobilePhoneESSnetBigData/mobloc/blob/master/R/signal_strength.R#L235-L251 and https://github.com/MobilePhoneESSnetBigData/mobloc/blob/master/R/signal_strength.R#L255-L261. It is hard to explain and understand this implementation line by line. Instead, it is easier to explain it with the following picture:

[Figure: Gaussian curve of dB difference (y-axis) against propagation angle (x-axis)]

The aim is to fit this Gaussian curve twice: once for the azimuth/horizontal plane and once for the elevation/vertical plane. The x-axis represents the propagation angle, where 0 is the main propagation angle and (-)180 the opposite. The y-axis represents the dB difference with respect to the main angle. There is no difference at x=0, so that is the y=0dB point. The fit of the Gaussian curve depends on two variables, namely the (azimuth or elevation) angle (vertical red lines) and the dB_back parameter (bottom horizontal dashed line).

There are several ways to implement this. In the R implementation, the function attach_mapping creates a lookup table that contains, for each (azimuth/elevation) angle in degrees, the required standard deviation given the dB_back parameter:

| deg | sd |
|---|---|
| 1 | 1.98 |
| 2 | 4.14 |
| 3 | 6.30 |
| 4 | 8.46 |
| 5 | 10.80 |
| 6 | 12.96 |
| 7 | 15.12 |
| 8 | 17.28 |
| 9 | 19.44 |
| 10 | 21.60 |
| ... | ... |
| 171 | 179.1 |
| 172 | 179.1 |
| 173 | 179.1 |
| 174 | 179.1 |
| 175 | 179.1 |
| 176 | 179.1 |
| 177 | 179.1 |
| 178 | 179.1 |
| 179 | 179.1 |
| 180 | 179.1 |

This is a one-time operation (provided that dB_back is fixed). The function find_sd looks up the standard deviation for the row in which deg is closest to the (azimuth/elevation) angle.
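One possible way to build such a lookup table is sketched below. The parametrization is an assumption: the loss in dB follows a Gaussian shape in the angle, is 0 in the main direction, and is scaled so that it equals dB_back at 180 degrees. For each angle, a grid search then picks the standard deviation for which the loss at that angle is closest to -3dB. The names `gauss_db_loss`, `build_sd_lookup` and `find_sd_sketch` are hypothetical, and the numbers need not match the table above exactly.

```r
# Gaussian-shaped loss in dB: 0 at angle 0, db_back at 180 degrees (assumed form)
gauss_db_loss <- function(angle, sd, db_back = -30) {
  a <- abs(((angle + 180) %% 360) - 180)          # fold angle into [0, 180]
  db_back * ((1 - exp(-a^2 / (2 * sd^2))) / (1 - exp(-180^2 / (2 * sd^2))))
}

# For each angle, find the sd whose loss at that angle is closest to -3 dB.
# 180 degrees is excluded: the loss there equals db_back for every sd.
build_sd_lookup <- function(db_back = -30, target_db = -3) {
  sd_grid <- seq(0.18, 180, by = 0.18)
  deg <- 1:179
  sd <- vapply(deg, function(d) {
    sd_grid[which.min(abs(gauss_db_loss(d, sd_grid, db_back) - target_db))]
  }, numeric(1))
  data.frame(deg = deg, sd = sd)
}

# Look up the sd for the row whose deg is closest to the given angle
find_sd_sketch <- function(angle, lookup) {
  lookup$sd[which.min(abs(lookup$deg - angle))]
}

lookup <- build_sd_lookup()
sd10 <- find_sd_sketch(10, lookup)
print(head(lookup, 10))
```

For wide angles no standard deviation in the grid reaches -3dB, so the search saturates at the largest value in the grid, which corresponds to the flat tail of the table above.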

Calculation of signal dominance

The logistic function to compute the signal dominance is db2s (https://github.com/MobilePhoneESSnetBigData/mobloc/blob/master/R/signal_strength.R#L169).
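A minimal sketch of such a logistic mapping, using the midpoint and steepness defaults from the parameter table, is shown below. The function name `db2s_sketch` is hypothetical; the linked db2s source is the reference.

```r
# Map signal strength (dBm) to signal dominance in (0, 1) with a logistic curve
db2s_sketch <- function(dBm, midpoint = -92.5, steepness = 0.2) {
  1 / (1 + exp(-steepness * (dBm - midpoint)))
}

dBm <- c(-110, -92.5, -72.5)
dominance <- db2s_sketch(dBm)
print(data.frame(dBm = dBm, s = dominance))
```

At the midpoint the signal dominance is exactly 0.5. A larger steepness makes the transition between weak and strong signal sharper.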

Calculation of cell connection (in mobloc called ‘likelihood’) probabilities

This is straightforward. The implementation is very R-specific, so it is not directly usable in other programming languages.
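For readers working in another language, the idea can be sketched as follows, under the assumption that the connection probability of a cell in a grid tile is its signal dominance divided by the sum of the signal dominance of all cells covering that tile. The signal dominance values below are made up for illustration.

```r
# Signal dominance per grid tile (rows) and cell (columns); made-up values
s <- matrix(c(0.90, 0.10,
              0.40, 0.40,
              0.02, 0.60),
            ncol = 2, byrow = TRUE,
            dimnames = list(tile = c("t1", "t2", "t3"), cell = c("A", "B")))

# Connection probability of each cell, given the tile: normalize per row
likelihood <- s / rowSums(s)
print(likelihood)
```

Tiles that are not covered by any cell would need separate handling, since their row sum is zero.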

Calculation of posterior probabilities

This is straightforward. The implementation is very R-specific, so it is not directly usable in other programming languages.
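Conceptually this is an application of Bayes' rule: for each cell, the prior probability of a grid tile is multiplied by the connection probability of that cell in that tile, and the result is normalized over all tiles. The sketch below uses made-up numbers and hypothetical variable names.

```r
# Connection probabilities per tile (rows) and cell (columns); made-up values
likelihood <- matrix(c(0.9, 0.1,
                       0.5, 0.5,
                       0.1, 0.9),
                     ncol = 2, byrow = TRUE,
                     dimnames = list(tile = c("t1", "t2", "t3"),
                                     cell = c("A", "B")))

# Prior probability per tile; sums to 1 over the grid
prior <- c(t1 = 0.5, t2 = 0.3, t3 = 0.2)

# Multiply each row by the prior of its tile, then normalize per cell (column)
joint <- likelihood * prior
posterior <- sweep(joint, 2, colSums(joint), "/")
print(posterior)
```

Each column of `posterior` is the location distribution over grid tiles for a device connected to that cell.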

Preparing landuse data

In mobloc, we use land use data for two purposes. The first is to calculate the path loss exponent, which is used to model the propagation. The second is to compute a prior distribution, which is used to compute the posterior distribution. The preparation of the land use data from OpenStreetMap (OSM) is the same for both purposes and is explained here.

For each grid tile we compute the fraction of land use for each of the following main categories: built-up, forest, water, and (rail)roads.

(Note: in mobloc we split built-up between residential and (non-residential) buildings, but eventually, we did not use buildings differently from residential areas.)

Each piece of land is assigned to at most one of these categories, so they do not overlap. In other words, the total fraction of each grid tile should be at most one. The fraction that is not assigned to any of these categories is considered the ‘rest category’, which is assumed to be open-area land (e.g. grasslands).

How to process OSM data depends on the quality of OSM data and how these OSM key-value pairs are used in practice, which can vary between countries. For the Dutch example data in mobloc, the processing script is here: https://github.com/MobilePhoneESSnetBigData/mobloc/blob/master/data_generation/ZL_landuse.R

The applied process is the following. First we obtain the OSM polygons per category. The OSM key-value pairs that are used are listed in the following table, along with the assigned category. Note that the categorization is open for discussion and could also depend on the country of study. For instance, it could be better to omit the label for farmyard, making it ‘open-area’.

| key | value | category |
|---|---|---|
| landuse | commercial | built-up |
| landuse | construction | built-up |
| landuse | industrial | built-up |
| landuse | built-up | built-up |
| landuse | retail | built-up |
| landuse | depot | built-up |
| landuse | farmyard | built-up |
| landuse | forest | forest |
| landuse | garages | built-up |
| landuse | greenhouse_horticulture | built-up |
| landuse | landfill | built-up |
| landuse | orchard | forest |
| landuse | plant_nursery | forest |
| landuse | vineyard | forest |
| natural | water | water |
| waterway | canal | water |

Next, we use the OSM polylines to compute the (rail)roads. We use the key-value pairs in the following table. For each polyline, we apply a spatial buffer in order to obtain a polygon. The third column indicates the buffer width in meters. These settings are also open for discussion and are country-dependent.

| key | value | width |
|---|---|---|
| highway | motorway | 30 |
| highway | trunk | 15 |
| highway | primary | 15 |
| highway | secondary | 15 |
| highway | motorway_link | 15 |
| highway | trunk_link | 15 |
| highway | primary_link | 15 |
| highway | secondary_link | 15 |

Next, the spatial difference is computed between the OSM polygons and the buffered OSM polylines: the buffered polylines are subtracted from the OSM polygons in the categories built-up, forest and water. Once the polygons per category have been prepared, the next step is to compute the fraction of each category in each grid tile. In mobloc the result is the following:


Path loss exponent

For the path loss exponent, we combine these rasters into one raster, which is called the ‘environment’ raster. This is done by a weighted sum in which the four categories get the following weights: built-up = 1, forest = 1, water = 0, (rail)roads = 0. The weights are open for discussion, but they should reflect to what extent the area contains buildings or trees, which have a big influence on the propagation.

The environment raster obtained in this way is the following:

We have two path loss exponent parameters, one for open area (ple_0, which is by default 3.5) and one for built-up/forest area (ple_1, which is by default 4). For indoor cells, we have a different parameter called ple_small, by default 6. Whether a cell is indoor or not should be determined in another source (e.g. network typology data) rather than OSM data.
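The weighted sum and the role of ple_0 and ple_1 can be illustrated with a small sketch. The land use fractions are made up, and the mapping from environment value to path loss exponent is an assumption: a linear interpolation between ple_0 (environment value 0) and ple_1 (environment value 1). The function name `ple_from_envir` is hypothetical.

```r
# Land use fractions per grid tile (rows) and category (columns); made-up values
frac <- matrix(c(0.6, 0.1, 0.0, 0.1,
                 0.0, 0.7, 0.1, 0.0,
                 0.0, 0.0, 0.9, 0.0),
               ncol = 4, byrow = TRUE,
               dimnames = list(NULL, c("built_up", "forest", "water", "roads")))

# Weights for the environment raster, as listed in the text
w_env <- c(built_up = 1, forest = 1, water = 0, roads = 0)

# Environment value per tile: weighted sum of the fractions
envir <- as.vector(frac %*% w_env)

# Assumed linear interpolation between open area (ple_0) and built-up/forest (ple_1)
ple_from_envir <- function(envir, ple_0 = 3.5, ple_1 = 4) {
  ple_0 + envir * (ple_1 - ple_0)
}

print(data.frame(envir = envir, ple = ple_from_envir(envir)))
```

Because the fractions in a tile sum to at most one and the weights are at most one, the environment value stays between 0 and 1.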

In order to obtain the path loss exponent of a certain outdoor cell, it is not sufficient to extract the value from the environment raster tile at the physical location of that cell, because the coverage area of a cell is usually much larger than that tile.

Therefore, we take a sample of a few geographic points near the cell, and compute the path loss exponent using that sample. In detail:

Land use prior

The land use prior is, just like the environment raster, obtained by combining the categories using weights. However, these weights have a different purpose than in the path loss exponent application described in the previous section. Instead, they reflect how many people are expected to be in each area.

In mobloc we used the weights built-up = 1, forest = 0.1, water = 0, (rail)roads = 0.5. These weights are open for discussion, and also depend on the country of study: e.g. in Finland fewer people are expected in forests than in Spain.

For each grid tile, the weighted values are summed. The resulting raster is then normalized so that the values of all grid tiles together sum to 1. Note that normalization also takes place when computing the posterior distribution.
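A small sketch of this computation, using the weights mentioned above and made-up land use fractions for three grid tiles:

```r
# Land use fractions per grid tile (rows) and category (columns); made-up values
frac <- matrix(c(0.6, 0.1, 0.0, 0.1,
                 0.0, 0.7, 0.1, 0.0,
                 0.0, 0.0, 0.9, 0.0),
               ncol = 4, byrow = TRUE,
               dimnames = list(NULL, c("built_up", "forest", "water", "roads")))

# Weights for the land use prior, as listed in the text
w_prior <- c(built_up = 1, forest = 0.1, water = 0, roads = 0.5)

# Weighted sum per tile, then normalization over the whole grid
raw <- as.vector(frac %*% w_prior)
prior <- raw / sum(raw)
print(prior)
```

In this toy grid the third tile consists almost entirely of water, so its prior probability is zero.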

In the example of mobloc, the land use prior is the following:



MobilePhoneESSnetBigData/mobloc documentation built on Feb. 18, 2024, 3:41 a.m.