This repo contains code to process mobile phone network data in order to produce statistical output, such as daytime population.
Data is not contained in this repo. Open datasets, such as CBS Wijk-Buurtdata, are downloaded by running the scripts. Other required datasets can be accessed in a private environment.
The first output we want to produce is the number of devices per hour per 100 squared meter grid cells and per 1000 squared meter grid cells, where values less than 15 are reported as missing. These datasets are sent to CBS for further analysis and to produce estimates of daytime population.
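The suppression rule for small counts can be sketched in a few lines of base R. This is only an illustration: the data.frame `counts`, its column names, and the helper `suppress_small()` are hypothetical and not part of this repo.

```r
# Hypothetical hourly device counts per grid cell; column names are assumptions.
counts <- data.frame(
  grid_id = c(1, 1, 2, 3),
  hour    = c(8, 9, 8, 8),
  devices = c(120, 14, 15, 3)
)

# Replace every count below the threshold by NA (reported as missing).
suppress_small <- function(x, threshold = 15) {
  x[!is.na(x) & x < threshold] <- NA
  x
}

counts$devices <- suppress_small(counts$devices)
```

A count of exactly 15 is kept, since only values less than 15 are reported as missing.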
Cellplan, which contains information on how cell towers are placed. Each technology (2G, 3G, 4G) has its own cellplan. Basically, there are four options in which cellplan data can be delivered:
Signalling data, which contains all events on the network that are registered for billing purposes (CDR) and network analysis.
1. Constructing the cellplan
For each cell, a polygon is drawn with a spline through the center and the -3dB points at angles -/+ the horizontal beamwidth, where the radius of these points is calculated using the tilt (with a maximum of 10 kilometers).
Input: cellplan, second option above
Output: polygon per cell
R-Function: process_cellplan()
2. Determine probabilities of presence
This algorithm calculates the probability that a device is present in a grid cell $i$ when an event is registered in the signalling data at cell $j$. This probability depends on the distance between $i$ and $j$, and the number of overlapping cells.
Bayes' formula is used to calculate these probabilities:
$$P(i|j) \propto P(i)P(j|i)$$
where $i$ represents a grid cell and $j$ the cell polygon.
The prior, $P(i)$, is fixed to 1. The likelihood is $$P(j|i) = \frac{1}{\left(\sum_k \frac{1}{d(k,i)^2}\right) d(j,i)^4}$$ where $d(j,i)$ is the distance between the center of $i$ and the location of cell tower $j$. This method is implemented as follows.
Input: cellplan (polygon per cell), grid
Output: data.frame with three columns: cell_name, grid-id, and p
R-Function: rasterize_cellplan()
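The calculation in this step could look roughly like the base R sketch below. It is not the code of `rasterize_cellplan()`. It assumes a hypothetical data.frame `pairs` with one row per combination of cell and covered grid cell, that the sum over $k$ runs over all cells covering grid cell $i$, and that the proportionality is resolved by normalising the probabilities to sum to 1 per cell. The function name `presence_prob()` is also an assumption.

```r
# Hypothetical toy data: one row per (cell, grid cell) pair where the cell
# polygon covers the grid cell; d is the distance in meters between the
# cell tower and the grid cell center.
pairs <- data.frame(
  cell_name = c("A", "A", "B", "B"),
  grid_id   = c(1, 2, 2, 3),
  d         = c(100, 200, 100, 400)
)

presence_prob <- function(pairs) {
  # sum_k 1 / d(k, i)^2, taken over all cells k covering grid cell i
  # (assumption about the range of the sum)
  s <- ave(1 / pairs$d^2, pairs$grid_id, FUN = sum)
  # Likelihood P(j | i) as given in the text
  lik <- 1 / (s * pairs$d^4)
  # Prior fixed to 1; normalise per cell j so the probabilities sum to 1
  # (assumption about how the proportionality is resolved)
  pairs$p <- ave(lik, pairs$cell_name, FUN = function(x) x / sum(x))
  pairs[, c("cell_name", "grid_id", "p")]
}

p_grid <- presence_prob(pairs)
```

The result has the three columns listed as output above, with `grid_id` standing in for grid-id.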
3. Determine administrative region per grid-cell
Administrative regions can be municipalities, neighbourhoods, postal codes, etc. The method is straightforward. Per grid cell, the region with the largest overlap is allocated. A computationally cheaper alternative is to take, per grid cell, the admin polygon that intersects the grid cell center point.
Input: grid, admin regions (polygons)
Output: data.frame with two columns: grid-id and region-id
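The center point variant can be sketched with the `sf` package. The toy geometries, the column names `grid_id` and `region_id`, and the helper `box_poly()` are assumptions for illustration; the repo may implement this step differently.

```r
library(sf)

# Helper (hypothetical): axis-aligned rectangle as an sf polygon.
box_poly <- function(x0, y0, x1, y1) {
  st_polygon(list(rbind(
    c(x0, y0), c(x1, y0), c(x1, y1), c(x0, y1), c(x0, y0)
  )))
}

# Toy grid of two 100 x 100 cells and two adjacent admin regions.
grid <- st_sf(
  grid_id  = 1:2,
  geometry = st_sfc(box_poly(0, 0, 100, 100), box_poly(100, 0, 200, 100))
)
regions <- st_sf(
  region_id = c("R1", "R2"),
  geometry  = st_sfc(box_poly(-100, -100, 120, 200), box_poly(120, -100, 300, 200))
)

# Center point per grid cell, then the admin polygon that intersects it.
centers <- st_sf(
  grid_id  = grid$grid_id,
  geometry = st_centroid(st_geometry(grid))
)
lookup <- st_drop_geometry(st_join(centers, regions, join = st_intersects))
```

In this toy example the second grid cell overlaps both regions, and its center point falls in the region with the larger overlap, so both methods give the same allocation.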
4. Determine probabilities of presence per admin region
The probabilities that are calculated in step 2 are aggregated to administrative regions.
Input: data.frame of step 2, data.frame of step 3
Output: data.frame with three columns: cell_name, region-id, and p
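A minimal base R sketch of this aggregation, assuming the two input data.frames use the hypothetical column names `grid_id` and `region_id` and contain the toy values below:

```r
# Hypothetical output of step 2: probability per cell and grid cell.
p_grid <- data.frame(
  cell_name = c("A", "A", "A", "B", "B"),
  grid_id   = c(1, 2, 3, 2, 3),
  p         = c(0.5, 0.3, 0.2, 0.4, 0.6)
)

# Hypothetical output of step 3: region per grid cell.
grid_region <- data.frame(
  grid_id   = c(1, 2, 3),
  region_id = c("R1", "R1", "R2")
)

# Attach the region to every grid cell, then sum p per cell and region.
joined   <- merge(p_grid, grid_region, by = "grid_id")
p_region <- aggregate(p ~ cell_name + region_id, data = joined, FUN = sum)
```

Because each grid cell belongs to exactly one region, the probabilities per cell still sum to 1 after aggregation.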
5. Determine place of residence
The devices (IMSI numbers) may not be joined with customer data due to privacy. Therefore, it is not possible to know the residential address. However, by following a device over time, the place of residence can be approximated. In the literature, two pragmatic methods are proposed (references to be added):
The procedure to implement this method can be the following. Per device:
~~2. Calculate a weight per event, namely the time in seconds between the previous and the next event. (In essence, the same method is used for the statistics on road statistics to calibrating the road sensor data. The road corresponds to the time scale and the road sensors to the event times.)~~
Calculate a weight per event, namely 1 over the number of events per device per hour: simple, scales well (better than taking previous event into account) and no strange edge cases (e.g. previous event was two weeks ago).
Aggregate the events by cell; per cell, sum the weights. This results in a data.frame with two columns: cell_name, and total weight w.
(alternative is to distribute place of residence over multiple region-id's)
Input: signalling data, data.frame of step 4
Output: data.frame with two columns: IMSI (device) and region-id.
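The weighting and aggregation described above can be sketched in base R as follows. The toy data and the column names `imsi`, `hour`, `cell_name`, and `region_id` are assumptions. The last part, which multiplies the cell weights by the probabilities of step 4 and picks the region with the largest total per device, is also an assumption about how the steps connect, not a description of the repo's code.

```r
# Hypothetical signalling events; column names are assumptions.
events <- data.frame(
  imsi      = c("d1", "d1", "d1", "d1", "d2", "d2"),
  hour      = c(1, 1, 2, 3, 1, 2),
  cell_name = c("A", "B", "A", "A", "B", "B")
)

# Hypothetical output of step 4: probability per cell and region.
p_region <- data.frame(
  cell_name = c("A", "A", "B", "B"),
  region_id = c("R1", "R2", "R1", "R2"),
  p         = c(0.8, 0.2, 0.4, 0.6)
)

# Weight per event: 1 over the number of events of that device in that hour.
events$w <- 1 / ave(rep(1, nrow(events)), events$imsi, events$hour, FUN = sum)

# Per device, aggregate the events by cell and sum the weights.
w_cell <- aggregate(w ~ imsi + cell_name, data = events, FUN = sum)

# Assumed continuation: spread the cell weights over regions using p,
# then keep the region with the largest total weight per device.
w_region    <- merge(w_cell, p_region, by = "cell_name")
w_region$wp <- w_region$w * w_region$p
score       <- aggregate(wp ~ imsi + region_id, data = w_region, FUN = sum)
residence   <- do.call(rbind, lapply(split(score, score$imsi), function(s) {
  s[which.max(s$wp), c("imsi", "region_id")]
}))
```

Keeping `score` instead of only the maximum would correspond to the alternative of distributing the place of residence over multiple regions.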
This procedure yields a region of residence.
A different approach is a location of residence: