computeDuplicityBayesian: Computes the duplicity probabilities for each device using a...


View source: R/computeDuplicityBayesian.R

Description

Computes the duplicity probabilities for each device using a Bayesian approach. Two methods are available: "pairs" and "1to1". The "pairs" method considers only pairs of compatible devices. These pairs are selected by the computePairs() function, which takes into account the antennas to which the devices are connected and the coverage areas of those antennas. Two devices are considered compatible if they are connected to the same or to neighbouring antennas. As a result, the data set of compatible pairs is considerably smaller than the set of all possible combinations of two devices. The "1to1" method considers all pairs of devices when computing the duplicity probability, so its time complexity is much greater than that of the "pairs" method. Both methods use parallel computation to speed up execution: they build a cluster of worker nodes and split the pairs of devices equally among these nodes.
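The splitting of pairs among worker nodes described above can be illustrated with base R's parallel package. This is only a minimal sketch and not the package's own code: the pairs table, its column names (index.x, index.y) and the score_rows() helper are hypothetical stand-ins for the real per-pair likelihood computation.

```r
library(parallel)

# Hypothetical table of candidate device pairs (row indices into a device list).
pairs <- data.frame(index.x = c(1, 1, 2, 3, 4, 5),
                    index.y = c(2, 3, 3, 4, 5, 6))

# Placeholder for the per-pair work; the real function evaluates HMM likelihoods.
score_rows <- function(rows, pairs) {
  pairs$index.x[rows] + pairs$index.y[rows]
}

cl <- makeCluster(2)                               # two local worker processes
chunks <- clusterSplit(cl, seq_len(nrow(pairs)))   # near-equal consecutive blocks of row indices
partial <- parLapply(cl, chunks, score_rows, pairs = pairs)
stopCluster(cl)                                    # always release the workers

scores <- unlist(partial)                          # results keep the original row order
```

Because clusterSplit() hands each node a consecutive block of rows, concatenating the partial results restores the order of the input table without any extra bookkeeping.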

Usage

computeDuplicityBayesian(
  method,
  deviceIDs,
  pairs4dupl,
  modeljoin,
  llik,
  P1 = NULL,
  Pii = NULL,
  init = TRUE,
  lambda = NULL
)

Arguments

method

Selects the method used to compute the duplicity probabilities. It takes one of two values: "pairs" or "1to1". With the "pairs" method, the pairs4dupl parameter contains only the compatible pairs of devices, i.e. the pairs that most of the time are connected to the same or to neighbouring antennas. The "1to1" method checks all possible combinations of devices to compute the duplicity probabilities.
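The difference in size between the two candidate sets can be seen on a toy example. The sketch below is an illustration only: the devices table, the neighbours list and the compatible() helper are hypothetical, and the real selection is performed by computePairs() from antenna coverage areas.

```r
# Hypothetical data: the antenna each device is mostly connected to.
devices <- data.frame(deviceID = c("d1", "d2", "d3", "d4", "d5"),
                      antenna  = c("A", "A", "B", "C", "D"),
                      stringsAsFactors = FALSE)

# Hypothetical neighbourhood relation between antennas.
neighbours <- list(A = "B", B = c("A", "C"), C = "B", D = character(0))

# "1to1"-style candidate set: every unordered pair of devices.
all_pairs <- t(combn(nrow(devices), 2))
colnames(all_pairs) <- c("i", "j")

# "pairs"-style candidate set: keep pairs on the same or neighbouring antennas.
compatible <- function(i, j) {
  a <- devices$antenna[i]
  b <- devices$antenna[j]
  a == b || b %in% neighbours[[a]]
}
keep <- mapply(compatible, all_pairs[, "i"], all_pairs[, "j"])
compatible_pairs <- all_pairs[keep, , drop = FALSE]

nrow(all_pairs)          # choose(5, 2) = 10 pairs
nrow(compatible_pairs)   # 4 pairs survive the compatibility filter
```

The number of all pairs grows quadratically with the number of devices, whereas the number of compatible pairs depends on how many devices share or neighbour each antenna.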

deviceIDs

A vector with the device IDs. It is obtained by calling the getDevices() function.

pairs4dupl

A data.table object with pairs of devices and the pairs of antennas to which these devices are connected. It can be obtained by calling the computePairs() function.

modeljoin

The joint HMM model returned by the getJointModel() function.

llik

A vector with the log-likelihood values obtained after fitting the individual HMM model of each device. This vector can be obtained by calling the fitModels() function.

P1

The a priori duplicity probability, as returned by the aprioriDuplicityProb() function. It is used when the "pairs" method is selected.

Pii

The a priori probability of a device being in a 1-to-1 correspondence with the holder, as returned by the aprioriOneDeviceProb() function. This parameter is used only when the "1to1" method is selected.

init

A logical value. If TRUE, the fit() function uses the stored steady state as a fixed initialization; otherwise the steady state is computed at every call of the fit() function.

lambda

It is used only when the "1to1" method is selected. A non-NULL value means that the duplicity probabilities are computed according to the method described in the paper "An end-to-end statistical process with mobile network data for Official Statistics".

Value

A data.table object with two columns: deviceID and dupP. The first column holds the device IDs and the second the corresponding duplicity probability, i.e. the probability that a device is in a 2-to-1 correspondence with the holder.
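To make the shape of the result concrete, the sketch below applies Bayes' rule to a pair of competing hypotheses (two devices carried by the same holder versus by different holders) and reduces the pair-level posteriors to one value per device. It is an assumption-laden illustration, not the package's implementation: the posterior_dup() helper, the column names, the toy log-likelihoods, the mapping of the joint and individual log-likelihoods onto the two hypotheses, and the choice of the per-device maximum are all hypothetical, and a plain data.frame is used instead of a data.table to keep the example in base R.

```r
# Posterior probability that a pair of devices shares one holder.
# llik_joint     : log-likelihood of the pair under a joint model (assumed)
# llik_i, llik_j : log-likelihoods under the individual models (assumed)
# prior          : a priori duplicity probability of the pair
posterior_dup <- function(llik_joint, llik_i, llik_j, prior) {
  # Posterior log-odds = log-likelihood ratio + prior log-odds.
  log_odds <- (llik_joint - llik_i - llik_j) + log(prior) - log1p(-prior)
  plogis(log_odds)   # logistic function, avoids overflow of exp()
}

# Toy pair table with made-up log-likelihoods.
pairs <- data.frame(i = c("d1", "d1", "d2"),
                    j = c("d2", "d3", "d3"),
                    llik_joint = c(-100, -130, -128),
                    llik_i     = c(-60, -60, -62),
                    llik_j     = c(-62, -61, -61),
                    stringsAsFactors = FALSE)
pairs$p <- posterior_dup(pairs$llik_joint, pairs$llik_i, pairs$llik_j,
                         prior = 0.1)

# One row per device: here the largest posterior over the pairs it appears in.
long <- data.frame(deviceID = c(pairs$i, pairs$j),
                   p = c(pairs$p, pairs$p),
                   stringsAsFactors = FALSE)
result <- aggregate(p ~ deviceID, data = long, FUN = max)
names(result) <- c("deviceID", "dupP")
result
```

Working with log-odds keeps the computation stable when the log-likelihoods differ by tens or hundreds of units, which is typical for long observation sequences.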


bogdanoancea/deduplication documentation built on Dec. 2, 2020, 11:22 p.m.