eforensics: Election Forensics Finite Mixture Model


Description

This function estimates a finite mixture model of election fraud.

Usage

eforensics(formula1, formula2, formula3 = NULL, formula4 = NULL,
  formula5 = NULL, formula6 = NULL, data, eligible.voters = NULL,
  weights = NULL, mcmc, model = "qbl", parameters = "all",
  na.action = "exclude", get.dic = 1000, parComp = TRUE,
  autoConv = TRUE, max.auto = 10, mcmc.conv.diagnostic = "MCMCSE",
  mcmc.conv.parameters = "pi", mcmcse.conv.precision = 0.05,
  mcmcse.combine = FALSE)

Arguments

formula1

an object of the class formula as used in lm. The dependent variable of this formula must be the number (counts) or proportion of votes for the party or candidate that won the election. If counts are used, the model must be from the binomial family (see the model parameter below). If proportions are provided, the model must be from the normal family (see the model parameter below).

formula2

an object of the class formula as used in lm. The dependent variable of this formula must be the number (counts) or proportion of abstentions. The type (count or proportion) must be the same as that of the dependent variable in formula1.

formula3

See description below

formula4

See description below

formula5

See description below

formula6

See description below

Formulas 3 to 6

Four other, optional formulas can be supplied: formula3, formula4, formula5, and formula6.

formula3

an object of the class formula as used in lm. The left-hand side (LHS) of the formula must be mu.iota.m (see example). mu.iota.m is the probability of incremental fraud by manufacturing votes; it is a latent variable in the model. Because the LHS names that variable, the function automatically identifies the formula as formula3. The default is NULL, which means that the probability is not affected by covariates of the election unit (ballot box, polling place, etc.).

formula4

an object of the class formula as used in lm. The left-hand side (LHS) of the formula must be mu.iota.s (see example). mu.iota.s is the probability of incremental fraud by stealing votes from the opposition; it is a latent variable in the model. Because the LHS names that variable, the function automatically identifies the formula as formula4. The default is NULL, which means that the probability is not affected by covariates of the election unit (ballot box, polling place, etc.).

formula5

an object of the class formula as used in lm. The left-hand side (LHS) of the formula must be mu.chi.m (see example). mu.chi.m is the probability of extreme fraud by manufacturing votes; it is a latent variable in the model. Because the LHS names that variable, the function automatically identifies the formula as formula5. The default is NULL, which means that the probability is not affected by covariates of the election unit (ballot box, polling place, etc.).

formula6

an object of the class formula as used in lm. The left-hand side (LHS) of the formula must be mu.chi.s (see example). mu.chi.s is the probability of extreme fraud by stealing votes from the opposition; it is a latent variable in the model. Because the LHS names that variable, the function automatically identifies the formula as formula6. The default is NULL, which means that the probability is not affected by covariates of the election unit (ballot box, polling place, etc.).

data

a data.frame with the dependent variables (votes for the winner and abstention) and the covariates. If the dependent variables are counts, then the total number of eligible voters must also be provided (see the eligible.voters parameter).

eligible.voters

string with the name of the variable in the data that contains the number of eligible voters. Default is NULL, but it is required if the dependent variables (votes for the winner and abstention) are counts.

weights

Deprecated.

mcmc

a list containing n.iter, the number of iterations for the MCMC; burn.in, the burn-in period of the MCMC chain; n.adapt, the number of adaptive steps before the estimation (see rjags); and n.chains, an integer indicating the number of chains to use (default 1).

model

a string with the model ID to use in the estimation. There are three current choices: qbl, bl, and rn. qbl is the default and recommended choice. For a description of each model, see ef_models_desc.

parameters

a string vector with the names of the parameters to monitor. When NULL, it will monitor all the parameters, except the Z's. When parameters='all' (default), it will monitor all parameters, including Z, which is necessary to classify the observations as fraudulent cases or not.

na.action

Deprecated.

get.dic

Deprecated.

parComp

Logical. If parComp = TRUE, then chains are computed in parallel using the runjags parallel method. This opens n.chains instances of JAGS. In practice, a max of 4 unique chains can be run due to the way in which JAGS generates initial values. If parComp = FALSE, chains are run sequentially using the runjags interruptible method.

autoConv

Logical. If autoConv = TRUE, chains are run until convergence criteria are met. Currently, chains are run for a single period equal to burn.in iterations and monitored for n.iter iterations. If mcmc.conv.diagnostic = "MCMCSE", MCMCSE values are calculated for each parameter in mcmc.conv.parameters. If all values are less than mcmcse.conv.precision, then the check stops and the chain is run for n.iter more iterations, monitoring all values specified by parameters. If the MCMCSE for any parameter is higher than mcmcse.conv.precision, then the chain is run for burn.in + n.iter more iterations and the MCMCSE is checked again. This is repeated at most max.auto times. If the MCMCSE condition is not met after max.auto attempts, a warning message is printed and the chains are run for n.iter more iterations with all parameters monitored. If mcmc.conv.diagnostic = "PSRF", the same procedure is followed, checking that all PSRF values are less than 1.05.

max.auto

Integer. Maximum number of successive attempts to achieve the convergence conditions outlined under autoConv. After max.auto failures, a warning is thrown and the chain is run for n.iter more iterations, monitoring all specified parameters.

mcmc.conv.diagnostic

a string with the method to use to evaluate convergence. Currently, PSRF and MCMCSE (default) are implemented.

mcmc.conv.parameters

string vector with the names of the parameters to check for convergence using the MCMC standard error. Default is pi.

mcmcse.conv.precision

numeric, the value of the precision criterion to evaluate convergence using the MCMC standard error. The MCMC standard error of all parameters included in mcmc.conv.parameters must be below the threshold defined by mcmcse.conv.precision (default is 0.05) to pass the convergence diagnostic.

mcmcse.combine

boolean. If TRUE, the MCMCSE is computed after the chains are combined. Otherwise, the MCMC standard error is computed for each chain, and the maximum standard error of each parameter is used for the diagnostic.
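
The retry logic described under autoConv, max.auto, mcmcse.conv.precision and mcmcse.combine can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: batch_means_se and toy_chain are hypothetical helpers, the "sampler" is a simulated AR(1) series standing in for draws of one monitored parameter, and the standard error is a simple batch-means estimate in the spirit of Flegal et al. (2008).

```r
# Hypothetical helper (not part of eforensics): batch-means estimate of the
# Monte Carlo standard error of the mean of a vector of draws.
batch_means_se <- function(draws, n_batches = 20) {
  batch_size <- floor(length(draws) / n_batches)
  used <- draws[seq_len(batch_size * n_batches)]
  # each column of the matrix holds one batch of consecutive draws
  batch_avgs <- colMeans(matrix(used, nrow = batch_size))
  sqrt(var(batch_avgs) / n_batches)
}

# Hypothetical stand-in for a sampler: autocorrelated draws centred at 0.9
toy_chain <- function(n, rho = 0.5) {
  as.numeric(0.9 + 0.02 * arima.sim(list(ar = rho), n = n))
}

set.seed(1)
precision <- 0.05   # plays the role of mcmcse.conv.precision
max_auto  <- 10     # plays the role of max.auto
n_iter    <- 1000
n_chains  <- 2
combine   <- FALSE  # plays the role of mcmcse.combine

converged <- FALSE
for (attempt in seq_len(max_auto)) {
  chains <- replicate(n_chains, toy_chain(n_iter), simplify = FALSE)
  se <- if (combine) {
    # one standard error computed on the pooled draws
    batch_means_se(unlist(chains))
  } else {
    # one standard error per chain; the largest is used for the check
    max(vapply(chains, batch_means_se, numeric(1)))
  }
  if (se < precision) {
    converged <- TRUE
    break
  }
}
if (!converged) warning("precision criterion not met after max_auto attempts")
```

Taking the maximum over chains is the more conservative choice, since a single poorly mixing chain is enough to fail the check.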

Value

The function returns a nested list of class eforensics with length equal to the number of chains. Each sublist contains up to three named objects:

parameters

An mcmc object that contains the posterior draws for all monitored parameters except for the individual fraud classifications.

k.hat

A vector that contains the posterior modal classification for each observation. 1 corresponds to no fraud, 2 corresponds to incremental fraud, and 3 corresponds to extreme fraud.

piZi

A matrix with three columns that contains the posterior probability of belonging to each class for each observation.

If model = "qbl" or model = "bl", the proportion of frauds estimated at each observation is returned. These values can be accessed for object foo using attr(foo,"frauds"). This attribute is a two element list that contains the estimated proportion of votes that are Stolen and Manufactured. Posterior means, HPD intervals, and posterior quantiles are returned for each observation in the data set. These quantities are automatically aggregated over all chains.
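
As a rough illustration of how the class coding relates to the posterior probabilities, the sketch below takes the row-wise maximum of a small made-up probability matrix. It assumes that the modal classification corresponds to the column of piZi with the largest probability; the matrix values and the labels vector are invented for the example.

```r
# Toy posterior class probabilities for four observations (each row sums to 1).
# Columns follow the coding above: 1 = no fraud, 2 = incremental, 3 = extreme.
piZi <- matrix(c(0.90, 0.08, 0.02,
                 0.20, 0.70, 0.10,
                 0.05, 0.15, 0.80,
                 0.60, 0.30, 0.10),
               ncol = 3, byrow = TRUE)

# Modal class per observation: index of the largest probability in each row
k_hat <- apply(piZi, 1, which.max)

# Hypothetical human-readable labels for the three classes
labels <- c("no fraud", "incremental fraud", "extreme fraud")
classified <- data.frame(k_hat = k_hat, class = labels[k_hat])
print(classified)
```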

References

Flegal, J. M., Haran, M., & Jones, G. L., Markov chain Monte Carlo: can we trust the third significant figure?, Statistical Science, 23(2), 250–260 (2008).

Brooks, S. P., & Gelman, A., General methods for monitoring convergence of iterative simulations, Journal of Computational and Graphical Statistics, 7(4), 434–455 (1998).

Examples

set.seed(12345)
library(eforensics)
model    = 'qbl'

## simulate data
## -------------
set.seed(12345)
sim_data = ef_simulateData(n=250, nCov=1, nCov.fraud=1,
              model="bbl", overdispersion = 100, pi = c(.95,.04,.01))
data     = sim_data$data

## mcmc parameters
## ---------------
mcmc    = list(burn.in=1000, n.adapt=1000, n.iter=1000, n.chains=2)

## samples
## -------
## help(eforensics)

samples    = eforensics(
  w ~ x1.w ,
  a ~ x1.a,
  mu.iota.m ~ x1.iota.m,
  mu.iota.s ~ x1.iota.s,
  mu.chi.m  ~ x1.chi.m,
  mu.chi.s  ~ x1.chi.s,
  data=data,
  eligible.voters="N",
  model="qbl",
  mcmc=mcmc,
  parameters = "all",
  parComp = TRUE,
  autoConv = TRUE,
  max.auto = 10,
  mcmc.conv.diagnostic = "MCMCSE",
  mcmc.conv.parameters = c("pi"),
  mcmcse.conv.precision = .05,
  mcmcse.combine = FALSE
)

#Summaries for each of the monitored parameters
#Look at each chain separately
summary(samples)
#Combine the chains
summary(samples, join.chains=TRUE)

#Look at the estimated fraud proportions for each observation
attr(samples,"frauds")
#Look at Manufactured and Stolen separately
attr(samples,"frauds")$Manufactured
attr(samples,"frauds")$Stolen

#How accurate is the classification?

#Get the true categories
true_z <- sim_data$latent$z

#What is the modal estimate for the class?
num_z <- (samples[[1]]$piZi*1000) + (samples[[2]]$piZi*1000)
max_z <- apply(num_z,1,which.max)

#How accurate is the modal classification?
table(true_z,max_z)

#How accurately do we uncover the proportion of frauds for each observation?

#Manufactured
true_man <- ((true_z == 1)*0) + ((true_z == 2)*sim_data$latent$iota.m) + 
            ((true_z == 3)*sim_data$latent$chi.m)

#What is the posterior mean proportion of manufactured votes
pred_man <- attr(samples,"frauds")$Manufactured[,1]

#Are they close?
plot(true_man, pred_man, xlab = "True Proportion Manufactured Votes", 
     ylab = "Estimated Proportion Manufactured Votes")

#Stolen
true_stolen <- ((true_z == 1)*0) + ((true_z == 2)*sim_data$latent$iota.s) + 
               ((true_z == 3)*sim_data$latent$chi.s)

#What is the posterior mean proportion of stolen votes
pred_stolen <- attr(samples,"frauds")$Stolen[,1]

#Are they close?
plot(true_stolen, pred_stolen, xlab = "True Proportion Stolen Votes", 
     ylab = "Estimated Proportion Stolen Votes")
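
The "Are they close?" plots can be complemented with simple numeric summaries. The sketch below uses made-up vectors (true_prop and pred_prop are hypothetical stand-ins for a true and an estimated fraud proportion such as true_man and pred_man) so that it runs without fitting a model.

```r
# Made-up true and estimated proportions for six observations
true_prop <- c(0, 0, 0.10, 0.25, 0, 0.60)
pred_prop <- c(0.01, 0, 0.12, 0.20, 0.02, 0.55)

# Root mean squared error and mean absolute error of the estimates
rmse <- sqrt(mean((pred_prop - true_prop)^2))
mae  <- mean(abs(pred_prop - true_prop))

# Linear association between truth and estimate
correlation <- cor(true_prop, pred_prop)

print(c(rmse = rmse, mae = mae, correlation = correlation))
```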

UMeforensics/eforensics_public documentation built on Oct. 31, 2019, 12:49 a.m.