This function generates simulated ecological data, i.e., data in the form of contingency tables in which the row and column totals but none of the internal cell counts are observed. At the user's option, data from simulated surveys of some of the 'units' (in voting parlance, 'precincts') that gave rise to the contingency tables are also produced.
gendata.ep(nprecincts = 175,
nrowcat = 3,
ncolcat = 3,
colcatnames = c("Dem", "Rep", "Abs"),
mu0 = c(.6, 2.05, 1.7, .2, 1.45, 1.45),
rowcatnames = c("bla", "whi", "his", "asi"),
alpha = c(.35, .45, .2, .1),
housing.seg = 1,
nprecincts.ep = 40,
samplefrac.ep = 1/14,
K0 = NULL,
nu0 = 12,
Psi0 = NULL,
lambda = 1000,
dispersion.low.lim = 1,
dispersion.up.lim = 1,
outfile=NULL,
his.agg.bias.vec = c(0,0),
HerfInvexp = 3.5,
HerfNoInvexp = 3.5,
HerfReasexp = 2)

nprecincts 
positive integer: The number of contingency tables (precincts) in the simulated dataset. 
nrowcat 
integer > 1: The number of rows in each of the contingency tables. 
ncolcat 
integer > 1: The number of columns in each of the contingency tables. 
rowcatnames 
string of length = length( 
colcatnames 
string of length = length( 
alpha 
vector of length( 
housing.seg 
scalar > 0: multiplied to alpha to generate final parameters to Dirichlet distribution used to generate each contingency table's row fractions. 
mu0 
vector of length ( 
K0 
square matrix of dimension ( 
nu0 
scalar > 0: the degrees of freedom for the InvWishart hyperprior from which the Σ matrix will be drawn. 
Psi0 
square matrix of dimension ( 
lambda 
scalar > 0: initial parameter of the Poisson distribution from which the number of voters in each precinct will be drawn 
dispersion.low.lim 
scalar > 0 but < dispersion.up.lim:
lower limit of a draw from 
dispersion.up.lim 
scalar > dispersion.low.lim:
upper limit of a draw from 
outfile 
string ending in ".Rdata": filepath and name of
object; if non-NULL, the object returned by this function will be
saved to the location specified by outfile.
his.agg.bias.vec 
vector of length 2: only implemented for nrowcat = 3 and ncolcat = 3: if nonzero, introduces aggregation bias into the simulated data. See Details. 
nprecincts.ep 
integer > 1 and less than nprecincts: number of contingency tables (precincts) to be included in simulated survey sample (ep for "exit poll"). 
samplefrac.ep 
fraction (real number between 0 and 1): fraction of individual units (voters) within each contingency table (precinct) included in the survey sample. 
HerfInvexp 
scalar: exponent used to generate inverted quasi-Herfindahl weights used to sample contingency tables (precincts) for inclusion in a sample survey. See Details. 
HerfNoInvexp 
scalar: same as HerfInvexp except the quasi-Herfindahl weights are not inverted. See Details. 
HerfReasexp 
scalar: same as HerfInvexp, for a separate sample survey. See Details. 
This function simulates data from the ecological inference model outlined in Greiner & Quinn (2009). At the user's option (by setting nprecincts.ep to an integer greater than 0), the function generates three survey samples from the simulated dataset. The specifics of the function's operation are as follows.
First, the function simulates the total number of individual units (voters) in each contingency table (precinct) from a Poisson distribution with parameter lambda * runif(1, dispersion.low.lim, dispersion.up.lim). Next, for each table, the function simulates the vector of fractions of units (voters) in each table (precinct) row. The fractions are simulated from a Dirichlet distribution with parameter vector housing.seg * alpha. The row fractions are multiplied by the total number of units (voters), and the resulting vector is rounded to produce contingency table row counts for each table.
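The step above can be sketched in base R roughly as follows. This is an illustrative sketch, not the package's internal code: the helper rdirichlet1 is hypothetical, alpha is shortened to three entries to match nrowcat = 3, and drawing one dispersion factor per precinct is an assumption about how the runif() draw is applied.

```r
set.seed(1)
nprecincts <- 175
lambda <- 1000
dispersion.low.lim <- 1
dispersion.up.lim <- 1
alpha <- c(.35, .45, .2)   # one entry per row category (nrowcat = 3)
housing.seg <- 1

# Total units (voters) per table: Poisson with a dispersed mean.
# Assumption: a separate dispersion factor is drawn for each precinct.
disp <- runif(nprecincts, dispersion.low.lim, dispersion.up.lim)
totals <- rpois(nprecincts, lambda * disp)

# Hypothetical helper: one Dirichlet draw via normalized gamma variates
# (base R has no Dirichlet sampler).
rdirichlet1 <- function(a) {
  g <- rgamma(length(a), shape = a)
  g / sum(g)
}
row.fracs <- t(replicate(nprecincts, rdirichlet1(housing.seg * alpha)))

# Row counts: row fractions times the table total, rounded.
row.counts <- round(row.fracs * totals)
head(row.counts)
```

Because of rounding, the row counts of a table may differ from the simulated total by a unit or so. Larger values of housing.seg concentrate the Dirichlet draws around alpha / sum(alpha), so precincts look more alike; smaller values make individual precincts more dominated by one row.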
Next, a vector μ is simulated from a multivariate normal with mean mu0 and covariance matrix K0. A covariance matrix Σ is simulated from an InvWishart with nu0 degrees of freedom and scale matrix Psi0. Next, nprecincts vectors are drawn from N(μ, Σ). Each of these draws undergoes an inverse-stacked multidimensional logistic transformation to produce a set of nrowcat probability vectors (each of which sums to one) for nrowcat multinomial distributions, one for each row in that contingency table. Next, the nrowcat multinomial values, which represent the true (and in real life, unobserved) internal cell counts, are drawn from the relevant row counts and these probability vectors. The column totals are calculated via summation.
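A rough sketch of this hierarchy, for a handful of tables, might look like the following. Several things here are assumptions rather than facts about the package: K0 and Psi0 are set to identity matrices purely for illustration (the function's defaults are not shown above), the helper inv.logit.stack is hypothetical, and the transformation is assumed to stack the log-odds by row with the last column as the reference category. The row counts are made up.

```r
library(MASS)  # mvrnorm(); MASS ships with standard R installations
set.seed(2)
nrowcat <- 3; ncolcat <- 3; nprecincts <- 5
p <- nrowcat * (ncolcat - 1)
mu0 <- c(.6, 2.05, 1.7, .2, 1.45, 1.45)
K0 <- diag(p)     # hypothetical value, for illustration only
Psi0 <- diag(p)   # hypothetical value, for illustration only
nu0 <- 12

mu <- mvrnorm(1, mu0, K0)
# Inverse-Wishart draw: invert a Wishart draw that uses the inverted scale.
Sigma <- solve(rWishart(1, df = nu0, Sigma = solve(Psi0))[, , 1])
theta <- mvrnorm(nprecincts, mu, Sigma)   # nprecincts x p matrix

# Hypothetical helper. Assumption: theta is stacked by row and the last
# column is the reference category of each row's multinomial logit.
inv.logit.stack <- function(th) {
  m <- matrix(th, nrow = nrowcat, ncol = ncolcat - 1, byrow = TRUE)
  e <- cbind(exp(m), 1)
  e / rowSums(e)
}

# Made-up row counts, identical across the five tables.
row.counts <- matrix(c(300, 450, 250), nprecincts, nrowcat, byrow = TRUE)

# One multinomial draw per table row gives the interior cell counts.
interior <- lapply(seq_len(nprecincts), function(i) {
  probs <- inv.logit.stack(theta[i, ])
  t(sapply(seq_len(nrowcat), function(r) {
    rmultinom(1, row.counts[i, r], probs[r, ])
  }))
})

# Column totals by summation, one row per table.
col.totals <- t(sapply(interior, colSums))
interior[[1]]
```

In an ecological inference setting only row.counts and col.totals would be observed; the interior tables are what the model tries to recover.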
If nprecincts.ep is greater than 0, three simulated surveys (exit polls) are drawn. All three select contingency tables (precincts) using weights that are a function of the composition of the row totals. Specifically, the row fractions are raised to a power q and then summed (when q = 2, this calculation is known in antitrust law as a Herfindahl index). For one of the three surveys (exit polls) gendata.ep generates, denoted EPNoInv, these quasi-Herfindahl indices are the weights. For the other two, denoted EPInv and EPReas, the sample weights are the reciprocals of these quasi-Herfindahl indices. The former method tends to weight contingency tables (precincts) in which one row dominates the table more heavily than contingency tables (precincts) in which the row fractions are close to equal. In voting parlance, precincts in which one racial group dominates are more likely to be sampled than racially mixed precincts. The latter method, in which the sample weights are reciprocals, weights contingency tables in which row fractions are similar more heavily; in voting parlance, mixed-race precincts are more likely to be sampled.
For example, suppose nrowcat = 3, HerfInvexp = 3.5, HerfReasexp = 2, and HerfNoInvexp = 3.5. Consider contingency table P1 with row counts (300, 300, 300) and contingency table P2 with row counts (950, 25, 25). Then:
Row fractions: The corresponding row fractions are (300/900, 300/900, 300/900) = (.33, .33, .33) and (950/1000, 25/1000, 25/1000) = (.95, .025, .025).
EPInv weights: EPInv would assign P1 and P2 weights as follows: 1/sum(.33^3.5, .33^3.5, .33^3.5) = 16.1 and 1/sum(.95^3.5, .025^3.5, .025^3.5) = 1.2.
EPReas weights: EPReas would assign weights as follows: 1/sum(.33^2, .33^2, .33^2) = 3.1 and 1/sum(.95^2, .025^2, .025^2) = 1.1.
EPNoInv weights: EPNoInv would assign weights as follows: sum(.33^3.5, .33^3.5, .33^3.5) = .062 and sum(.95^3.5, .025^3.5, .025^3.5) = .84.
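The arithmetic in this example can be checked with a few lines of R. The function quasi.herf is a hypothetical helper written for this sketch, and the row fractions are rounded to two decimals as in the text (using exactly 1/3 for P1 would give slightly different values). Treating the weights as unnormalized selection probabilities in sample() is an assumption about how they are used.

```r
# Hypothetical helper: quasi-Herfindahl index of a vector of row fractions.
quasi.herf <- function(fracs, q) sum(fracs^q)

p1 <- c(.33, .33, .33)     # P1 row fractions, rounded as in the text
p2 <- c(.95, .025, .025)   # P2 row fractions

weights <- rbind(
  EPInv   = 1 / c(P1 = quasi.herf(p1, 3.5), P2 = quasi.herf(p2, 3.5)),
  EPReas  = 1 / c(P1 = quasi.herf(p1, 2),   P2 = quasi.herf(p2, 2)),
  EPNoInv =     c(P1 = quasi.herf(p1, 3.5), P2 = quasi.herf(p2, 3.5))
)
print(signif(weights, 2))

# Assumption: weights act as unnormalized selection probabilities.
set.seed(3)
picked <- sample(c("P1", "P2"), size = 1, prob = weights["EPInv", ])
```

Under EPInv the mixed table P1 is roughly 13 times as likely to be chosen as P2, while under EPNoInv the ordering reverses. The smaller exponent used for EPReas gives a much milder preference for mixed tables.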
For each of the three simulated surveys (EPInv, EPReas, and EPNoInv), gendata.ep returns a list of length three. The first element of the list, returnmat.ep, is a matrix of dimension nprecincts by (nrowcat * ncolcat) suitable for passing to TuneWithExitPoll and AnalyzeWithExitPoll. That is, the first row of returnmat.ep corresponds to the first row of GQdata, meaning that they both contain information from the same contingency table. The second row of returnmat.ep contains information from the contingency table represented in the second row of GQdata. And so on. In addition, returnmat.ep has counts from the sample of the contingency table in vectorized row-major format, as required for TuneWithExitPoll and AnalyzeWithExitPoll.
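As a small illustration of what "vectorized row-major format" means, the sketch below flattens one made-up 3 x 3 table of sampled counts into a single row of nrowcat * ncolcat entries. The cell labels are hypothetical and are not necessarily the column names that gendata.ep uses.

```r
# A made-up table of sampled counts for one precinct.
tab <- matrix(1:9, nrow = 3, byrow = TRUE,
              dimnames = list(c("bla", "whi", "his"),
                              c("Dem", "Rep", "Abs")))

# Row-major vectorization: transpose first, because R stores matrices
# column by column.
vec <- as.vector(t(tab))

# Hypothetical labels, built in the same row-major order.
names(vec) <- as.vector(t(outer(rownames(tab), colnames(tab),
                                paste, sep = "_")))
vec
```

The first ncolcat entries come from the first table row, the next ncolcat entries from the second row, and so on.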
If nrowcat = ncolcat = 3, then the user may set his.agg.bias.vec to be nonzero. This will introduce aggregation bias into the data by making the probability vector of the second row of each contingency table a function of the fractional composition of the third row. In voting parlance, if the rows are black, white, and Hispanic, the white voting behavior will be a function of the percent Hispanic in each precinct. For example, if his.agg.bias.vec = c(1.7, 3), and if the fraction Hispanic in each precinct i is X_{h_i}, then in the ith precinct, μ_i[3] is set to mu0[3] + X_{h_i} * 1.7, while μ_i[4] is set to mu0[4] + X_{h_i} * 3. This feature allows testing of the ecological inference model with aggregation bias.
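The adjustment in this example can be written out directly. The Hispanic fractions X.h below are made up, and the sketch only reproduces the formula stated above; it makes no claim about how the package applies the shift internally.

```r
mu0 <- c(.6, 2.05, 1.7, .2, 1.45, 1.45)
his.agg.bias.vec <- c(1.7, 3)
X.h <- c(0.05, 0.20, 0.60)   # made-up fraction Hispanic in three precincts

# Precinct-specific means: elements 3 and 4 shift with the Hispanic fraction.
mu.i <- t(sapply(X.h, function(x) {
  m <- mu0
  m[3:4] <- mu0[3:4] + x * his.agg.bias.vec
  m
}))
mu.i
```

Only the third and fourth elements vary across precincts, so only the second row's probability vector depends on the Hispanic share.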
A list with the following elements.
GQdata 
Matrix of dimension 
EPInv 
List of length 3: 
EPNoInv 
List of length 3: Contains the same elements as

EPReas 
List of length 3: Contains the same elements as

omega.matrix 
Matrix of dimension 
interior.tables 
List of length 
mu 
vector of length 
Sigma 
square matrix of dimension 
Sigma.diag 
the output of 
Sigma.corr 
the output of 
sim.check.vec 
vector: the true values of the
parameters generated by 
D. James Greiner & Kevin M. Quinn
D. James Greiner & Kevin M. Quinn. 2009. "R x C Ecological Inference: Bounds, Correlations, Flexibility, and Transparency of Assumptions." J.R. Statist. Soc. A 172:67-81.
## Not run: 
SimData <- gendata.ep() # simulated data
FormulaString <- "Dem, Rep, Abs ~ bla, whi, his"
# Tune the proposal parameters before the main runs
EPInvTune <- TuneWithExitPoll(fstring = FormulaString,
                              data = SimData$GQdata,
                              exitpoll = SimData$EPInv$returnmat.ep,
                              num.iters = 10000,
                              num.runs = 15)
# Three chains with identical settings, for convergence diagnostics
EPInvChain1 <- AnalyzeWithExitPoll(fstring = FormulaString,
                                   data = SimData$GQdata,
                                   exitpoll = SimData$EPInv$returnmat.ep,
                                   num.iters = 2000000,
                                   burnin = 200000,
                                   save.every = 2000,
                                   rho.vec = EPInvTune$rhos,
                                   print.every = 20000,
                                   debug = 1,
                                   keepTHETAS = 0,
                                   keepNNinternals = 0)
EPInvChain2 <- AnalyzeWithExitPoll(fstring = FormulaString,
                                   data = SimData$GQdata,
                                   exitpoll = SimData$EPInv$returnmat.ep,
                                   num.iters = 2000000,
                                   burnin = 200000,
                                   save.every = 2000,
                                   rho.vec = EPInvTune$rhos,
                                   print.every = 20000,
                                   debug = 1,
                                   keepTHETAS = 0,
                                   keepNNinternals = 0)
EPInvChain3 <- AnalyzeWithExitPoll(fstring = FormulaString,
                                   data = SimData$GQdata,
                                   exitpoll = SimData$EPInv$returnmat.ep,
                                   num.iters = 2000000,
                                   burnin = 200000,
                                   save.every = 2000,
                                   rho.vec = EPInvTune$rhos,
                                   print.every = 20000,
                                   debug = 1,
                                   keepTHETAS = 0,
                                   keepNNinternals = 0)
# mcmc.list() is provided by the coda package
EPInv <- mcmc.list(EPInvChain1, EPInvChain2, EPInvChain3)
## End(Not run)
