gendata.ep: Function To Simulate Ecological and Survey Data For Use in...
In RxCEcolInf: 'R x C Ecological Inference With Optional Incorporation of Survey Information'


View source: R/SimDataEP.R

This function generates simulated ecological data, i.e., data in the form of contingency tables in which the row and column totals but none of the internal cell counts are observed. At the user's option, data from simulated surveys of some of the ‘units’ (in voting parlance, 'precincts') that gave rise to the contingency tables are also produced.

 
gendata.ep(nprecincts = 175,
           nrowcat = 3,
           ncolcat = 3,
           colcatnames = c("Dem", "Rep", "Abs"),
           mu0 = c(-.6, -2.05, -1.7, -.2, -1.45, -1.45),
           rowcatnames = c("bla", "whi", "his", "asi"),
           alpha = c(.35, .45, .2, .1),
           housing.seg = 1,
           nprecincts.ep = 40,
           samplefrac.ep = 1/14,
           K0 = NULL,
           nu0 = 12,
           Psi0 = NULL,
           lambda = 1000,
           dispersion.low.lim = 1,
           dispersion.up.lim = 1,
           outfile=NULL,
           his.agg.bias.vec = c(0,0),
           HerfInvexp = 3.5,
           HerfNoInvexp = 3.5,
           HerfReasexp = 2)

`nprecincts`	positive integer: The number of contingency tables (precincts) in the simulated dataset.
`nrowcat`	integer > 1: The number of rows in each of the contingency tables.
`ncolcat`	integer > 1: The number of columns in each of the contingency tables.
`rowcatnames`	character vector of length `nrowcat`: Names of rows in each contingency table.
`colcatnames`	character vector of length `ncolcat`: Names of columns in each contingency table.
`alpha`	vector of length `nrowcat`: initial parameters of a Dirichlet distribution used to generate each contingency table's row fractions.
`housing.seg`	scalar > 0: multiplied by `alpha` to generate the final parameters of the Dirichlet distribution used to generate each contingency table's row fractions.
`mu0`	vector of length (`nrowcat` * (`ncolcat` - 1)): The mean of the multivariate normal hyperprior at the top level of the hierarchical model from which the data are simulated. See Details.
`K0`	square matrix of dimension (`nrowcat` * (`ncolcat` - 1)): the covariance matrix of the multivariate normal hyperprior at the top level of the hierarchical model from which the data are simulated. See Details.
`nu0`	scalar > 0: the degrees of freedom for the Inv-Wishart hyperprior from which the Σ matrix will be drawn.
`Psi0`	square matrix of dimension (`nrowcat` * (`ncolcat` - 1)): scale matrix for the Inv-Wishart hyperprior from which the `Sigma` matrix will be drawn.
`lambda`	scalar > 0: initial parameter of the Poisson distribution from which the number of voters in each precinct will be drawn.
`dispersion.low.lim`	scalar > 0 and no greater than `dispersion.up.lim`: lower limit of a draw from `runif()` that is multiplied by `lambda` to set a lower limit on the parameter used to draw from the Poisson distribution that determines the number of voters in each precinct.
`dispersion.up.lim`	scalar no less than `dispersion.low.lim`: upper limit of a draw from `runif()` that is multiplied by `lambda` to set an upper limit on the parameter used to draw from the Poisson distribution that determines the number of voters in each precinct.
`outfile`	string ending in ".Rdata": filepath and name of object; if non-NULL, the object returned by this function will be saved to the location specified by `outfile`.
`his.agg.bias.vec`	vector of length 2: only implemented for `nrowcat` = 3 and `ncolcat` = 3: if nonzero, induces aggregation bias into the simulated data. See Details.
`nprecincts.ep`	non-negative integer less than `nprecincts`: number of contingency tables (precincts) to be included in the simulated survey sample (ep for "exit poll").
`samplefrac.ep`	fraction (real number between 0 and 1): fraction of individual units (voters) within each contingency table (precinct) included in the survey sample.
`HerfInvexp`	scalar: exponent used to generate inverted quasi-Herfindahl weights used to sample contingency tables (precincts) for inclusion in a sample survey. See Details.
`HerfNoInvexp`	scalar: same as `HerfInvexp` except the quasi-Herfindahl weights are not inverted. See Details.
`HerfReasexp`	scalar: same as HerfInvexp, for a separate sample survey. See Details.

This function simulates data from the ecological inference model outlined in Greiner & Quinn (2009). At the user's option (by setting nprecincts.ep to an integer greater than 0), the function generates three survey samples from the simulated dataset. The specifics of the function's operation are as follows.

First, the function simulates the total number of individual units (voters) in each contingency table (precinct) from a Poisson distribution with parameter lambda * runif(1, dispersion.low.lim, dispersion.up.lim). Next, for each table (precinct), the function simulates the vector of fractions of units (voters) in each row. The fractions are simulated from a Dirichlet distribution with parameter vector housing.seg * alpha. The row fractions are multiplied by the total number of units (voters), and the resulting vector is rounded to produce contingency table row counts for each table.
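The first stage described above can be sketched in base R. This is a minimal illustration and not the package's internal code: it assumes one uniform draw per precinct, uses a three-category `alpha`, and the helper `rdirichlet1` is a hypothetical name that builds Dirichlet draws from normalized gamma variates.

```r
set.seed(1)

nprecincts         <- 5
alpha              <- c(.35, .45, .2)   # three row categories (assumed for this sketch)
housing.seg        <- 1
lambda             <- 1000
dispersion.low.lim <- 1
dispersion.up.lim  <- 1

# Precinct sizes: Poisson draws whose rate is lambda scaled by a uniform draw
# (assumption: a separate uniform draw for each precinct).
rate     <- lambda * runif(nprecincts, dispersion.low.lim, dispersion.up.lim)
n.voters <- rpois(nprecincts, rate)

# Hypothetical helper: one Dirichlet draw via normalized gamma variates.
rdirichlet1 <- function(a) {
  g <- rgamma(length(a), shape = a)
  g / sum(g)
}

# One row-fraction vector per precinct (nprecincts x nrowcat matrix).
row.fracs <- t(replicate(nprecincts, rdirichlet1(housing.seg * alpha)))

# Row counts: fractions times precinct size, rounded.
row.counts <- round(row.fracs * n.voters)
row.counts
```

Because of the rounding, the row counts of a precinct may differ from the Poisson total by a voter or two.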

Next, a vector μ is simulated from a multivariate normal with mean mu0 and covariance matrix K0. A covariance matrix Σ is simulated from an Inv-Wishart with nu0 degrees of freedom and scale matrix Psi0.
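A sketch of these top-level draws using only base R and `stats::rWishart` follows. The values of `K0` and `Psi0` are placeholders chosen for illustration (the usage above lists both defaults as NULL, so the values the function substitutes are not shown here), and the Cholesky-based normal draw is one of several equivalent ways to sample.

```r
set.seed(2)

nrowcat <- 3
ncolcat <- 3
p       <- nrowcat * (ncolcat - 1)

mu0  <- c(-.6, -2.05, -1.7, -.2, -1.45, -1.45)
nu0  <- 12
K0   <- diag(0.1, p)   # placeholder covariance (assumption)
Psi0 <- diag(1, p)     # placeholder scale matrix (assumption)

# mu ~ N(mu0, K0): chol(K0) is upper triangular R with t(R) %*% R = K0,
# so t(R) %*% z has covariance K0 when z is standard normal.
mu <- drop(mu0 + t(chol(K0)) %*% rnorm(p))

# If W ~ Wishart(nu0, solve(Psi0)), then solve(W) ~ Inv-Wishart(nu0, Psi0).
W     <- rWishart(1, df = nu0, Sigma = solve(Psi0))[, , 1]
Sigma <- solve(W)

mu
round(Sigma, 3)
```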

Next, nprecincts vectors are drawn from N(μ, Σ). Each of these draws undergoes an inverse-stacked multidimensional logistic transformation to produce a set of nrowcat probability vectors (each of which sums to one) for nrowcat multinomial distributions, one for each row in that contingency table. Next, the nrowcat multinomial values, which represent the true (and in real life, unobserved) internal cell counts, are drawn from the relevant row counts and these probability vectors. The column totals are calculated via summation.
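The transformation and the multinomial draws can be illustrated for a single precinct. This sketch assumes that the vector is stacked row by row, with the last column serving as the reference category of each row's logistic transformation; `inv.stacked.logit` is a hypothetical helper name, not a function exported by the package.

```r
set.seed(3)

nrowcat <- 3
ncolcat <- 3

# Hypothetical helper: map a stacked log-odds vector to nrowcat probability vectors.
# Assumption: entries are ordered row by row, last column is the reference.
inv.stacked.logit <- function(omega, nrowcat, ncolcat) {
  om <- matrix(omega, nrow = nrowcat, ncol = ncolcat - 1, byrow = TRUE)
  e  <- cbind(exp(om), 1)   # reference column has log-odds 0
  e / rowSums(e)            # each row sums to one
}

omega <- c(-.6, -2.05, -1.7, -.2, -1.45, -1.45)   # one draw from N(mu, Sigma), here set to mu0
theta <- inv.stacked.logit(omega, nrowcat, ncolcat)

# Row counts for one precinct (illustrative values).
row.counts <- c(300, 450, 250)

# One multinomial draw per row gives the interior cell counts.
cells <- t(sapply(seq_len(nrowcat), function(r) {
  rmultinom(1, size = row.counts[r], prob = theta[r, ])[, 1]
}))

# Column totals follow by summation.
col.totals <- colSums(cells)

round(theta, 3)
cells
col.totals
```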

If nprecincts.ep is greater than 0, three simulated surveys (exit polls) are drawn. All three select contingency tables (precincts) using weights that are a function of the composition of the row totals. Specifically, the row fractions are raised to a power q and then summed (when q = 2 this calculation is known in antitrust law as a Herfindahl index). For one of the three surveys (exit polls) gendata.ep generates, denoted EPNoInv, these quasi-Herfindahl indices are the weights. For the other two, denoted EPInv and EPReas, the sample weights are the reciprocals of these quasi-Herfindahl indices. The former method tends to weight contingency tables (precincts) in which one row dominates the table more highly than contingency tables (precincts) in which the row fractions are nearly equal. In voting parlance, precincts in which one racial group dominates are more likely to be sampled than racially mixed precincts. The latter method, in which the sample weights are reciprocals, weights contingency tables in which row fractions are similar more highly; in voting parlance, mixed-race precincts are more likely to be sampled.

For example, suppose nrowcat = 3, HerfInvexp = 3.5, HerfReasexp = 2, and HerfNoInvexp = 3.5. Consider contingency table P1 with row counts (300, 300, 300) and contingency table P2 with row counts (950, 25, 25). Then:

Row fractions: The corresponding row fractions are (300/900, 300/900, 300/900) = (.33, .33, .33) and (950/1000, 25/1000, 25/1000) = (.95, .025, .025).

EPInv weights: EPInv would assign P1 and P2 weights as follows: 1/sum(.33^3.5, .33^3.5, .33^3.5) = 16.1 and 1/sum(.95^3.5, .025^3.5, .025^3.5) = 1.2.

EPReas weights: EPReas would assign weights as follows: 1/sum(.33^2, .33^2, .33^2) = 3.1 and 1/sum(.95^2, .025^2, .025^2) = 1.1.

EPNoInv weights: EPNoInv would assign weights as follows: sum(.33^3.5, .33^3.5, .33^3.5) = .062 and sum(.95^3.5, .025^3.5, .025^3.5) = .84.
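The worked example above can be reproduced with a few lines of R. The helper `quasi.herf` is a hypothetical name introduced for this sketch, and the final `sample()` call only illustrates how such weights could drive a weighted draw; how the package itself normalizes or applies the weights is not shown here.

```r
# Hypothetical helper: sum of row fractions raised to the power q.
quasi.herf <- function(counts, q) sum((counts / sum(counts))^q)

P1 <- c(300, 300, 300)
P2 <- c(950, 25, 25)

w.inv   <- 1 / c(P1 = quasi.herf(P1, 3.5), P2 = quasi.herf(P2, 3.5))  # EPInv
w.reas  <- 1 / c(P1 = quasi.herf(P1, 2),   P2 = quasi.herf(P2, 2))    # EPReas
w.noinv <-     c(P1 = quasi.herf(P1, 3.5), P2 = quasi.herf(P2, 3.5))  # EPNoInv

round(rbind(EPInv = w.inv, EPReas = w.reas, EPNoInv = w.noinv), 3)

# Illustration only: a weighted draw of one precinct under the EPInv weights.
set.seed(4)
sample(c("P1", "P2"), size = 1, prob = w.inv)
```

This sketch uses the exact fraction 1/3 rather than the rounded .33, so the P1 values differ slightly from those in the text: the EPInv weight for P1 is 3^2.5, about 15.6 rather than 16.1, and the EPReas weight for P1 is exactly 3.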

For each of the three simulated surveys (EPInv, EPReas, and EPNoInv), gendata.ep returns a list of length three. The first element of the list, returnmat.ep, is a matrix of dimension nprecincts by (nrowcat * ncolcat) suitable for passing to TuneWithExitPoll and AnalyzeWithExitPoll. That is, the first row of returnmat.ep corresponds to the first row of GQdata, meaning that they both contain information from the same contingency table. The second row of returnmat.ep contains information from the contingency table represented in the second row of GQdata. And so on. In addition, returnmat.ep has counts from the sample of the contingency table in vectorized row major format, as required for TuneWithExitPoll and AnalyzeWithExitPoll.
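To make the "vectorized row major format" concrete, the sketch below builds one such row from a single precinct's interior table. It assumes, purely for illustration, that the survey is a simple random sample of voters without replacement at rate samplefrac.ep; the table values are made up, and the package's actual sampling routine may differ.

```r
set.seed(5)

# Illustrative interior table for one sampled precinct (rows by columns).
tab <- matrix(c(200,  50,  50,
                 60, 300,  90,
                 80,  70, 100),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("bla", "whi", "his"), c("Dem", "Rep", "Abs")))

samplefrac.ep <- 1/14

# One entry per voter, labeled by the (column-major) index of the voter's cell.
cell.id <- rep(seq_along(tab), times = as.vector(tab))

# Assumption: simple random sample of voters without replacement.
n.samp <- round(samplefrac.ep * sum(tab))
drawn  <- sample(cell.id, n.samp)

# Tabulate the sampled voters back into a table of the same shape.
samp.tab <- matrix(tabulate(drawn, nbins = length(tab)),
                   nrow = nrow(tab), dimnames = dimnames(tab))

# Row-major vectorization: (bla,Dem), (bla,Rep), (bla,Abs), (whi,Dem), ...
ep.row <- as.vector(t(samp.tab))

samp.tab
ep.row
```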

If nrowcat = ncolcat = 3, then the user may set his.agg.bias.vec to be nonzero. This will introduce aggregation bias into the data by making the probability vector of the second row of each contingency table a function of the fractional composition of the third row. In voting parlance, if the rows are black, white, and Hispanic, the white voting behavior will be a function of the percent Hispanic in each precinct. For example, if his.agg.bias.vec = c(1.7, -3), and if the fraction Hispanic in each precinct i is X_{h_i}, then in the ith precinct, the μ_i[3] is set to mu0[3] + X_{h_i} * 1.7, while μ_i[4] is set to mu0[4] + X_{h_i} * -3. This feature allows testing of the ecological inference model with aggregation bias.
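The adjustment in this example amounts to the following arithmetic, shown for three hypothetical precincts. The Hispanic fractions in `X.h` are made-up values, and `mu.i` is a name chosen for this sketch.

```r
mu0              <- c(-.6, -2.05, -1.7, -.2, -1.45, -1.45)
his.agg.bias.vec <- c(1.7, -3)

# Hypothetical fraction Hispanic in each of three precincts.
X.h <- c(0.05, 0.20, 0.60)

# Start every precinct at mu0, then shift elements 3 and 4 (the second row's entries).
mu.i <- matrix(mu0, nrow = length(X.h), ncol = length(mu0), byrow = TRUE)
mu.i[, 3] <- mu0[3] + X.h * his.agg.bias.vec[1]
mu.i[, 4] <- mu0[4] + X.h * his.agg.bias.vec[2]

mu.i
```

With these inputs the precinct that is 20 percent Hispanic has a third element of -1.7 + 0.20 * 1.7 = -1.36, and the precinct that is 60 percent Hispanic has a fourth element of -0.2 + 0.60 * -3 = -2.0.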

A list with the following elements.

`GQdata`	Matrix of dimension `nprecincts` by (`nrowcat` + `ncolcat`): The simulated (observed) ecological data, meaning the row and column totals in the contingency tables. May be passed as the `data` argument in `Tune`, `Analyze`, `TuneWithExitPoll`, and `AnalyzeWithExitPoll`.
`EPInv`	List of length 3: `returnmat.ep`, the first element in the list, is a matrix that may be passed as the `exitpoll` argument in `TuneWithExitPoll` and `AnalyzeWithExitPoll`. See Details. `ObsData` is a dataframe that may be used as the `data` argument in the `survey` package. `sampprecincts.ep` is a vector detailing the row numbers of `GQdata` (meaning the contingency tables) that were included in the `EPInv` survey (exit poll). See Details for an explanation of the weights used to select the contingency tables for inclusion in the `EPInv` survey (exit poll).
`EPNoInv`	List of length 3: Contains the same elements as `EPInv`. See Details for an explanation of weights used to select the contingency tables for inclusion in the `EPNoInv` survey (exit poll).
`EPReas`	List of length 3: Contains the same elements as `EPInv`. See Details for an explanation of weights used to select the contingency tables for inclusion in the `EPReas` survey (exit poll).
`omega.matrix`	Matrix of dimension `nprecincts` by (`nrowcat` * (`ncolcat`-1)): The matrix of draws from the multivariate normal distribution at the second level of the hierarchical model giving rise to `GQdata`. These values undergo an inverse-stacked multidimensional logistic transformation to produce contingency table row probability vectors.
`interior.tables`	List of length `nprecincts`: Each element of the list is a full (meaning all interior cells are filled in) contingency table.
`mu`	vector of length `nrowcat` * (`ncolcat`-1): the μ vector drawn at the top level of the hierarchical model giving rise to `GQdata`. See Details.
`Sigma`	square matrix of dimension `nrowcat` * (`ncolcat`-1): the covariance matrix drawn at the top level of the hierarchical model giving rise to `GQdata`. See Details.
`Sigma.diag`	the output of `diag(Sigma)`.
`Sigma.corr`	the output of `cov2cor(Sigma)`.
`sim.check.vec`	vector: the true values of the parameters generated by `Analyze` and `AnalyzeWithExitPoll` in the same order as the parameters are produced by those two functions. This vector is useful in assessing the coverage of intervals from the posterior draws from `Analyze` and `AnalyzeWithExitPoll`.
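A coverage check of the kind `sim.check.vec` is meant to support might look like the sketch below. The `draws` matrix here is a simulated stand-in with one column per parameter, not actual output from `Analyze` or `AnalyzeWithExitPoll`, and `truth` plays the role of `sim.check.vec`.

```r
set.seed(7)

# Stand-in for sim.check.vec: the true parameter values.
truth <- c(0.2, -1.0, 0.5)

# Stand-in for posterior draws: one column per parameter, centered on the truth.
draws <- sapply(truth, function(m) rnorm(2000, mean = m, sd = 0.1))

# Equal-tailed 95% intervals, one column per parameter.
ci <- apply(draws, 2, quantile, probs = c(0.025, 0.975))

# Does each interval contain the corresponding true value?
covered <- truth >= ci[1, ] & truth <= ci[2, ]
covered
mean(covered)   # share of parameters covered
```

Repeating this over many simulated datasets gives an estimate of how often nominal 95% intervals cover the truth.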

D. James Greiner & Kevin M. Quinn

D. James Greiner & Kevin M. Quinn. 2009. “R x C Ecological Inference: Bounds, Correlations, Flexibility, and Transparency of Assumptions.” J.R. Statist. Soc. A 172:67-81.

## Not run: 
library(RxCEcolInf)
library(coda)              #  provides mcmc.list(), used below
SimData <- gendata.ep()    #  simulated data
FormulaString <- "Dem, Rep, Abs ~ bla, whi, his"
EPInvTune <-  TuneWithExitPoll(fstring = FormulaString,
                               data = SimData$GQdata,
                               exitpoll=SimData$EPInv$returnmat.ep,
                               num.iters = 10000,
                               num.runs = 15)
EPInvChain1 <- AnalyzeWithExitPoll(fstring = FormulaString,
                                   data = SimData$GQdata,
                                   exitpoll=SimData$EPInv$returnmat.ep,
                                   num.iters = 2000000,
                                   burnin = 200000,
                                   save.every = 2000,
                                   rho.vec = EPInvTune$rhos,
                                   print.every = 20000,
                                   debug = 1,
                                   keepTHETAS = 0,
                                   keepNNinternals = 0)
EPInvChain2 <- AnalyzeWithExitPoll(fstring = FormulaString,
                                   data = SimData$GQdata,
                                   exitpoll=SimData$EPInv$returnmat.ep,
                                   num.iters = 2000000,
                                   burnin = 200000,
                                   save.every = 2000,
                                   rho.vec = EPInvTune$rhos,
                                   print.every = 20000,
                                   debug = 1,
                                   keepTHETAS = 0,
                                   keepNNinternals = 0)
EPInvChain3 <- AnalyzeWithExitPoll(fstring = FormulaString,
                                   data = SimData$GQdata,
                                   exitpoll=SimData$EPInv$returnmat.ep,
                                   num.iters = 2000000,
                                   burnin = 200000,
                                   save.every = 2000,
                                   rho.vec = EPInvTune$rhos,
                                   print.every = 20000,
                                   debug = 1,
                                   keepTHETAS = 0,
                                   keepNNinternals = 0)
EPInv <- mcmc.list(EPInvChain1, EPInvChain2, EPInvChain3)

## End(Not run)

RxCEcolInf documentation built on Nov. 6, 2021, 5:07 p.m.
