Fitting Parametric Models and Quantifying Missing Information for Ecological Inference in 2x2 Tables
Description
ecoML is used to fit parametric models for ecological
inference in 2 \times 2 tables via Expectation Maximization (EM)
algorithms. The data are specified in proportions. In its most basic setting, the algorithm
assumes that the individual-level proportions (i.e., W_1 and W_2) are distributed bivariate normally (after logit
transformations). The function calculates point estimates of the parameters for models
based on different assumptions. The standard errors of the point
estimates are also computed via Supplemented EM (SEM) algorithms. Moreover,
ecoML quantifies the amount of missing information associated
with each parameter and allows researchers to examine the impact of
missing information on parameter estimation in ecological
inference. The models and algorithms are described in Imai,
Lu and Strauss (2008, 2011).
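The distributional assumption above can be illustrated with a short simulation in base R. This is only a sketch and not the eco implementation: the means, covariance matrix, and number of tables are made-up values, and the identity Y = X W_1 + (1 - X) W_2 linking the margins to the individual-level proportions is the standard accounting identity for 2 x 2 ecological tables rather than something taken from this page.

```r
# Minimal sketch (assumed values, not the eco implementation): simulate
# individual-level proportions whose logits are bivariate normal.
set.seed(1)
n <- 200                                   # hypothetical number of tables
mu <- c(0, 0.5)                            # illustrative logit-scale means
Sigma <- matrix(c(1, 0.3, 0.3, 1), 2, 2)   # illustrative logit-scale covariance

# Draw bivariate normal logits using the Cholesky factor of Sigma
z <- matrix(rnorm(2 * n), n, 2) %*% chol(Sigma)
logitW <- sweep(z, 2, mu, "+")

# The inverse logit maps the draws back to proportions in (0, 1)
W <- plogis(logitW)

# Row margin X and the implied column margin Y for each 2 x 2 table
X <- runif(n)
Y <- X * W[, 1] + (1 - X) * W[, 2]
```

Only X and Y would be observed in an ecological inference problem; W_1 and W_2 are the missing data that the EM algorithm works around.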
Usage
Arguments
formula 
A symbolic description of the model to be fit,
specifying the column and row margins of 2 \times
2 ecological tables. 
data 
An optional data frame in which to interpret the variables
in 
N 
An optional variable representing the size of the unit; e.g.,
the total number of voters. 
supplement 
An optional matrix of supplemental data. The matrix
has two columns, which contain additional individual-level data such
as survey data for W_1 and W_2, respectively. If

fix.rho 
Logical. If 
context 
Logical. If 
sem 
Logical. If 
theta.start 
A numeric vector that specifies the starting values
for the mean, variance, and covariance. When 
epsilon 
A positive number that specifies the convergence criterion
for the EM algorithm. The square root of 
maxit 
A positive integer that specifies the maximum number of iterations
before the convergence criterion is met. The default is 
loglik 
Logical. If 
hyptest 
Logical. If 
verbose 
Logical. If 
Details
When sem is TRUE, ecoML computes the observed-data
information matrix for the parameters of interest based on the Supplemented EM
algorithm. The inverse of the observed-data information matrix can be used
to estimate the variance-covariance matrix for the parameters estimated
from EM algorithms. In addition, it computes the expected complete-data
information matrix. Based on these two measures, one can further calculate
the fraction of missing information associated with each parameter. See
Imai, Lu and Strauss (2006) for more details about the fraction of missing
information.
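The relationship between the two information matrices can be sketched in base R as follows. The matrices below are hypothetical toy values, not ecoML output, and the per-parameter formula used here (one minus the ratio of complete-data to observed-data variance) is one common convention for the fraction of missing information; it is an assumption and may differ from what ecoML reports.

```r
# Hypothetical information matrices for two parameters (illustrative only)
Icom <- matrix(c(10, 2, 2, 8), 2, 2)   # expected complete-data information
Iobs <- matrix(c(6, 1, 1, 5), 2, 2)    # observed-data information

# Inverting the observed-data information gives a variance-covariance
# estimate, and the square roots of its diagonal give standard errors
Vobs <- solve(Iobs)
se <- sqrt(diag(Vobs))

# Information lost because the individual-level data are not observed
Imiss <- Icom - Iobs

# Assumed convention: fraction of missing information per parameter
Vcom <- solve(Icom)
Fmis <- 1 - diag(Vcom) / diag(Vobs)
```

A value of Fmis near 0 would indicate that little is lost by observing only the margins, while a value near 1 would indicate that most of the information about that parameter is missing.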
Moreover, when hyptest=TRUE, ecoML estimates the
parametric model under the null hypothesis that mu_1=mu_2. One
can then construct the likelihood ratio test to assess the hypothesis of
equal means. The associated fraction of missing information for the test
statistic can also be calculated. For details, see Imai, Lu
and Strauss (2006).
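As a rough illustration of the likelihood ratio test described above, the statistic can be computed by hand from two log-likelihood values. The numbers below are hypothetical, and the chi-square reference distribution with one degree of freedom (one restriction, mu_1=mu_2) is the usual asymptotic approximation, stated here as an assumption rather than as the procedure ecoML applies internally.

```r
# Hypothetical log-likelihoods at convergence (illustrative values only)
loglik_full <- -1234.5   # unrestricted model
loglik_null <- -1237.1   # model fitted under mu_1 = mu_2

# Likelihood ratio statistic: twice the gain in log-likelihood
lr_stat <- 2 * (loglik_full - loglik_null)

# Asymptotic p-value, assuming a chi-square reference with 1 df
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)
```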
Value
An object of class ecoML
containing the following elements:
call 
The matched call. 
X 
The row margin, X. 
Y 
The column margin, Y. 
N 
The size of each table, N. 
context 
The assumption under which the model is estimated. If

sem 
Whether the SEM algorithm is used to estimate the standard errors and observed information matrix for the parameter estimates. 
fix.rho 
Whether the correlation or the partial correlation between W_1 and W_2 is fixed in the estimation. 
r12 
If 
epsilon 
The precision criterion for EM convergence. √{ε} is the precision criterion for SEM convergence. 
theta.sem 
The ML estimates of E(W_1),E(W_2),
var(W_1),var(W_2), and cov(W_1,W_2). If

W 
In-sample estimates of W_1 and W_2. 
suff.stat 
The sufficient statistics for 
iters.em 
Number of EM iterations before convergence is achieved. 
iters.sem 
Number of SEM iterations before convergence is achieved. 
loglik 
The log-likelihood of the model when convergence is achieved. 
loglik.log.em 
A vector saving the value of the log-likelihood function at each iteration of the EM algorithm. 
mu.log.em 
A matrix saving the unweighted mean estimates of the logit-transformed individual-level proportions (i.e., W_1 and W_2) at each iteration of the EM process. 
Sigma.log.em 
A matrix saving the log of the variance estimates of the logit-transformed
individual-level proportions (i.e., W_1 and W_2) at each iteration of the EM process.
Note that non-transformed variances are displayed on the screen (when 
rho.fisher.em 
A matrix saving the Fisher transformation of the estimated correlations between
the logit-transformed individual-level proportions (i.e., W_1 and W_2) at each iteration of the EM process.
Note that non-transformed correlations are displayed on the screen (when 
Moreover, when sem=TRUE, ecoML also outputs the following
values:
DM 
The matrix characterizing the rates of convergence of the EM algorithms. Such information is also used to calculate the observed-data information matrix. 
Icom 
The (expected) complete-data information matrix estimated
via the SEM algorithm. When 
Iobs 
The observed information matrix. The dimension of

Imiss 
The difference between 
Vobs 
The (symmetrized) variance-covariance matrix of the ML parameter
estimates. The dimension of 
Iobs 
The (expected) complete-data variance-covariance matrix.
The dimension of 
Vobs.original 
The estimated variance-covariance matrix of the
ML parameter estimates. The dimension of 
Fmis 
The fraction of missing information associated with each parameter estimation. 
VFmis 
The proportion of increased variance associated with each parameter estimation due to observed data. 
Ieigen 
The largest eigenvalue of 
Icom.trans 
The complete-data information matrix for the Fisher-transformed parameters. 
Iobs.trans 
The observed-data information matrix for the Fisher-transformed parameters. 
Fmis.trans 
The fractions of missing information associated with the Fisher-transformed parameters. 
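Several of the elements above are stored on transformed scales: the log for variances and the Fisher transformation for correlations. The following base R sketch shows how such values map back to the original scale. The numeric values are made up for illustration, and the sketch assumes that the Fisher transformation is the usual inverse hyperbolic tangent.

```r
# Illustrative values on the original scale (not ecoML output)
sigma2 <- c(0.8, 1.5)   # variances of the logit-transformed proportions
rho <- 0.35             # correlation between them

# Transformed scale: log for variances, Fisher z for the correlation
log_sigma2 <- log(sigma2)
z_rho <- atanh(rho)     # equals 0.5 * log((1 + rho) / (1 - rho))

# Back-transform to recover the original-scale quantities
sigma2_back <- exp(log_sigma2)
rho_back <- tanh(z_rho)
```

Both transformations map a bounded parameter onto the whole real line, which is why they are commonly used when a normal approximation is applied to the estimates.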
Author(s)
Kosuke Imai, Department of Politics, Princeton University, kimai@Princeton.Edu, http://imai.princeton.edu; Ying Lu, Center for Promoting Research Involving Innovative Statistical Methodology (PRIISM), New York University, ying.lu@nyu.Edu; Aaron Strauss, Department of Politics, Princeton University, abstraus@Princeton.Edu.
References
Imai, Kosuke, Ying Lu and Aaron Strauss. (2011). “eco: R Package for Ecological Inference in 2x2 Tables” Journal of Statistical Software, Vol. 42, No. 5, pp. 1-23. Available at http://imai.princeton.edu/software/eco.html
Imai, Kosuke, Ying Lu and Aaron Strauss. (2008). “Bayesian and Likelihood Inference for 2 x 2 Ecological Tables: An Incomplete Data Approach” Political Analysis, Vol. 16, No. 1 (Winter), pp. 41-69. Available at http://imai.princeton.edu/research/eiall.html
See Also
eco
, ecoNP
, summary.ecoML
Examples
## load the census data
data(census)
## NOTE: convergence has not been properly assessed for the following
## examples. See Imai, Lu and Strauss (2006) for more complete analyses.
## In the first example below, in the interest of time, only part of the
## data set is analyzed and the convergence requirement is less stringent
## than the default setting.
## In the second example, the program is arbitrarily halted 100 iterations
## into the simulation, before convergence.
## load Robinson's census data
data(census)
## fit the parametric model with the default model specifications
## Not run: res <- ecoML(Y ~ X, data = census[1:100,], N=census[1:100,3],
epsilon=10^(-6), verbose = TRUE)
## End(Not run)
## summarize the results
## Not run: summary(res)
## obtain out-of-sample prediction
## Not run: out <- predict(res, verbose = TRUE)
## summarize the results
## Not run: summary(out)
## fit the parametric model with some individual
## level data using the default prior specification
surv <- 1:600
## Not run: res1 <- ecoML(Y ~ X, context = TRUE, data = census[surv,],
supplement = census[surv,c(4:5,1)], maxit=100, verbose = TRUE)
## End(Not run)
## summarize the results
## Not run: summary(res1)
