Nothing
###
### main function
###
#' Fitting Parametric Models and Quantifying Missing Information for Ecological
#' Inference in 2x2 Tables
#'
#' \code{ecoML} is used to fit parametric models for ecological inference in
#' \eqn{2 \times 2} tables via Expectation Maximization (EM) algorithms. The
#' data is specified in proportions. At it's most basic setting, the algorithm
#' assumes that the individual-level proportions (i.e., \eqn{W_1} and
#' \eqn{W_2}) and distributed bivariate normally (after logit transformations).
#' The function calculates point estimates of the parameters for models based
#' on different assumptions. The standard errors of the point estimates are
#' also computed via Supplemented EM algorithms. Moreover, \code{ecoML}
#' quantifies the amount of missing information associated with each parameter
#' and allows researcher to examine the impact of missing information on
#' parameter estimation in ecological inference. The models and algorithms are
#' described in Imai, Lu and Strauss (2008, 2011).
#'
#' When \code{SEM} is \code{TRUE}, \code{ecoML} computes the observed-data
#' information matrix for the parameters of interest based on Supplemented-EM
#' algorithm. The inverse of the observed-data information matrix can be used
#' to estimate the variance-covariance matrix for the parameters estimated from
#' EM algorithms. In addition, it also computes the expected complete-data
#' information matrix. Based on these two measures, one can further calculate
#' the fraction of missing information associated with each parameter. See
#' Imai, Lu and Strauss (2006) for more details about fraction of missing
#' information.
#'
#' Moreover, when \code{hytest=TRUE}, \code{ecoML} allows to estimate the
#' parametric model under the null hypothesis that \code{mu_1=mu_2}. One can
#' then construct the likelihood ratio test to assess the hypothesis of equal
#' means. The associated fraction of missing information for the test statistic
#' can be also calculated. For details, see Imai, Lu and Strauss (2006) for
#' details.
#'
#' @param formula A symbolic description of the model to be fit, specifying the
#' column and row margins of \eqn{2 \times 2} ecological tables. \code{Y ~ X}
#' specifies \code{Y} as the column margin (e.g., turnout) and \code{X} (e.g.,
#' percent African-American) as the row margin. Details and specific examples
#' are given below.
#' @param data An optional data frame in which to interpret the variables in
#' \code{formula}. The default is the environment in which \code{ecoML} is
#' called.
#' @param N An optional variable representing the size of the unit; e.g., the
#' total number of voters. \code{N} needs to be a vector of same length as
#' \code{Y} and \code{X} or a scalar.
#' @param supplement An optional matrix of supplemental data. The matrix has
#' two columns, which contain additional individual-level data such as survey
#' data for \eqn{W_1} and \eqn{W_2}, respectively. If \code{NULL}, no
#' additional individual-level data are included in the model. The default is
#' \code{NULL}.
#' @param fix.rho Logical. If \code{TRUE}, the correlation (when
#' \code{context=TRUE}) or the partial correlation (when \code{context=FALSE})
#' between \eqn{W_1} and \eqn{W_2} is fixed through the estimation. For
#' details, see Imai, Lu and Strauss(2006). The default is \code{FALSE}.
#' @param context Logical. If \code{TRUE}, the contextual effect is also
#' modeled. In this case, the row margin (i.e., X) and the individual-level
#' rates (i.e., \eqn{W_1} and \eqn{W_2}) are assumed to be distributed
#' tri-variate normally (after logit transformations). See Imai, Lu and Strauss
#' (2006) for details. The default is \code{FALSE}.
#' @param sem Logical. If \code{TRUE}, the standard errors of parameter
#' estimates are estimated via SEM algorithm, as well as the fraction of
#' missing data. The default is \code{TRUE}.
#' @param theta.start A numeric vector that specifies the starting values for
#' the mean, variance, and covariance. When \code{context = FALSE}, the
#' elements of \code{theta.start} correspond to (\eqn{E(W_1)}, \eqn{E(W_2)},
#' \eqn{var(W_1)}, \eqn{var(W_2)}, \eqn{cor(W_1,W_2)}). When \code{context =
#' TRUE}, the elements of \code{theta.start} correspond to (\eqn{E(W_1)},
#' \eqn{E(W_2)}, \eqn{var(W_1)}, \eqn{var(W_2)}, \eqn{corr(W_1, X)},
#' \eqn{corr(W_2, X)}, \eqn{corr(W_1,W_2)}). Moreover, when
#' \code{fix.rho=TRUE}, \eqn{corr(W_1,W_2)} is set to be the correlation
#' between \eqn{W_1} and \eqn{W_2} when \code{context = FALSE}, and the partial
#' correlation between \eqn{W_1} and \eqn{W_2} given \eqn{X} when \code{context
#' = FALSE}. The default is \code{c(0,0,1,1,0)}.
#' @param epsilon A positive number that specifies the convergence criterion
#' for EM algorithm. The square root of \code{epsilon} is the convergence
#' criterion for SEM algorithm. The default is \code{10^(-6)}.
#' @param maxit A positive integer specifies the maximum number of iterations
#' before the convergence criterion is met. The default is \code{1000}.
#' @param loglik Logical. If \code{TRUE}, the value of the log-likelihood
#' function at each iteration of EM is saved. The default is \code{TRUE}.
#' @param hyptest Logical. If \code{TRUE}, model is estimated under the null
#' hypothesis that means of \eqn{W1} and \eqn{W2} are the same. The default is
#' \code{FALSE}.
#' @param verbose Logical. If \code{TRUE}, the progress of the EM and SEM
#' algorithms is printed to the screen. The default is \code{FALSE}.
#' @return An object of class \code{ecoML} containing the following elements:
#' \item{call}{The matched call.}
#' \item{X}{The row margin, \eqn{X}.}
#' \item{Y}{The column margin, \eqn{Y}.}
#' \item{N}{The size of each table, \eqn{N}.}
#' \item{context}{The assumption under which model is estimated. If
#' \code{context = FALSE}, CAR assumption is adopted and no contextual effect
#' is modeled. If \code{context = TRUE}, NCAR assumption is adopted, and
#' contextual effect is modeled.} \item{sem}{Whether SEM algorithm is used to
#' estimate the standard errors and observed information matrix for the
#' parameter estimates.}
#' \item{fix.rho}{Whether the correlation or the partial
#' correlation between \eqn{W_1} an \eqn{W_2} is fixed in the estimation.}
#' \item{r12}{If \code{fix.rho = TRUE}, the value that \eqn{corr(W_1, W_2)} is
#' fixed to.}
#' \item{epsilon}{The precision criterion for EM convergence.
#' \eqn{\sqrt{\epsilon}} is the precision criterion for SEM convergence.}
#' \item{theta.sem}{The ML estimates of \eqn{E(W_1)},\eqn{E(W_2)},
#' \eqn{var(W_1)},\eqn{var(W_2)}, and \eqn{cov(W_1,W_2)}. If \code{context =
#' TRUE}, \eqn{E(X)},\eqn{cov(W_1,X)}, \eqn{cov(W_2,X)} are also reported.}
#' \item{W}{In-sample estimation of \eqn{W_1} and \eqn{W_2}.}
#' \item{suff.stat}{The sufficient statistics for \code{theta.em}.}
#' \item{iters.em}{Number of EM iterations before convergence is achieved.}
#' \item{iters.sem}{Number of SEM iterations before convergence is achieved.}
#' \item{loglik}{The log-likelihood of the model when convergence is achieved.}
#' \item{loglik.log.em}{A vector saving the value of the log-likelihood
#' function at each iteration of the EM algorithm.}
#' \item{mu.log.em}{A matrix saving the unweighted mean estimation of the
#' logit-transformed individual-level proportions (i.e., \eqn{W_1} and \eqn{W_2})
#' at each iteration of the EM process.} \item{Sigma.log.em}{A matrix saving the
#' log of the variance estimation of the logit-transformed individual-level
#' proportions (i.e., \eqn{W_1} and \eqn{W_2}) at each iteration of EM process.
#' Note, non-transformed variances are displayed on the screen (when
#' \code{verbose = TRUE}).}
#' \item{rho.fisher.em}{A matrix saving the fisher
#' transformation of the estimation of the correlations between the
#' logit-transformed individual-level proportions (i.e., \eqn{W_1} and
#' \eqn{W_2}) at each iteration of EM process. Note, non-transformed
#' correlations are displayed on the screen (when \code{verbose = TRUE}).}
#' Moreover, when \code{sem=TRUE}, \code{ecoML} also output the following
#' values:
#' \item{DM}{The matrix characterizing the rates of convergence of the
#' EM algorithms. Such information is also used to calculate the observed-data
#' information matrix}
#' \item{Icom}{The (expected) complete data information
#' matrix estimated via SEM algorithm. When \code{context=FALSE, fix.rho=TRUE},
#' \code{Icom} is 4 by 4. When \code{context=FALSE, fix.rho=FALSE}, \code{Icom}
#' is 5 by 5. When \code{context=TRUE}, \code{Icom} is 9 by 9.}
#' \item{Iobs}{The observed information matrix. The dimension of \code{Iobs}
#' is same as \code{Icom}.}
#' \item{Imiss}{The difference between \code{Icom} and \code{Iobs}.
#' The dimension of \code{Imiss} is same as \code{miss}.}
#' \item{Vobs}{The (symmetrized) variance-covariance matrix of the ML parameter
#' estimates. The dimension of \code{Vobs} is same as \code{Icom}.}
#' \item{Iobs}{The (expected) complete-data variance-covariance matrix. The
#' dimension of \code{Iobs} is same as \code{Icom}.}
#' \item{Vobs.original}{The estimated variance-covariance matrix of the ML parameter
#' estimates. The dimension of \code{Vobs} is same as \code{Icom}.}
#' \item{Fmis}{The fraction of missing information associated with each parameter estimation. }
#' \item{VFmis}{The proportion of increased variance associated with each
#' parameter estimation due to observed data. }
#' \item{Ieigen}{The largest eigen value of \code{Imiss}.}
#' \item{Icom.trans}{The complete data information
#' matrix for the fisher transformed parameters.}
#' \item{Iobs.trans}{The observed data information matrix for the fisher transformed parameters.}
#' \item{Fmis.trans}{The fractions of missing information associated with the
#' fisher transformed parameters.}
#' @seealso \code{eco}, \code{ecoNP}, \code{summary.ecoML}
#' @references Imai, Kosuke, Ying Lu and Aaron Strauss. (2011). \dQuote{eco: R
#' Package for Ecological Inference in 2x2 Tables} Journal of Statistical
#' Software, Vol. 42, No. 5, pp. 1-23.
#'
#' Imai, Kosuke, Ying Lu and Aaron Strauss. (2008). \dQuote{Bayesian and
#' Likelihood Inference for 2 x 2 Ecological Tables: An Incomplete Data
#' Approach} Political Analysis, Vol. 16, No. 1 (Winter), pp. 41-69.
#' @keywords models
#' @examples
#'
#'
#' ## load the census data
#' data(census)
#'
#' ## NOTE: convergence has not been properly assessed for the following
#' ## examples. See Imai, Lu and Strauss (2006) for more complete analyses.
#' ## In the first example below, in the interest of time, only part of the
#' ## data set is analyzed and the convergence requirement is less stringent
#' ## than the default setting.
#'
#' ## In the second example, the program is arbitrarily halted 100 iterations
#' ## into the simulation, before convergence.
#'
#' ## load the Robinson's census data
#' data(census)
#'
#' ## fit the parametric model with the default model specifications
#' \dontrun{res <- ecoML(Y ~ X, data = census[1:100,], N=census[1:100,3],
#' epsilon=10^(-6), verbose = TRUE)}
#' ## summarize the results
#' \dontrun{summary(res)}
#'
#' ## fit the parametric model with some individual
#' ## level data using the default prior specification
#' surv <- 1:600
#' \dontrun{res1 <- ecoML(Y ~ X, context = TRUE, data = census[-surv,],
#' supplement = census[surv,c(4:5,1)], maxit=100, verbose = TRUE)}
#' ## summarize the results
#' \dontrun{summary(res1)}
#'
#' @export ecoML
ecoML <- function(formula, data = parent.frame(), N=NULL, supplement = NULL,
theta.start = c(0,0,1,1,0), fix.rho = FALSE,
context = FALSE, sem = TRUE, epsilon=10^(-6),
maxit = 1000, loglik = TRUE, hyptest=FALSE, verbose= FALSE) {
## getting X and Y
mf <- match.call()
tt <- terms(formula)
attr(tt, "intercept") <- 0
if (is.matrix(eval.parent(mf$data)))
data <- as.data.frame(data)
X <- model.matrix(tt, data)
Y <- model.response(model.frame(tt, data=data))
#n.var: total number of parameters involved in the estimation
#n.par: number of nonstatic paramters need to estimate through EM
# also need SEM
#ndim: dimension of the multivariate normal distribution
ndim<-2
if (context) ndim<-3
n.var<-2*ndim+ ndim*(ndim-1)/2
n.par<-n.S<-n.var
if (context) {
n.par<-n.var-2
}
r12<-NULL
if (fix.rho)
r12<-theta.start[n.par]
if (!context & fix.rho) n.par<-n.par-1
flag<-as.integer(context)+2*as.integer(fix.rho)+2^2*as.integer(sem)
##checking data
tmp <- checkdata(X, Y, supplement, ndim)
bdd <- ecoBD(formula=formula, data=data)
W1min <- bdd$Wmin[order(tmp$order.old)[1:nrow(tmp$d)],1,1]
W1max <- bdd$Wmax[order(tmp$order.old)[1:nrow(tmp$d)],1,1]
n <- tmp$n.samp+tmp$survey.samp+tmp$samp.X1+tmp$samp.X0
wcol<-ndim
if (context) {
wcol<-wcol-1
}
inSample.length <- wcol*tmp$n.samp
#if NCAR and the user did not provide a theta.start
if (context && (length(theta.start)==5) )
theta.start<-c(0,0,1,1,0,0,0)
## Fitting the model via EM
res <- .C("cEMeco", as.double(tmp$d), as.double(theta.start),
as.integer(tmp$n.samp), as.integer(maxit), as.double(epsilon),
as.integer(tmp$survey.yes), as.integer(tmp$survey.samp),
as.double(tmp$survey.data),
as.integer(tmp$X1type), as.integer(tmp$samp.X1), as.double(tmp$X1.W1),
as.integer(tmp$X0type), as.integer(tmp$samp.X0), as.double(tmp$X0.W2),
as.double(W1min), as.double(W1max),
as.integer(flag),as.integer(verbose),as.integer(loglik),as.integer(hyptest),
optTheta=rep(-1.1,n.var), pdTheta=double(n.var),
S=double(n.S+1),inSample=double(inSample.length),DMmatrix=double(n.par*n.par),
itersUsed=as.integer(0),history=double((maxit+1)*(n.var+1)),
PACKAGE="eco")
##record results from EM
theta.em<-res$pdTheta
theta.fisher<-param.trans(theta.em, transformation="Fisher")
iters.em<-res$itersUsed
mu.log.em <- matrix(rep(NA,iters.em*ndim),ncol=ndim)
sigma.log.em <- matrix(rep(NA,iters.em*ndim),ncol=ndim)
loglike.log.em <- as.double(rep(NA,iters.em))
nrho<-length(theta.em)-2*ndim
rho.fisher.em <- matrix(rep(NA,iters.em*nrho),ncol=nrho)
for(i in 1:iters.em) {
mu.log.em[i,1:ndim]=res$history[(i-1)*(n.var+1)+(1:ndim)]
sigma.log.em[i,1:ndim]=res$history[(i-1)*(n.var+1)+ndim+(1:ndim)]
if (nrho!=0)
rho.fisher.em[i, 1:nrho]=res$history[(i-1)*(n.var+1)+2*ndim+(1:nrho)]
loglike.log.em[i]=res$history[(i-1)*(n.var+1)+2*ndim+nrho+1]
}
## In sample prediction of W
W <- matrix(rep(NA,inSample.length),ncol=wcol)
for (i in 1:tmp$n.samp)
for (j in 1:wcol)
W[i,j]=res$inSample[(i-1)*2+j]
## SEM step
iters.sem<-0
suff.stat<-res$S
if (context)
{
suff.stat<-rep(0,(n.var+1))
suff.stat[1]<-mean(logit(c(X,supplement[,3])))
suff.stat[2:3]<-res$S[1:2]
suff.stat[4]<-mean((logit(c(X, supplement[,3])))^2)
suff.stat[5:6]<-res$S[3:4]
suff.stat[7:8]<-res$S[6:7]
suff.stat[9]<-res$S[5]
suff.stat[10]<-res$S[8]
}
if (sem)
{
DM <- matrix(rep(NA,n.par*n.par),ncol=n.par)
res <- .C("cEMeco", as.double(tmp$d), as.double(theta.start),
as.integer(tmp$n.samp), as.integer(maxit), as.double(epsilon),
as.integer(tmp$survey.yes), as.integer(tmp$survey.samp),
as.double(tmp$survey.data),
as.integer(tmp$X1type), as.integer(tmp$samp.X1), as.double(tmp$X1.W1),
as.integer(tmp$X0type), as.integer(tmp$samp.X0), as.double(tmp$X0.W2),
as.double(bdd$Wmin[,1,1]), as.double(bdd$Wmax[,1,1]),
as.integer(flag),as.integer(verbose),as.integer(loglik),as.integer(hyptest),
res$pdTheta, pdTheta=double(n.var), S=double(n.S+1),
inSample=double(inSample.length),DMmatrix=double(n.par*n.par),
itersUsed=as.integer(0),history=double((maxit+1)*(n.var+1)),
PACKAGE="eco")
iters.sem<-res$itersUsed
for(i in 1:n.par)
for(j in 1:n.par)
DM[i,j]=res$DMmatrix[(i-1)*n.par+j]
}
if (!context) names(theta.em)<-c("u1","u2","s1","s2","r12")
if (context) names(theta.em)<-c("ux","u1","u2","sx","s1","s2","r1x","r2x","r12")
## output
res.out<-list(call = mf, Y = Y, X = X, N = N,
fix.rho = fix.rho, context = context, sem=sem, epsilon=epsilon,
theta.em=theta.em, r12=r12,
sigma.log = theta.fisher[(ndim+1):(2*ndim)], suff.stat = suff.stat[1:n.S],
loglik = res$S[n.S+1], iters.em = iters.em,
iters.sem = iters.sem, mu.log.em = mu.log.em,
sigma.log.em = sigma.log.em,
rho.fisher.em = rho.fisher.em, loglike.log.em = loglike.log.em,
W = W)
if (sem) {
res.out$DM<-DM
#print(dim(data))
# n<-dim(data)[1]
if (!is.null(supplement)) n<-n+dim(supplement)[1]
#cat("n2=", n,"\n")
res.info<- ecoINFO(theta.em=res.out$theta.em, suff.stat=res.out$suff.stat, DM=res.out$DM, context=context, fix.rho=fix.rho, sem=sem, r12=res.out$r12, n=n)
res.out$DM<-res.info$DM
res.out$Icom<-res.info$Icom
res.out$Iobs<-res.info$Iobs
res.out$Fmis<-res.info$Fmis
res.out$Vobs.original<-res.info$Vobs.original
res.out$Vobs<-res.info$Vobs
res.out$Iobs<-res.info$Iobs
res.out$VFmis<-res.info$VFmis
res.out$Icom.trans<-res.info$Icom.trans
res.out$Iobs.trans<-res.info$Iobs.trans
res.out$Fmis.trans<-res.info$Fmis.trans
res.out$Imiss<-res.info$Imiss
res.out$Ieigen<-res.info$Ieigen
res.out$Iobs<-res.info$Iobs
}
class(res.out) <- "ecoML"
return(res.out)
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.