#' Fitting the Nonparametric Bayesian Models of Ecological Inference in 2x2
#' Tables
#'
#' \code{ecoNP} is used to fit the nonparametric Bayesian model (based on a
#' Dirichlet process prior) for ecological inference in \eqn{2 \times 2} tables
#' via Markov chain Monte Carlo. It gives the in-sample predictions as well as
#' out-of-sample predictions for population inference. The models and
#' algorithms are described in Imai, Lu and Strauss (2008, 2011).
#'
#'
#' @param formula A symbolic description of the model to be fit, specifying the
#' column and row margins of \eqn{2 \times 2} ecological tables. \code{Y ~ X}
#' specifies \code{Y} as the column margin (e.g., turnout) and \code{X} as the
#' row margin (e.g., percent African-American). Details and specific examples
#' are given below.
#' @param data An optional data frame in which to interpret the variables in
#' \code{formula}. The default is the environment in which \code{ecoNP} is
#' called.
#' @param N An optional variable representing the size of the unit; e.g., the
#' total number of voters. \code{N} needs to be a vector of the same length as
#' \code{Y} and \code{X} or a scalar.
#' @param supplement An optional matrix of supplemental data. The matrix has
#' two columns, which contain additional individual-level data such as survey
#' data for \eqn{W_1} and \eqn{W_2}, respectively. If \code{NULL}, no
#' additional individual-level data are included in the model. The default is
#' \code{NULL}.
#' @param context Logical. If \code{TRUE}, the contextual effect is also
#' modeled; that is, the row margin \eqn{X} is assumed to be correlated with
#' the unknown \eqn{W_1} and \eqn{W_2}. See Imai, Lu and Strauss (2008, 2011)
#' for details. The default is \code{FALSE}.
#' @param mu0 A scalar or a numeric vector that specifies the prior mean for
#' the mean parameter \eqn{\mu} of the base prior distribution \eqn{G_0} (see
#' Imai, Lu and Strauss (2008, 2011) for detailed descriptions of the Dirichlet
#' process prior and the normal base prior distribution). If it is a scalar, its
#' value is repeated to yield a vector of the same length as \eqn{\mu};
#' otherwise, it needs to be a vector of the same length as \eqn{\mu}. When
#' \code{context = TRUE}, the length of \eqn{\mu} is 3; otherwise it is 2. The
#' default is \code{0}.
#' @param tau0 A positive integer representing the scale parameter of the
#' Normal-Inverse Wishart prior for the mean and variance parameter
#' \eqn{(\mu_i, \Sigma_i)} of each observation. The default is \code{2}.
#' @param nu0 A positive integer representing the prior degrees of freedom of
#' the variance matrix \eqn{\Sigma_i}. The default is \code{4}.
#' @param S0 A positive scalar or a positive definite matrix that specifies the
#' prior scale matrix for the variance matrix \eqn{\Sigma_i}. If it is a
#' scalar, the prior scale matrix is a diagonal matrix with the same
#' dimensions as \eqn{\Sigma_i} whose diagonal elements all take the value
#' \code{S0}; otherwise \code{S0} needs to have the same dimensions as
#' \eqn{\Sigma_i}. When \code{context = TRUE}, \eqn{\Sigma} is a \eqn{3 \times 3}
#' matrix; otherwise, it is \eqn{2 \times 2}. The default is \code{10}.
#' @param alpha A positive scalar representing a user-specified fixed value of
#' the concentration parameter, \eqn{\alpha}. If \code{NULL}, \eqn{\alpha} will
#' be updated at each Gibbs draw, and its prior parameters \code{a0} and
#' \code{b0} need to be specified. The default is \code{NULL}.
#' @param a0 A positive integer representing the value of the shape parameter
#' of the gamma prior distribution for \eqn{\alpha}. The default is \code{1}.
#' @param b0 A positive scalar representing the value of the scale parameter
#' of the gamma prior distribution for \eqn{\alpha}. The default is \code{0.1}.
#' @param parameter Logical. If \code{TRUE}, the Gibbs draws of the population
#' parameters, \eqn{\mu} and \eqn{\Sigma}, are returned in addition to the
#' in-sample predictions of the missing internal cells, \eqn{W}. The default is
#' \code{FALSE}. This needs to be set to \code{TRUE} if one wishes to make
#' population inferences through \code{predict.eco}. See an example below.
#' @param grid Logical. If \code{TRUE}, the grid method is used to sample
#' \eqn{W} in the Gibbs sampler. If \code{FALSE}, the Metropolis algorithm is
#' used where candidate draws are sampled from the uniform distribution on the
#' tomography line for each unit. Note that the grid method is significantly
#' slower than the Metropolis algorithm.
#' @param n.draws A positive integer. The number of MCMC draws. The default is
#' \code{5000}.
#' @param burnin A non-negative integer. The burn-in interval for the Markov
#' chain; i.e., the number of initial draws that are not stored. The default is
#' \code{0}.
#' @param thin A non-negative integer. The thinning interval for the Markov
#' chain; i.e., the number of Gibbs draws skipped between recorded values. The
#' default is \code{0}.
#' @param verbose Logical. If \code{TRUE}, the progress of the Gibbs sampler is
#' printed to the screen. The default is \code{FALSE}.
#' @return An object of class \code{ecoNP} containing the following elements:
#' \item{call}{The matched call.}
#' \item{X}{The row margin, \eqn{X}.}
#' \item{Y}{The column margin, \eqn{Y}.}
#' \item{burnin}{The number of initial burnin draws.}
#' \item{thin}{The thinning interval.}
#' \item{nu0}{The prior degrees of freedom.}
#' \item{tau0}{The prior scale parameter.}
#' \item{mu0}{The prior mean.}
#' \item{S0}{The prior scale matrix.}
#' \item{a0}{The prior shape parameter.}
#' \item{b0}{The prior scale parameter.}
#' \item{W}{A three dimensional array storing the posterior in-sample predictions
#' of \eqn{W}. The first dimension indexes the Monte Carlo draws, the second dimension
#' indexes the columns of the table, and the third dimension represents the observations.}
#' \item{Wmin}{A numeric matrix storing the lower bounds of \eqn{W}.}
#' \item{Wmax}{A numeric matrix storing the upper bounds of \eqn{W}.}
#' The following additional elements are included in the output when
#' \code{parameter = TRUE}.
#' \item{mu}{A three dimensional array storing the
#' posterior draws of the population mean parameter, \eqn{\mu}. The first
#' dimension indexes the Monte Carlo draws, the second dimension indexes the
#' columns of the table, and the third dimension represents the observations.}
#' \item{Sigma}{A three dimensional array storing the posterior draws of the
#' population variance matrix, \eqn{\Sigma}. The first dimension indexes the
#' Monte Carlo draws, the second dimension indexes the parameters, and the
#' third dimension represents the observations. }
#' \item{alpha}{The posterior draws of \eqn{\alpha}.}
#' \item{nstar}{The number of clusters at each Gibbs draw.}
#' @seealso \code{eco}, \code{ecoML}, \code{predict.eco}, \code{summary.ecoNP}
#' @references Imai, Kosuke, Ying Lu and Aaron Strauss. (2011). \dQuote{eco: R
#' Package for Ecological Inference in 2x2 Tables}, Journal of Statistical
#' Software, Vol. 42, No. 5, pp. 1-23.
#'
#' Imai, Kosuke, Ying Lu and Aaron Strauss. (2008). \dQuote{Bayesian and
#' Likelihood Inference for 2 x 2 Ecological Tables: An Incomplete Data
#' Approach}, Political Analysis, Vol. 16, No. 1 (Winter), pp. 41-69.
#' @keywords models
#' @examples
#'
#'
#' ## load the registration data
#' data(reg)
#'
#' ## NOTE: We set the number of MCMC draws to be a very small number in
#' ## the following examples; i.e., convergence has not been properly
#' ## assessed. See Imai, Lu and Strauss (2008, 2011) for more complete examples.
#'
#' ## fit the nonparametric model to give in-sample predictions
#' ## store the parameters to make population inference later
#' \dontrun{res <- ecoNP(Y ~ X, data = reg, n.draws = 50, parameter = TRUE, verbose = TRUE)
#'
#' ## summarize the results
#' summary(res)
#'
#' ## obtain out-of-sample prediction
#' out <- predict(res, verbose = TRUE)
#'
#' ## summarize the results
#' summary(out)
#'
#' ## density plots of the out-of-sample predictions
#' par(mfrow=c(2,1))
#' plot(density(out[,1]), main = "W1")
#' plot(density(out[,2]), main = "W2")
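#'
#' ## The in-sample predictions W form a three dimensional array
#' ## (draws x columns of the table x observations); for example, the
#' ## posterior draws of W1 for the first unit can be summarized by
#' summary(res$W[, "W1", 1])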
#'
#'
#' ## load Robinson's census data
#' data(census)
#'
#' ## fit the nonparametric model with contextual effects and N
#' ## using the default prior specification
#'
#' res1 <- ecoNP(Y ~ X, N = N, context = TRUE, parameter = TRUE, data = census,
#'               n.draws = 25, verbose = TRUE)
#'
#' ## summarize the results
#' summary(res1)
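#'
#' ## Because parameter = TRUE, the posterior draws of the population
#' ## parameters, the concentration parameter alpha, and the number of
#' ## clusters at each Gibbs draw (nstar) are also stored; for example,
#' plot(res1$nstar[, 1], type = "l", ylab = "number of clusters")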
#'
#' ## out-of-sample prediction
#' pres1 <- predict(res1)
#' summary(pres1)}
#'
#' @export ecoNP
ecoNP <- function(formula, data = parent.frame(), N = NULL, supplement = NULL,
context = FALSE, mu0 = 0, tau0 = 2, nu0 = 4, S0 = 10,
alpha = NULL, a0 = 1, b0 = 0.1, parameter = FALSE,
grid = FALSE, n.draws = 5000, burnin = 0, thin = 0,
verbose = FALSE){
## contextual effects
if (context)
ndim <- 3
else
ndim <- 2
## checking inputs
if (burnin >= n.draws)
stop("n.draws should be larger than burnin")
if (length(mu0)==1)
mu0 <- rep(mu0, ndim)
else if (length(mu0)!=ndim)
stop("invalid inputs for mu0")
if (is.matrix(S0)) {
if (any(dim(S0)!=ndim))
stop("invalid inputs for S0")
}
else
S0 <- diag(S0, ndim)
mf <- match.call()
## getting X, Y and N
tt <- terms(formula)
attr(tt, "intercept") <- 0
if (is.matrix(eval.parent(mf$data)))
data <- as.data.frame(data)
X <- model.matrix(tt, data)
Y <- model.response(model.frame(tt, data = data))
N <- eval(mf$N, data)
## alpha
if (is.null(alpha)) {
alpha.update <- TRUE
alpha <- 0
}
else
alpha.update <- FALSE
## checking the data and calculating the bounds
tmp <- checkdata(X, Y, supplement, ndim)
bdd <- ecoBD(formula, data=data)
W1min <- bdd$Wmin[order(tmp$order.old)[1:nrow(tmp$d)],1,1]
W1max <- bdd$Wmax[order(tmp$order.old)[1:nrow(tmp$d)],1,1]
## fitting the model
n.store <- floor((n.draws-burnin)/(thin+1))
unit.par <- unit.w <- tmp$n.samp+tmp$samp.X1+tmp$samp.X0
n.par <- n.store * unit.par
n.w <- n.store * unit.w
unit.a <- 1
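  ## call the compiled C routine: the contextual-effects model (cDPecoX)
  ## and the baseline model (cDPeco) are fit by separate samplers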
if (context)
res <- .C("cDPecoX", as.double(tmp$d), as.integer(tmp$n.samp),
as.integer(n.draws), as.integer(burnin), as.integer(thin+1),
as.integer(verbose), as.integer(nu0), as.double(tau0),
as.double(mu0), as.double(S0), as.double(alpha),
as.integer(alpha.update), as.double(a0), as.double(b0),
as.integer(tmp$survey.yes), as.integer(tmp$survey.samp),
as.double(tmp$survey.data), as.integer(tmp$X1type),
as.integer(tmp$samp.X1), as.double(tmp$X1.W1),
as.integer(tmp$X0type), as.integer(tmp$samp.X0),
as.double(tmp$X0.W2),
as.double(W1min), as.double(W1max),
as.integer(parameter), as.integer(grid),
pdSMu0=double(n.par), pdSMu1=double(n.par),
pdSMu2=double(n.par),
pdSSig00=double(n.par), pdSSig01=double(n.par),
pdSSig02=double(n.par), pdSSig11=double(n.par),
pdSSig12=double(n.par), pdSSig22=double(n.par),
pdSW1=double(n.w), pdSW2=double(n.w),
pdSa=double(n.store), pdSn=integer(n.store), PACKAGE="eco")
else
res <- .C("cDPeco", as.double(tmp$d), as.integer(tmp$n.samp),
as.integer(n.draws), as.integer(burnin), as.integer(thin+1),
as.integer(verbose), as.integer(nu0), as.double(tau0),
as.double(mu0), as.double(S0), as.double(alpha),
as.integer(alpha.update), as.double(a0), as.double(b0),
as.integer(tmp$survey.yes), as.integer(tmp$survey.samp),
as.double(tmp$survey.data), as.integer(tmp$X1type),
as.integer(tmp$samp.X1), as.double(tmp$X1.W1),
as.integer(tmp$X0type), as.integer(tmp$samp.X0),
as.double(tmp$X0.W2),
as.double(W1min), as.double(W1max),
as.integer(parameter), as.integer(grid),
pdSMu0=double(n.par), pdSMu1=double(n.par),
pdSSig00=double(n.par), pdSSig01=double(n.par),
pdSSig11=double(n.par), pdSW1=double(n.w), pdSW2=double(n.w),
pdSa=double(n.store), pdSn=integer(n.store), PACKAGE="eco")
## output
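  ## reshape the stored draws of W into an (n.store x 2 x n.obs) array,
  ## restoring the original ordering of the observations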
W1.post <- matrix(res$pdSW1, n.store, unit.w, byrow=TRUE)[,tmp$order.old]
W2.post <- matrix(res$pdSW2, n.store, unit.w, byrow=TRUE)[,tmp$order.old]
W <- array(rbind(W1.post, W2.post), c(n.store, 2, unit.w))
colnames(W) <- c("W1", "W2")
res.out <- list(call = mf, X = X, Y = Y, N = N, W = W,
Wmin = bdd$Wmin[,1,], Wmax = bdd$Wmax[,1,],
burnin = burnin, thin = thin, nu0 = nu0, tau0 = tau0,
mu0 = mu0, a0 = a0, b0 = b0, S0 = S0)
## optional outputs
if (parameter){
if (context) {
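      ## contextual model: mu has three elements and Sigma is 3 x 3 for each
      ## observation, so its six unique elements are stored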
mu1.post <- matrix(res$pdSMu0, n.store, unit.par, byrow=TRUE)[,tmp$order.old]
mu2.post <- matrix(res$pdSMu1, n.store, unit.par, byrow=TRUE)[,tmp$order.old]
mu3.post <- matrix(res$pdSMu2, n.store, unit.par, byrow=TRUE)[,tmp$order.old]
Sigma11.post <- matrix(res$pdSSig00, n.store, unit.par, byrow=TRUE)[,tmp$order.old]
Sigma12.post <- matrix(res$pdSSig01, n.store, unit.par, byrow=TRUE)[,tmp$order.old]
Sigma13.post <- matrix(res$pdSSig02, n.store, unit.par, byrow=TRUE)[,tmp$order.old]
Sigma23.post <- matrix(res$pdSSig12, n.store, unit.par, byrow=TRUE)[,tmp$order.old]
Sigma22.post <- matrix(res$pdSSig11, n.store, unit.par, byrow=TRUE)[,tmp$order.old]
Sigma33.post <- matrix(res$pdSSig22, n.store, unit.par, byrow=TRUE)[,tmp$order.old]
res.out$mu <- array(rbind(mu1.post, mu2.post, mu3.post),
dim=c(n.store, 3, unit.par),
dimnames=list(1:n.store, c("mu1", "mu2", "mu3"), 1:unit.par))
res.out$Sigma <- array(rbind(Sigma11.post, Sigma12.post, Sigma13.post,
Sigma22.post, Sigma23.post, Sigma33.post),
dim=c(n.store, 6, unit.par),
dimnames=list(1:n.store, c("Sigma11",
"Sigma12", "Sigma13", "Sigma22", "Sigma23", "Sigma33"), 1:unit.par))
}
else {
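      ## baseline model: mu has two elements and Sigma is 2 x 2 for each
      ## observation, so its three unique elements are stored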
mu1.post <- matrix(res$pdSMu0, n.store, unit.par, byrow=TRUE)[,tmp$order.old]
mu2.post <- matrix(res$pdSMu1, n.store, unit.par, byrow=TRUE)[,tmp$order.old]
Sigma11.post <- matrix(res$pdSSig00, n.store, unit.par, byrow=TRUE)[,tmp$order.old]
Sigma12.post <- matrix(res$pdSSig01, n.store, unit.par, byrow=TRUE)[,tmp$order.old]
Sigma22.post <- matrix(res$pdSSig11, n.store, unit.par, byrow=TRUE)[,tmp$order.old]
res.out$mu <- array(rbind(mu1.post, mu2.post), dim=c(n.store, 2, unit.par),
dimnames=list(1:n.store, c("mu1", "mu2"), 1:unit.par))
res.out$Sigma <- array(rbind(Sigma11.post, Sigma12.post, Sigma22.post),
dim=c(n.store, 3, unit.par),
dimnames=list(1:n.store, c("Sigma11", "Sigma12", "Sigma22"), 1:unit.par))
}
if (alpha.update)
res.out$alpha <- matrix(res$pdSa, n.store, unit.a, byrow=TRUE)
else
res.out$alpha <- alpha
res.out$nstar <- matrix(res$pdSn, n.store, unit.a, byrow=TRUE)
}
if (context)
class(res.out) <- c("ecoNPX", "ecoNP", "eco")
else
class(res.out) <- c("ecoNP", "eco")
return(res.out)
}