Nothing
## sample from the posterior distribution of a one-dimensional item
## response theory model in R using linked C++ code in Scythe.
##
## ADM and KQ 1/23/2003
## updated extensively ADM & KQ 7/28/2004
## store.ability arg added KQ 1/27/2006
## Hierarchical Subject Parameters MJM 2007-11-07
## Parameter Expansion 2008-11-18
## Grouped Subject Parameters (Wdata) started MJM,YS 2008-11
#' Markov Chain Monte Carlo for Hierarchical One Dimensional Item Response
#' Theory Model, Covariates Predicting Latent Ideal Point (Ability)
#'
#' This function generates a sample from the posterior distribution of a one
#' dimensional item response theory (IRT) model, with multivariate Normal
#' priors on the item parameters, and a Normal-Inverse Gamma hierarchical prior
#' on subject ideal points (abilities). The user supplies item-response data,
#' subject covariates, and priors. Note that this identification strategy
#' obviates the constraints used on theta in \code{\link[MCMCpack]{MCMCirt1d}}.
#' A sample from the posterior distribution is returned as an mcmc object,
#' which can be subsequently analyzed with functions provided in the coda
#' package.
#'
#' If you are interested in fitting K-dimensional item response theory models,
#' or would rather identify the model by placing constraints on the item
#' parameters, please see \code{\link[MCMCpack]{MCMCirtKd}}.
#'
#' \code{MCMCirtHier1d} simulates from the posterior distribution using
#' standard Gibbs sampling using data augmentation (a Normal draw for the
#' subject abilities, a multivariate Normal draw for (second-level) subject
#' ability predictors, an Inverse-Gamma draw for the (second-level) variance of
#' subject abilities, a multivariate Normal draw for the item parameters, and a
#' truncated Normal draw for the latent utilities). The simulation proper is
#' done in compiled C++ code to maximize efficiency. Please consult the coda
#' documentation for a comprehensive list of functions that can be used to
#' analyze the posterior sample.
#'
#' The model takes the following form. We assume that each subject has an
#' subject ability (ideal point) denoted \eqn{\theta_j} and that each
#' item has a difficulty parameter \eqn{a_i} and discrimination parameter
#' \eqn{b_i}. The observed choice by subject \eqn{j} on item
#' \eqn{i} is the observed data matrix which is \eqn{(I \times J)}.
#' We assume that the choice is dictated by an unobserved utility:
#'
#' \deqn{z_{i,j} = -\alpha_i + \beta_i \theta_j + \varepsilon_{i,j}}
#'
#' Where the errors are assumed to be distributed standard Normal.
#' This constitutes the measurement or level-1 model. The subject abilities
#' (ideal points) are modeled by a second level Normal linear predictor for
#' subject covariates \code{Xjdata}, with common variance
#' \eqn{\sigma^2}. The parameters of interest are the subject
#' abilities (ideal points), item parameters, and second-level coefficients.
#'
#' We assume the following priors. For the subject abilities (ideal points):
#'
#' \deqn{\theta_j \sim \mathcal{N}(\mu_{\theta} ,T_{0}^{-1})}
#'
#' For the item parameters, the prior is:
#'
#' \deqn{\left[a_i, b_i \right]' \sim \mathcal{N}_2 (ab_{0},AB_{0}^{-1})}
#'
#' The model is identified by the proper priors on the item parameters and
#' constraints placed on the ability parameters.
#'
#' As is the case with all measurement models, make sure that you have plenty
#' of free memory, especially when storing the item parameters.
#'
#' @param datamatrix The matrix of data. Must be 0, 1, or missing values. The
#' rows of \code{datamatrix} correspond to subjects and the columns correspond
#' to items.
#'
#' @param Xjdata A \code{data.frame} containing second-level predictor
#' covariates for ideal points \eqn{\theta}. Predictors are modeled as a
#' linear regression on the mean vector of \eqn{\theta}; the posterior
#' sample contains regression coefficients \eqn{\beta} and common
#' variance \eqn{\sigma^2}. See Rivers (2003) for a thorough
#' discussion of identification of IRT models.
#'
#' @param burnin The number of burn-in iterations for the sampler.
#'
#' @param mcmc The number of Gibbs iterations for the sampler.
#'
#' @param thin The thinning interval used in the simulation. The number of
#' Gibbs iterations must be divisible by this value.
#'
#' @param verbose A switch which determines whether or not the progress of the
#' sampler is printed to the screen. If \code{verbose} is greater than 0 then
#' every \code{verbose}th iteration will be printed to the screen.
#'
#' @param seed The seed for the random number generator. If NA, the Mersenne
#' Twister generator is used with default seed 12345; if an integer is passed
#' it is used to seed the Mersenne twister. The user can also pass a list of
#' length two to use the L'Ecuyer random number generator, which is suitable
#' for parallel computation. The first element of the list is the L'Ecuyer
#' seed, which is a vector of length six or NA (if NA a default seed of
#' \code{rep(12345,6)} is used). The second element of list is a positive
#' substream number. See the MCMCpack specification for more details.
#'
#' @param theta.start The starting values for the subject abilities (ideal
#' points). This can either be a scalar or a column vector with dimension equal
#' to the number of voters. If this takes a scalar value, then that value will
#' serve as the starting value for all of the thetas. The default value of NA
#' will choose the starting values based on an eigenvalue-eigenvector
#' decomposition of the agreement score matrix formed from the
#' \code{datamatrix}.
#'
#' @param a.start The starting values for the \eqn{a} difficulty parameters.
#' This can either be a scalar or a column vector with dimension equal to the
#' number of items. If this takes a scalar value, then that value will serve
#' as the starting value for all \eqn{a}. The default value of NA will set
#' the starting values based on a series of probit regressions that condition
#' on the starting values of theta.
#'
#' @param b.start The starting values for the \eqn{b} discrimination
#' parameters. This can either be a scalar or a column vector with dimension
#' equal to the number of items. If this takes a scalar value, then that value
#' will serve as the starting value for all \eqn{b}. The default value of
#' NA will set the starting values based on a series of probit regressions that
#' condition on the starting values of theta.
#'
#' @param beta.start The starting values for the \eqn{\beta} regression
#' coefficients that predict the means of ideal points \eqn{\theta}.
#' This can either be a scalar or a column vector with length equal to the
#' number of covariates. If this takes a scalar value, then that value will
#' serve as the starting value for all of the betas. The default value of NA
#' will set the starting values based on a linear regression of the covariates
#' on (either provided or generated) \code{theta.start}.
#'
#' @param b0 The prior mean of \eqn{\beta}. Can be either a scalar or a
#' vector of length equal to the number of subject covariates. If a scalar all
#' means with be set to the passed value.
#'
#' @param B0 The prior precision of \eqn{\beta}. This can either be a
#' scalar or a square matrix with dimensions equal to the number of betas. If
#' this takes a scalar value, then that value times an identity matrix serves
#' as the prior precision of beta. A default proper but diffuse value of .01
#' ensures finite marginal likelihood for model comparison. A value of 0 is
#' equivalent to an improper uniform prior for beta.
#'
#' @param c0 \eqn{c_0/2} is the shape parameter for the inverse Gamma
#' prior on \eqn{\sigma^2} (the variance of \eqn{\theta}). The
#' amount of information in the inverse Gamma prior is something
#' like that from \eqn{c_0} pseudo-observations.
#'
#' @param d0 \eqn{d_0/2} is the scale parameter for the inverse Gamma
#' prior on \eqn{\sigma^2} (the variance of \eqn{\theta}). In
#' constructing the inverse Gamma prior, \eqn{d_0} acts like the sum of
#' squared errors from the \eqn{c_0} pseudo-observations.
#'
#' @param ab0 The prior mean of \code{(a, b)}. Can be either a scalar or a
#' 2-vector. If a scalar both means will be set to the passed value. The prior
#' mean is assumed to be the same across all items.
#'
#' @param AB0 The prior precision of \code{(a, b)}.This can either be ascalar
#' or a 2 by 2 matrix. If this takes a scalar value, then that value times an
#' identity matrix serves as the prior precision. The prior precision is
#' assumed to be the same across all items.
#'
#' @param store.item A switch that determines whether or not to store the item
#' parameters for posterior analysis. \emph{NOTE: In situations with many
#' items storing the item parameters takes an enormous amount of memory, so
#' \code{store.item} should only be \code{TRUE} if the chain is thinned
#' heavily, or for applications with a small number of items}. By default, the
#' item parameters are not stored.
#'
#' @param store.ability A switch that determines whether or not to store the
#' ability parameters for posterior analysis. \emph{NOTE: In situations with
#' many individuals storing the ability parameters takes an enormous amount of
#' memory, so \code{store.ability} should only be \code{TRUE} if the chain is
#' thinned heavily, or for applications with a small number of individuals}.
#' By default, ability parameters are stored.
#'
#' @param drop.constant.items A switch that determines whether or not items
#' that have no variation should be deleted before fitting the model. Default =
#' TRUE.
#'
#' @param marginal.likelihood Should the marginal likelihood of the
#' second-level model on ideal points be calculated using the method of Chib
#' (1995)? It is stored as an attribute of the posterior \code{mcmc} object and
#' suitable for comparison using \code{\link[MCMCpack]{BayesFactor}}.
#'
#' @param px Use Parameter Expansion to reduce autocorrelation in the chain?
#' PX introduces an unidentified parameter \eqn{alpha} for the residual
#' variance in the latent data (Liu and Wu 1999). Default = TRUE
#'
#' @param px_a0 Prior shape parameter for the inverse-gamma distribution on
#' \eqn{alpha}, the residual variance of the latent data. Default=10.
#'
#' @param px_b0 Prior scale parameter for the inverse-gamma distribution on
#' \eqn{alpha}, the residual variance of the latent data. Default = 10
#'
#' @param ... further arguments to be passed
#'
#' @return An \code{mcmc} object that contains the sample from the posterior
#' distribution. This object can be summarized by functions provided by the
#' coda package. If \code{marginal.likelihood = "Chib95"} the object will have
#' attribute \code{logmarglike}.
#'
#' @export
#'
#' @author Michael Malecki, \email{mike@@crunch.io},
#' \url{https://github.com/malecki}.
#'
#' @seealso \code{\link[coda]{plot.mcmc}},\code{\link[coda]{summary.mcmc}},
#' \code{\link[MCMCpack]{MCMCirtKd}}
#'
#' @references James H. Albert. 1992. ``Bayesian Estimation of Normal Ogive
#' Item Response Curves Using Gibbs Sampling." \emph{Journal of Educational
#' Statistics}. 17: 251--269.
#'
#' Joshua Clinton, Simon Jackman, and Douglas Rivers. 2004. ``The Statistical
#' Analysis of Roll Call Data." \emph{American Political Science Review} 98:
#' 355--370.
#'
#' Valen E. Johnson and James H. Albert. 1999. ``Ordinal Data Modeling."
#' Springer: New York.
#'
#' Liu, Jun S. and Ying Nian Wu. 1999. ``Parameter Expansion for Data
#' Augmentation.'' \emph{Journal of the American Statistical Association} 94:
#' 1264--1274.
#'
#' Andrew D. Martin, Kevin M. Quinn, and Jong Hee Park. 2011. ``MCMCpack:
#' Markov Chain Monte Carlo in R.'', \emph{Journal of Statistical Software}.
#' 42(9): 1-21. \doi{10.18637/jss.v042.i09}.
#'
#' Daniel Pemstein, Kevin M. Quinn, and Andrew D. Martin. 2007. \emph{Scythe
#' Statistical Library 1.0.} \url{http://scythe.wustl.edu.s3-website-us-east-1.amazonaws.com/}.
#'
#' Martyn Plummer, Nicky Best, Kate Cowles, and Karen Vines. 2006. ``Output
#' Analysis and Diagnostics for MCMC (CODA)'', \emph{R News}. 6(1): 7-11.
#' \url{https://CRAN.R-project.org/doc/Rnews/Rnews_2006-1.pdf}.
#'
#' Douglas Rivers. 2004. ``Identification of Multidimensional Item-Response
#' Models." Stanford University, typescript.
#'
#' @keywords models
#'
#' @examples
#'
#' \dontrun{
#' data(SupremeCourt)
#'
#' Xjdata <- data.frame(presparty= c(1,1,0,1,1,1,1,0,0),
#' sex= c(0,0,1,0,0,0,0,1,0))
#'
#' ## Parameter Expansion reduces autocorrelation.
#' posterior1 <- MCMCirtHier1d(t(SupremeCourt),
#' burnin=50000, mcmc=10000, thin=20,
#' verbose=10000,
#' Xjdata=Xjdata,
#' marginal.likelihood="Chib95",
#' px=TRUE)
#'
#' ## But, you can always turn it off.
#' posterior2 <- MCMCirtHier1d(t(SupremeCourt),
#' burnin=50000, mcmc=10000, thin=20,
#' verbose=10000,
#' Xjdata=Xjdata,
#' #marginal.likelihood="Chib95",
#' px=FALSE)
#' ## Note that the hierarchical model has greater autocorrelation than
#' ## the naive IRT model.
#' posterior0 <- MCMCirt1d(t(SupremeCourt),
#' theta.constraints=list(Scalia="+", Ginsburg="-"),
#' B0.alpha=.2, B0.beta=.2,
#' burnin=50000, mcmc=100000, thin=100, verbose=10000,
#' store.item=FALSE)
#'
#' ## Randomly 10% Missing -- this affects the expansion parameter, increasing
#' ## the variance of the (unidentified) latent parameter alpha.
#'
#' scMiss <- SupremeCourt
#' scMiss[matrix(as.logical(rbinom(nrow(SupremeCourt)*ncol(SupremeCourt), 1, .1)),
#' dim(SupremeCourt))] <- NA
#'
#' posterior1.miss <- MCMCirtHier1d(t(scMiss),
#' burnin=80000, mcmc=10000, thin=20,
#' verbose=10000,
#' Xjdata=Xjdata,
#' marginal.likelihood="Chib95",
#' px=TRUE)
#'
#' }
#'
"MCMCirtHier1d" <-
function(datamatrix, Xjdata,
burnin = 1000, mcmc = 20000, thin=1,
verbose = 0, seed = NA,
theta.start = NA, a.start = NA, b.start = NA,
beta.start=NA, b0=0, B0=.01, c0=.001, d0=.001,
ab0=0, AB0=.25, store.item = FALSE, store.ability=TRUE,
drop.constant.items=TRUE,
marginal.likelihood=c("none","Chib95"), px=TRUE,
px_a0 = 10, px_b0=10,
... ) {
## checks
check.offset(list(...))
check.mcmc.parameters(burnin, mcmc, thin)
## check vote matrix and convert to work with C++ code
if (drop.constant.items==TRUE){
x.col.var <- apply(datamatrix, 2, var, na.rm=TRUE)
keep.inds <- x.col.var>0
keep.inds[is.na(keep.inds)] <- FALSE
datamatrix <- datamatrix[,keep.inds]
}
datamatrix <- as.matrix(datamatrix)
K <- ncol(datamatrix) # cases, bills, items, etc
J <- nrow(datamatrix) # justices, legislators, subjects, etc
L <- ncol(Xjdata) # predictors on theta Xj
if(sum(datamatrix==1 | datamatrix==0 | is.na(datamatrix)) != (J*K)) {
cat("Error: Data matrix contains elements other than 0, 1 or NA.\n")
stop("Please check data and call ", calling.function(), " again.\n",
call.=FALSE)
}
datamatrix[is.na(datamatrix)] <- 9
item.names <- colnames(as.data.frame(datamatrix))
subject.names <- rownames(as.data.frame(datamatrix))
beta.names <- c(names(as.data.frame(Xjdata)),"sigmasq")
## names
item.names <- colnames(datamatrix)
if (is.null(item.names)){
item.names <- paste("item", 1:K, sep="")
}
## check Xj matrix and set up betastart
Xjdata <- as.matrix(Xjdata)
if(nrow(Xjdata) != nrow(datamatrix)) {
cat("Error: subject covariates not of same length as datamatrix\n")
stop("Please check data and try ",calling.function()," again.\n",call.=FALSE)
}
## prior for (alpha, beta)
holder <- form.mvn.prior(ab0, AB0, 2)
ab0 <- holder[[1]]
AB0 <- holder[[2]]
## starting values for theta error checking
## could use factor.score.start.check EXCEPT
## We have done away with eq and ineq constraints.
if (max(is.na(theta.start))==1) {
theta.start <- factor.score.eigen.start(agree.mat(datamatrix), 1)
}
else if(is.numeric(theta.start) & length(theta.start) == J ) {
theta.start <- theta.start * matrix(1, J, 1)
}
else {
cat("Inappropriate value of theta.start passed.\n")
stop("Please respecify and call", calling.function(), " again.\n",
call.=FALSE)
}
## starting values for (a, b)
ab.starts <- matrix(NA, K, 2)
for (i in 1:K){
local.y <- datamatrix[,i]
local.y[local.y==9] <- NA
if (var(na.omit(local.y))==0){
ab.starts[i,] <- c(0,10)
}
else {
ab.starts[i,] <- coef(suppressWarnings(glm(local.y~theta.start,
family=binomial(link="probit"),
control=glm.control(
maxit=8, epsilon=1e-3)
)))
}
}
ab.starts[,1] <- -1 * ab.starts[,1] # make this into a difficulty param
## starting values for a and b error checking
if (is.na(a.start)) {
a.start <- ab.starts[,1]
}
else if(is.null(dim(a.start))) {
a.start <- a.start * matrix(1,K,1)
}
else if((dim(a.start)[1] != K) || (dim(a.start)[2] != 1)) {
cat("Error: Starting value for a not conformable.\n")
stop("Please respecify and call ", calling.function(), " again.\n",
call.=FALSE)
}
if (is.na(b.start)) {
b.start <- ab.starts[,2]
}
else if(is.null(dim(b.start))) {
b.start <- b.start * matrix(1,K,1)
}
else if((dim(b.start)[1] != K) || (dim(b.start)[2] != 1)) {
cat("Error: Starting value for b not conformable.\n")
stop("Please respecify and call ", calling.function(), " again.\n",
call.=FALSE)
}
cat("Generating starting values (glm) for hierarchical parameters:\n")
## starting values are regression of theta.start on Xj
## or passed vector, or same value all beta
if (max(is.na(beta.start))==1) { # beta.start NA
beta.start <- coef(suppressWarnings(glm.fit(Xjdata,theta.start)))
} else if ( length(beta.start) == L ) { # beta.start vector
beta.start <- matrix(beta.start,L,1)
} else if ( length(beta.start) == 1 ) { # beta.start scalar
beta.start <- beta.start * matrix(1,L,1)
} else {
cat("Error: Starting value for beta not conformable.\n")
stop("Please respecify and call ", calling.function(), " again.\n",
call.=FALSE)
}
print(beta.start)
## prior for beta
holder <- form.mvn.prior(b0, B0, L)
b0 <- holder[[1]]
B0 <- holder[[2]]
check.ig.prior(c0, d0)
## get marginal likelihood argument
marginal.likelihood <- match.arg(marginal.likelihood)
B0.eigenvalues <- eigen(B0)$values
if (min(B0.eigenvalues) < 0){
stop("B0 is not positive semi-definite.\nPlease respecify and call again.\n")
}
if (isTRUE(all.equal(min(B0.eigenvalues), 0))){
if (marginal.likelihood != "none"){
warning("Cannot calculate marginal likelihood with improper prior\n")
marginal.likelihood <- "none"
}
}
logmarglike <- NULL
chib <- 0
if (marginal.likelihood == "Chib95"){
chib <- 1
}
cat("setting up posterior holder\n" )
## define holder for posterior sample
if(store.item == FALSE & store.ability == TRUE) {
sample <- matrix(data=0, mcmc/thin, J+L+1)
}
else if (store.item == TRUE & store.ability == FALSE){
sample <- matrix(data=0, mcmc/thin, L+1 + 2*K)
}
else if (store.item == TRUE & store.ability == TRUE){
sample <- matrix(data=0, mcmc/thin, L+1 + J + 2 * K)
}
else{
stop("Either store.item or store.ability should be true.\n")
}
## seeds
seeds <- form.seeds(seed)
lecuyer <- seeds[[1]]
seed.array <- seeds[[2]]
lecuyer.stream <- seeds[[3]]
# call C++ code to draw sample
posterior <- .C("cMCMCirtHier1d",
sampledata = as.double(sample),
samplerow = as.integer(nrow(sample)),
samplecol = as.integer(ncol(sample)),
Xdata = as.integer(datamatrix),
Xrow = as.integer(nrow(datamatrix)),
Xcol = as.integer(ncol(datamatrix)),
burnin = as.integer(burnin),
mcmc = as.integer(mcmc),
thin = as.integer(thin),
lecuyer = as.integer(lecuyer),
seedarray = as.integer(seed.array),
lecuyerstream = as.integer(lecuyer.stream),
verbose = as.integer(verbose),
thetastartdata = as.double(theta.start),
thetastartrow = as.integer(nrow(theta.start)),
thetastartcol = as.integer(ncol(theta.start)),
astartdata = as.double(a.start),
astartrow = as.integer(length(a.start)),
astartcol = as.integer(1),
bstartdata = as.double(b.start),
bstartrow = as.integer(length(b.start)),
bstartcol = as.integer(1),
ab0data = as.double(ab0),
ab0row = as.integer(nrow(ab0)),
ab0col = as.integer(ncol(ab0)),
AB0data = as.double(AB0),
AB0row = as.integer(nrow(AB0)),
AB0col = as.integer(ncol(AB0)),
Xjdata = as.double(Xjdata),
Xjrow = as.integer(nrow(Xjdata)),
Xjcol = as.integer(ncol(Xjdata)),
betastartdata = as.double(beta.start),
betastartrow = as.integer(length(beta.start)),
betastartcol = as.integer(1),
b0data = as.double(b0),
b0row = as.integer(length(b0)),
b0col = as.integer(1),
B0data = as.double(B0),
B0row = as.integer(nrow(B0)),
B0col = as.integer(ncol(B0)),
c0 = as.double(c0),
d0 = as.double(d0),
storei = as.integer(store.item),
storea = as.integer(store.ability),
logmarglikeholder = as.double(0.0),
chib = as.integer(chib),
px= as.integer(px),
px_a0 = as.double(px_a0),
px_b0 = as.double(px_b0),
PACKAGE="MCMCpack"
)
beta.names <- paste("beta.",beta.names,sep="")
theta.names <- paste("theta.", subject.names, sep = "")
alpha.beta.names <- paste(rep(c("a.","b."), K),
rep(item.names, each = 2),
sep = "")
# put together matrix and build MCMC object to return
sample <- matrix(posterior$sampledata, posterior$samplerow,
posterior$samplecol,
byrow=FALSE)
output <- mcmc(data=sample, start=burnin+1, end=burnin+mcmc, thin=thin)
if (marginal.likelihood == "Chib95"){
logmarglike <- posterior$logmarglikeholder
}
names <- NULL
if(store.ability == TRUE) {
names <- c(names, theta.names)
}
if (store.item == TRUE){
names <- c(names, alpha.beta.names)
}
names <- c(names,beta.names)
try( varnames(output) <- names)
attr(output,"title") <-
"MCMCirtHier1d Posterior Sample"
attr(output,"logmarglike") <- posterior$logmarglikeholder
return(output)
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.