MCMCpaircompare2dDP | R Documentation |
This function generates a sample from the posterior distribution of a model for pairwise comparisons data with a probit link. Unlike standard models for pairwise comparisons data, in this model the latent attribute of each item being compared is a vector in two-dimensional Euclidean space.
MCMCpaircompare2dDP(
pwc.data,
theta.constraints = list(),
burnin = 1000,
mcmc = 20000,
thin = 1,
verbose = 0,
seed = NA,
gamma.start = NA,
theta.start = NA,
store.theta = TRUE,
store.gamma = FALSE,
tune = 0.3,
procrustes = FALSE,
alpha.start = 1,
cluster.max = 100,
cluster.mcmc = 500,
alpha.fixed = TRUE,
a0 = 1,
b0 = 1,
...
)
pwc.data |
A data.frame containing the pairwise comparisons data.
Each row of |
theta.constraints |
A list specifying possible simple equality or
inequality constraints on the item parameters. A
typical entry in the list has one of three forms:
|
burnin |
The number of burn-in iterations for the sampler. |
mcmc |
The number of Gibbs iterations for the sampler. |
thin |
The thinning interval used in the simulation. The number of Gibbs iterations must be divisible by this value. |
verbose |
A switch which determines whether or not the progress of the
sampler is printed to the screen. If |
seed |
The seed for the random number generator. If NA, the Mersenne
Twister generator is used with default seed 12345; if an integer is passed
it is used to seed the Mersenne Twister. The user can also pass a list of
length two to use the L'Ecuyer random number generator, which is suitable
for parallel computation. The first element of the list is the L'Ecuyer
seed, which is a vector of length six or NA (if NA a default seed of
|
gamma.start |
The starting value for the gamma vector. This
can either be a scalar or a column vector with dimension equal to the number
of raters. If this takes a scalar value, then that value will serve as the
starting value for all of the gammas. The default value of NA will set the
starting value of each gamma parameter to |
theta.start |
Starting values for theta. Can be either a numeric scalar, a J by 2 matrix (where J is the number of items compared), or NA. If a scalar, all theta values are set to that value (except elements already specified via theta.constraints). If NA, then unconstrained elements of theta are set equal to 0, elements constrained to be positive are set equal to 0.5, elements constrained to be negative are set equal to -0.5, and elements with equality constraints are set to satisfy those constraints. |
store.theta |
Should the theta draws be returned? Default is TRUE. |
store.gamma |
Should the gamma draws be returned? Default is FALSE. |
tune |
Tuning parameter for the random walk Metropolis proposal for
each gamma_i. |
procrustes |
Should the theta and gamma draws be post-processed with
a Procrustes transformation? Default is FALSE. The Procrustes target matrix
is derived from the constrained elements of theta. Each row of theta that
has both theta values constrained is part of the target matrix.
Elements with equality constraints are set to those values. Elements
constrained to be positive are set to 1. Elements constrained to be negative
are set to -1. If |
alpha.start |
The starting value for the DP concentration parameter
alpha. Must be a positive scalar. Defaults to 1. If |
cluster.max |
The maximum number of clusters allowed in the approximation to the DP prior for gamma. Defaults to 100. Must be a positive integer. |
cluster.mcmc |
The number of additional MCMC iterations that are done to sample each cluster-specific gamma value within one main MCMC iteration. Must be a positive integer. Defaults to 500. Setting this to a lower value speeds runtime at the cost of (possibly) worse mixing. |
alpha.fixed |
Logical value indicating whether the DP concentration
parameter alpha should be held fixed ( |
a0 |
The shape parameter of the gamma prior for alpha. This is the
same parameterization of the gamma distribution as R's internal
|
b0 |
The rate parameter of the gamma prior for alpha. This is the
same parameterization of the gamma distribution as R's internal
|
... |
further arguments to be passed |
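As a rough illustration of how several of the arguments above fit together, the sketch below builds a `theta.constraints` list and checks the iteration settings before a run. The item names are hypothetical, and the reading of each entry as `list(dimension, value)`, with `"+"` or `"-"` marking a sign constraint, is an assumption taken from the example at the end of this page rather than from a full specification. That the number of stored draws equals `mcmc / thin` is likewise an assumption based on the usual thinning convention.

```r
## Hypothetical item names; each entry is read as list(dimension, value).
theta.constraints <- list(
  item.101 = list(1, 2),    # dimension 1 of item.101 fixed at 2
  item.101 = list(2, 2),    # dimension 2 of item.101 fixed at 2
  item.103 = list(1, "+"),  # dimension 1 of item.103 constrained positive
  item.103 = list(2, "-")   # dimension 2 of item.103 constrained negative
)

burnin <- 500
mcmc   <- 10000
thin   <- 5

## The number of Gibbs iterations must be divisible by the thinning interval.
stopifnot(mcmc %% thin == 0)

n.stored         <- mcmc / thin    # assumed number of stored draws
total.iterations <- burnin + mcmc  # total sampler iterations
```

A list with repeated names is legal in R, which is what allows two constraints (one per dimension) on the same item.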
MCMCpaircompare2dDP uses the data augmentation approach of Albert and Chib (1993) in conjunction with Gibbs and Metropolis-within-Gibbs steps to fit the model. The user supplies data and a sample from the posterior is returned as an mcmc object, which can be subsequently analyzed in the coda package.
The simulation is done in compiled C++ code to maximize efficiency. Please consult the coda package documentation for a comprehensive list of functions that can be used to analyze the posterior sample.
The model takes the following form:
i = 1, \ldots, I \quad \text{(raters)}
j = 1, \ldots, J \quad \text{(items)}
Y_{ijj'} = 1 \quad \text{if } i \text{ chooses } j \text{ over } j'
Y_{ijj'} = 0 \quad \text{if } i \text{ chooses } j' \text{ over } j
Y_{ijj'} = \text{NA} \quad \text{if } i \text{ chooses neither}
\Pr(Y_{ijj'} = 1) = \Phi( \mathbf{z}_{i}' [\boldsymbol{\theta}_{j} - \boldsymbol{\theta}_{j'}] )
\mathbf{z}_{i} = [\cos(\gamma_{i}), \ \sin(\gamma_{i})]'
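The choice probability in the model can be evaluated directly in R. The helper `choice.prob` below is a hypothetical name introduced only for illustration; it is not part of the package. It shows how the angle gamma_i controls the weight a rater places on each of the two dimensions.

```r
## Pr(Y_{ijj'} = 1) = Phi( z_i' [theta_j - theta_j'] ),
## with z_i = (cos(gamma_i), sin(gamma_i))'.
choice.prob <- function(gamma.i, theta.j, theta.jprime) {
  z.i <- c(cos(gamma.i), sin(gamma.i))
  eta <- sum(z.i * (theta.j - theta.jprime))
  pnorm(eta)
}

theta.a <- c(2, -2)   # item strong on dimension 1, weak on dimension 2
theta.b <- c(-2, 2)   # the reverse

p.dim1 <- choice.prob(0,      theta.a, theta.b)  # all weight on dimension 1
p.dim2 <- choice.prob(pi / 2, theta.a, theta.b)  # all weight on dimension 2
p.both <- choice.prob(pi / 4, theta.a, theta.b)  # equal weight on both
```

A rater with gamma near 0 almost always prefers the first item, a rater with gamma near pi/2 almost always prefers the second, and a rater at pi/4 is indifferent between them because the two dimensions cancel.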
The following priors are assumed:
\gamma_i \sim G
G \sim \mathcal{DP}(\alpha G_0)
G_0 = \mathrm{Unif}(0, \pi/2)
\alpha \sim \mathrm{Gamma}(a_0, b_0)
\boldsymbol{\theta}_j \sim \mathcal{N}_{2}(\mathbf{0}, \mathbf{I}_{2})
For identification, some \boldsymbol{\theta}_js are truncated above or below 0, or fixed to constants.
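To get a feel for what these priors imply, one can simulate from them. The sketch below uses a truncated stick-breaking construction, which is one standard way to approximate a Dirichlet process with at most `cluster.max` atoms. Whether the package's sampler uses this exact construction is not stated on this page, so treat it as an assumption. The names `n.raters`, `n.items`, `gamma.prior` and `theta.prior` are illustrative only.

```r
set.seed(1)
n.raters    <- 20
n.items     <- 5
cluster.max <- 100
a0 <- 1
b0 <- 1

## alpha ~ Gamma(a0, b0), shape/rate parameterization
alpha <- rgamma(1, shape = a0, rate = b0)

## Truncated stick-breaking weights (assumed approximation to the DP)
v <- rbeta(cluster.max, 1, alpha)
v[cluster.max] <- 1                       # last stick takes the remainder
w <- v * cumprod(c(1, 1 - v[-cluster.max]))

## Atoms drawn from the base measure G_0 = Unif(0, pi/2)
atoms <- runif(cluster.max, min = 0, max = pi / 2)

## Each rater's gamma_i is one of the atoms, so raters can share values
cluster     <- sample.int(cluster.max, n.raters, replace = TRUE, prob = w)
gamma.prior <- atoms[cluster]

## theta_j ~ N_2(0, I_2), ignoring the identification constraints
theta.prior <- matrix(rnorm(2 * n.items), nrow = n.items, ncol = 2)
```

Smaller values of alpha concentrate the weights on fewer atoms, so more raters share the same gamma value; larger values spread raters across more clusters.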
An mcmc object that contains the posterior sample. This object can be summarized by functions provided by the coda package. Most of the column names of the mcmc object are self-explanatory. Note, however, that the columns with names of the form "cluster.[raterID]" give the cluster membership of each rater at each stored MCMC iteration. Because of the possibility of label switching, the particular values of these cluster membership variables are not meaningful. What is meaningful is whether two raters share the same cluster membership value at a particular MCMC iteration. This indicates that those two raters were clustered together during that iteration. Finally, note that the "n.clusters" column gives the number of distinct gamma values at each iteration, i.e., the number of clusters at that iteration.
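Since only shared membership is meaningful, a natural summary is the proportion of stored iterations in which each pair of raters falls in the same cluster. The sketch below computes this from a small made-up matrix that stands in for the returned object. The rater IDs and values are invented for illustration; only the "cluster.[raterID]" and "n.clusters" naming follows the description above.

```r
## Toy stand-in for stored draws: 4 iterations, 3 raters.
draws <- cbind(cluster.101 = c(1, 2, 1, 3),
               cluster.102 = c(1, 2, 2, 3),
               cluster.103 = c(2, 1, 2, 3),
               n.clusters  = c(2, 2, 2, 1))

## Select membership columns; the anchored pattern skips "n.clusters".
cl <- draws[, grep("^cluster\\.", colnames(draws)), drop = FALSE]

## Proportion of iterations in which each pair of raters is clustered together.
n.raters   <- ncol(cl)
co.cluster <- matrix(0, n.raters, n.raters,
                     dimnames = list(colnames(cl), colnames(cl)))
for (a in seq_len(n.raters)) {
  for (b in seq_len(n.raters)) {
    co.cluster[a, b] <- mean(cl[, a] == cl[, b])
  }
}

## Number of distinct labels per iteration, recomputed from the memberships.
n.clusters.check <- apply(cl, 1, function(x) length(unique(x)))
```

Because the summary depends only on equality of labels within an iteration, it is unaffected by label switching across iterations.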
Qiushi Yu <yuqiushi@umich.edu> and Kevin M. Quinn <kmq@umich.edu>
Albert, J. H. and S. Chib. 1993. “Bayesian Analysis of Binary and Polychotomous Response Data.” J. Amer. Statist. Assoc. 88, 669-679.
Yu, Qiushi and Kevin M. Quinn. 2021. “A Multidimensional Pairwise Comparison Model for Heterogeneous Perceptions with an Application to Modeling the Perceived Truthfulness of Public Statements on COVID-19.” University of Michigan Working Paper.
Andrew D. Martin, Kevin M. Quinn, and Jong Hee Park. 2011. “MCMCpack: Markov Chain Monte Carlo in R.” Journal of Statistical Software. 42(9): 1-21. doi:10.18637/jss.v042.i09.
Daniel Pemstein, Kevin M. Quinn, and Andrew D. Martin. 2007. Scythe Statistical Library 1.0. http://scythe.wustl.edu.s3-website-us-east-1.amazonaws.com/.
Martyn Plummer, Nicky Best, Kate Cowles, and Karen Vines. 2006. “Output Analysis and Diagnostics for MCMC (CODA)”, R News. 6(1): 7-11. https://CRAN.R-project.org/doc/Rnews/Rnews_2006-1.pdf.
plot.mcmc, summary.mcmc, MCMCpaircompare, MCMCpaircompare2dDP
## Not run:
## a synthetic data example
set.seed(123)
I <- 65 ## number of raters
J <- 50 ## number of items to be compared
## 3 clusters:
## raters 1 to 5 put most weight on dimension 1
## raters 6 to 10 put most weight on dimension 2
## raters 11 to I put substantial weight on both dimensions
gamma.true <- c(rep(0.05, 5),
                rep(1.50, 5),
                rep(0.7, I-10) )
theta1.true <- rnorm(J, m=0, s=1)
theta2.true <- rnorm(J, m=0, s=1)
theta1.true[1] <- 2
theta2.true[1] <- 2
theta1.true[2] <- -2
theta2.true[2] <- -2
theta1.true[3] <- 2
theta2.true[3] <- -2
n.comparisons <- 125 ## number of pairwise comparisons for each rater
## generate synthetic data according to the assumed model
rater.id <- NULL
item.1.id <- NULL
item.2.id <- NULL
choice.id <- NULL
for (i in 1:I){
  for (k in 1:n.comparisons){
    rater.id <- c(rater.id, i+100)
    item.numbers <- sample(1:J, size=2, replace=FALSE)
    item.1 <- item.numbers[1]
    item.2 <- item.numbers[2]
    item.1.id <- c(item.1.id, item.1)
    item.2.id <- c(item.2.id, item.2)
    z <- c(cos(gamma.true[i]), sin(gamma.true[i]))
    eta <- z[1] * (theta1.true[item.1] - theta1.true[item.2]) +
      z[2] * (theta2.true[item.1] - theta2.true[item.2])
    prob.item.1.chosen <- pnorm(eta)
    u <- runif(1)
    if (u <= prob.item.1.chosen){
      choice.id <- c(choice.id, item.1)
    }
    else{
      choice.id <- c(choice.id, item.2)
    }
  }
}
item.1.id <- paste("item", item.1.id+100, sep=".")
item.2.id <- paste("item", item.2.id+100, sep=".")
choice.id <- paste("item", choice.id+100, sep=".")
sim.data <- data.frame(rater.id, item.1.id, item.2.id, choice.id)
## fit the model (should be run for more than 10500 iterations)
posterior <- MCMCpaircompare2dDP(pwc.data=sim.data,
                                 theta.constraints=list(item.101=list(1,2),
                                                        item.101=list(2,2),
                                                        item.102=list(1,-2),
                                                        item.102=list(2,-2),
                                                        item.103=list(1,"+"),
                                                        item.103=list(2,"-")),
                                 verbose=100,
                                 burnin=500, mcmc=10000, thin=5,
                                 cluster.mcmc=10,
                                 store.theta=TRUE, store.gamma=TRUE,
                                 tune=0.1)
theta1.draws <- posterior[, grep("theta1", colnames(posterior))]
theta2.draws <- posterior[, grep("theta2", colnames(posterior))]
gamma.draws <- posterior[, grep("gamma", colnames(posterior))]
theta1.post.med <- apply(theta1.draws, 2, median)
theta2.post.med <- apply(theta2.draws, 2, median)
gamma.post.med <- apply(gamma.draws, 2, median)
theta1.post.025 <- apply(theta1.draws, 2, quantile, prob=0.025)
theta1.post.975 <- apply(theta1.draws, 2, quantile, prob=0.975)
theta2.post.025 <- apply(theta2.draws, 2, quantile, prob=0.025)
theta2.post.975 <- apply(theta2.draws, 2, quantile, prob=0.975)
gamma.post.025 <- apply(gamma.draws, 2, quantile, prob=0.025)
gamma.post.975 <- apply(gamma.draws, 2, quantile, prob=0.975)
## compare estimates to truth
par(mfrow=c(2,2))
plot(theta1.true, theta1.post.med, xlim=c(-2.5, 2.5), ylim=c(-2.5, 2.5),
     col=rgb(0,0,0,0.3))
segments(x0=theta1.true, x1=theta1.true,
         y0=theta1.post.025, y1=theta1.post.975,
         col=rgb(0,0,0,0.3))
abline(0, 1, col=rgb(1,0,0,0.5))
plot(theta2.true, theta2.post.med, xlim=c(-2.5, 2.5), ylim=c(-2.5, 2.5),
     col=rgb(0,0,0,0.3))
segments(x0=theta2.true, x1=theta2.true,
         y0=theta2.post.025, y1=theta2.post.975,
         col=rgb(0,0,0,0.3))
abline(0, 1, col=rgb(1,0,0,0.5))
plot(gamma.true, gamma.post.med, xlim=c(0, 1.6), ylim=c(0, 1.6),
     col=rgb(0,0,0,0.3))
segments(x0=gamma.true, x1=gamma.true,
         y0=gamma.post.025, y1=gamma.post.975,
         col=rgb(0,0,0,0.3))
abline(0, 1, col=rgb(1,0,0,0.5))
## plot point estimates
plot(theta1.post.med, theta2.post.med,
     xlim=c(-2.5, 2.5), ylim=c(-2.5, 2.5),
     col=rgb(0,0,0,0.3))
for (i in 1:length(gamma.post.med)){
  arrows(x0=0, y0=0,
         x1=cos(gamma.post.med[i]),
         y1=sin(gamma.post.med[i]),
         col=rgb(1,0,0,0.2), len=0.05, lwd=0.5)
}
## End(Not run)