MCMCpaircompare2d: Markov Chain Monte Carlo for the Two-Dimensional Pairwise...

MCMCpaircompare2d    R Documentation

Markov Chain Monte Carlo for the Two-Dimensional Pairwise Comparisons Model in Yu and Quinn (2021)

Description

This function generates a sample from the posterior distribution of a model for pairwise comparisons data with a probit link. Unlike standard models for pairwise comparisons data, in this model the latent attribute of each item being compared is a vector in two-dimensional Euclidean space.

Usage

MCMCpaircompare2d(
  pwc.data,
  theta.constraints = list(),
  burnin = 1000,
  mcmc = 20000,
  thin = 1,
  verbose = 0,
  seed = NA,
  gamma.start = NA,
  theta.start = NA,
  store.theta = TRUE,
  store.gamma = TRUE,
  tune = 0.3,
  procrustes = FALSE,
  ...
)

Arguments

pwc.data

A data.frame containing the pairwise comparisons data. Each row of pwc.data corresponds to a single pairwise comparison. pwc.data needs to have exactly four columns. The first column contains a unique identifier for the rater. Column two contains the unique identifier for the first item being compared. Column three contains the unique identifier for the second item being compared. Column four contains the unique identifier of the item selected from the two items being compared. If a tie occurred, the entry in the fourth column should be NA. The identifiers in columns 2 through 4 must start with a letter. Examples are provided below.
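As a minimal sketch of the expected layout, the toy data frame below has four columns in the required order and includes one tie. The object name pwc.toy, the column names, and the identifiers are hypothetical and chosen only for illustration.

```r
## Hypothetical toy data: two raters, three items, one tie (NA in column four)
pwc.toy <- data.frame(
  rater.id  = c(101, 101, 102, 102),
  item.1.id = c("item.A", "item.B", "item.A", "item.C"),
  item.2.id = c("item.B", "item.C", "item.C", "item.B"),
  choice.id = c("item.A", "item.C", NA, "item.B"),
  stringsAsFactors = FALSE
)

## Checks that mirror the requirements stated above
stopifnot(ncol(pwc.toy) == 4)

## every non-tie choice must be one of the two items compared
chosen <- pwc.toy$choice.id
valid.choice <- is.na(chosen) |
  chosen == pwc.toy$item.1.id |
  chosen == pwc.toy$item.2.id
stopifnot(all(valid.choice))

## item identifiers must start with a letter
item.ids <- unlist(pwc.toy[, 2:3])
stopifnot(all(grepl("^[A-Za-z]", item.ids)))
```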

theta.constraints

A list specifying possible simple equality or inequality constraints on the item parameters. A typical entry in the list has one of three forms: itemname=list(d,c) which will constrain the dth dimension of theta for the item named itemname to be equal to c, itemname=list(d,"+") which will constrain the dth dimension of theta for the item named itemname to be positive, and itemname=list(d, "-") which will constrain the dth dimension of theta for the item named itemname to be negative.
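The three forms can be combined in one list, and an item name may appear more than once to constrain both dimensions. The sketch below uses hypothetical item names (item.A, item.B, item.C) and then tabulates the constraints to make the structure easier to read; the tabulation is only an illustration and is not something the function requires.

```r
## Hypothetical constraints on three items
cons.toy <- list(
  item.A = list(1, 2),    # dimension 1 of item.A fixed at 2
  item.A = list(2, 2),    # dimension 2 of item.A fixed at 2
  item.B = list(1, "+"),  # dimension 1 of item.B constrained positive
  item.C = list(2, "-")   # dimension 2 of item.C constrained negative
)

## Tabulate the constraints for inspection
cons.table <- data.frame(
  item       = names(cons.toy),
  dimension  = vapply(cons.toy, function(x) as.numeric(x[[1]]),
                      numeric(1), USE.NAMES = FALSE),
  constraint = vapply(cons.toy, function(x) as.character(x[[2]]),
                      character(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
print(cons.table)
```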

burnin

The number of burn-in iterations for the sampler.

mcmc

The number of Gibbs iterations for the sampler.

thin

The thinning interval used in the simulation. The number of Gibbs iterations must be divisible by this value.

verbose

A switch which determines whether or not the progress of the sampler is printed to the screen. If verbose is greater than 0, output is printed to the screen every verbose-th iteration.

seed

The seed for the random number generator. If NA, the Mersenne Twister generator is used with default seed 12345; if an integer is passed, it is used to seed the Mersenne Twister. The user can also pass a list of length two to use the L'Ecuyer random number generator, which is suitable for parallel computation. The first element of the list is the L'Ecuyer seed, which is a vector of length six or NA (if NA, a default seed of rep(12345,6) is used). The second element of the list is a positive substream number. See the MCMCpack specification for more details.
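The accepted forms of seed can be written out as follows. This is a sketch of argument values only; the object names and the integer 2024 are hypothetical, and the last line shows one way to give each of several parallel chains its own substream.

```r
## Forms of the seed argument described above (values are illustrative)
seed.default  <- NA                       # Mersenne Twister, default seed 12345
seed.mersenne <- 2024                     # Mersenne Twister, user-supplied seed
seed.lecuyer  <- list(NA, 1)              # L'Ecuyer, default seed, substream 1
seed.lecuyer2 <- list(rep(12345, 6), 2)   # L'Ecuyer, explicit seed, substream 2

## One seed specification per chain, each with a distinct substream
chain.seeds <- lapply(1:3, function(k) list(NA, k))
```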

gamma.start

The starting value for the gamma vector. This can either be a scalar or a column vector with dimension equal to the number of raters. If this takes a scalar value, then that value will serve as the starting value for all of the gammas. The default value of NA will set the starting value of each gamma parameter to \pi/4.

theta.start

Starting values for theta. Can be either a numeric scalar, a J by 2 matrix (where J is the number of items compared), or NA. If a scalar, all theta values are set to that value (except elements already specified via theta.constraints). If NA, then unconstrained elements of theta are set equal to 0, elements constrained to be positive are set equal to 0.5, elements constrained to be negative are set equal to -0.5, and elements with equality constraints are set to satisfy those constraints.

store.theta

Should the theta draws be returned? Default is TRUE.

store.gamma

Should the gamma draws be returned? Default is TRUE.

tune

Tuning parameter for the random walk Metropolis proposal for each gamma_i. tune is the width of the uniform proposal centered at the current value of gamma_i. Must be a positive scalar.
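To illustrate what tune controls, the sketch below runs a random walk Metropolis update for a single gamma_i with a uniform proposal of total width tune centered at the current value. It is not the package's sampler: it targets the observed-data probit likelihood for one hypothetical rater with known theta differences, rather than conditioning on augmented latent data, and all object names are assumptions made for the example.

```r
set.seed(1)

## Hypothetical inputs for one rater: 20 comparisons with known
## differences theta_j - theta_j' (one row per comparison)
theta.diff <- matrix(rnorm(40), ncol = 2)
gamma.sim  <- 0.4
y <- rbinom(20, 1, pnorm(drop(theta.diff %*% c(cos(gamma.sim), sin(gamma.sim)))))

## Log posterior of gamma under the Unif(0, pi/2) prior
log.post <- function(g) {
  if (g <= 0 || g >= pi / 2) return(-Inf)   # outside the prior support
  eta <- drop(theta.diff %*% c(cos(g), sin(g)))
  sum(pnorm(eta[y == 1], log.p = TRUE)) +
    sum(pnorm(-eta[y == 0], log.p = TRUE))
}

tune      <- 0.3      # total width of the uniform proposal
gamma.cur <- pi / 4   # default starting value
draws     <- numeric(2000)
accepted  <- 0
for (s in seq_along(draws)) {
  gamma.prop <- runif(1, gamma.cur - tune / 2, gamma.cur + tune / 2)
  if (log(runif(1)) < log.post(gamma.prop) - log.post(gamma.cur)) {
    gamma.cur <- gamma.prop
    accepted  <- accepted + 1
  }
  draws[s] <- gamma.cur
}
accept.rate <- accepted / length(draws)
```

A larger tune proposes bigger moves and typically lowers the acceptance rate; a smaller tune does the opposite.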

procrustes

Should the theta and gamma draws be post-processed with a Procrustes transformation? Default is FALSE. The Procrustes target matrix is derived from the constrained elements of theta. Each row of theta that has both theta values constrained is part of the target matrix. Elements with equality constraints are set to those values. Elements constrained to be positive are set to 1. Elements constrained to be negative are set to -1. If procrustes is set to TRUE, theta.constraints must be set so that there are at least three rows of theta that have both elements of theta constrained.
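The rule for building the target matrix can be sketched in a few lines. The code below is an illustration of the description above, not the package's internal code; the helper target.value and the object names are hypothetical.

```r
## Hypothetical constraints; item.104 has only one dimension constrained
cons <- list(item.101 = list(1, 2),   item.101 = list(2, 2),
             item.102 = list(1, -2),  item.102 = list(2, -2),
             item.103 = list(1, "+"), item.103 = list(2, "-"),
             item.104 = list(1, "+"))

## Map a constraint to its target value: "+" -> 1, "-" -> -1, else the constant
target.value <- function(v) {
  if (identical(v, "+")) 1 else if (identical(v, "-")) -1 else as.numeric(v)
}

items  <- unique(names(cons))
target <- matrix(NA_real_, nrow = length(items), ncol = 2,
                 dimnames = list(items, c("theta1", "theta2")))
for (k in seq_along(cons)) {
  target[names(cons)[k], cons[[k]][[1]]] <- target.value(cons[[k]][[2]])
}

## Keep only rows with both dimensions constrained
target <- target[complete.cases(target), , drop = FALSE]

## procrustes = TRUE needs at least three such rows
stopifnot(nrow(target) >= 3)
print(target)
```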

...

Further arguments to be passed.

Details

MCMCpaircompare2d uses the data augmentation approach of Albert and Chib (1993) in conjunction with Gibbs and Metropolis-within-Gibbs steps to fit the model. The user supplies data and a sample from the posterior is returned as an mcmc object, which can be subsequently analyzed in the coda package.

The simulation is done in compiled C++ code to maximize efficiency.

Please consult the coda package documentation for a comprehensive list of functions that can be used to analyze the posterior sample.

The model takes the following form:

i = 1,...,I \ \ \ \ (raters)

j = 1,...,J \ \ \ \ (items)

Y_{ijj'} = 1 \ \ if \ \ i \ \ chooses \ \ j \ \ over \ \ j'

Y_{ijj'} = 0 \ \ if \ \ i \ \ chooses \ \ j' \ \ over \ \ j

Y_{ijj'} = NA \ \ if \ \ i \ \ chooses \ \ neither

\Pr(Y_{ijj'} = 1) = \Phi( \mathbf{z}_{i}' [\boldsymbol{\theta}_{j} - \boldsymbol{\theta}_{ j'} ])

\mathbf{z}_{i}=[\cos(\gamma_{i}), \ \sin(\gamma_{i})]'
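The choice probability can be evaluated directly from these definitions. The function below is a small sketch with a hypothetical name (choice.prob); it shows how gamma_i determines the weight a rater places on each dimension.

```r
## Pr(Y_{ijj'} = 1) for rater angle gamma.i and item positions theta.j, theta.jprime
choice.prob <- function(gamma.i, theta.j, theta.jprime) {
  z <- c(cos(gamma.i), sin(gamma.i))
  pnorm(sum(z * (theta.j - theta.jprime)))
}

theta.a <- c(1, -1)   # hypothetical item j
theta.b <- c(0, 3)    # hypothetical item j'

p.dim1 <- choice.prob(0,      theta.a, theta.b)  # all weight on dimension 1
p.dim2 <- choice.prob(pi / 2, theta.a, theta.b)  # all weight on dimension 2
p.both <- choice.prob(pi / 4, theta.a, theta.b)  # equal weight on both
```

With gamma_i = 0 only the first-dimension difference (here 1) matters, so the probability is pnorm(1); with gamma_i = pi/2 only the second-dimension difference (here -4) matters.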

The following priors are assumed:

\gamma_i \sim \mathcal{U}nif(0, \ \pi/2)

\boldsymbol{\theta}_j \sim \mathcal{N}_{2}(\mathbf{0}, \mathbf{I}_{2})

For identification, some \boldsymbol{\theta}_js are truncated above or below 0, or fixed to constants.
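A draw from these priors can be simulated as follows. This is a sketch under assumed dimensions (5 raters, 4 items) and assumed constraints; it relies on the fact that the absolute value of a standard normal draw follows a standard normal truncated below at 0.

```r
set.seed(42)
n.raters <- 5   # hypothetical I
n.items  <- 4   # hypothetical J

## gamma_i ~ Unif(0, pi/2)
gamma.prior <- runif(n.raters, 0, pi / 2)

## theta_j ~ N_2(0, I_2): independent standard normals in each dimension
theta.prior <- matrix(rnorm(2 * n.items), nrow = n.items, ncol = 2)

## Hypothetical identification constraints
theta.prior[1, 1] <- abs(theta.prior[1, 1])    # item 1, dimension 1 positive
theta.prior[2, 2] <- -abs(theta.prior[2, 2])   # item 2, dimension 2 negative
theta.prior[3, ]  <- c(2, 2)                   # item 3 fixed to constants
```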

Value

An mcmc object that contains the posterior sample. This object can be summarized by functions provided by the coda package.
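The sketch below shows typical coda calls on an mcmc object. To stay self-contained it uses simulated draws in place of real sampler output; the column names, start, and thin values are hypothetical and are not taken from the function's actual output.

```r
library(coda)

set.seed(7)
## Stand-in for a posterior sample: 200 stored draws of three parameters
toy.draws <- cbind(theta1.item.A = rnorm(200, 1, 0.2),
                   theta2.item.A = rnorm(200, -1, 0.2),
                   gamma.101     = runif(200, 0.6, 0.9))
toy.post <- mcmc(toy.draws, start = 501, thin = 10)

toy.summary <- summary(toy.post)   # means, SDs, and quantiles per column
toy.ess     <- effectiveSize(toy.post)
toy.hpd     <- HPDinterval(toy.post, prob = 0.95)

print(toy.summary$statistics)
```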

Author(s)

Qiushi Yu <yuqiushi@umich.edu> and Kevin M. Quinn <kmq@umich.edu>

References

Albert, J. H. and S. Chib. 1993. “Bayesian Analysis of Binary and Polychotomous Response Data.” J. Amer. Statist. Assoc. 88, 669-679.

Yu, Qiushi and Kevin M. Quinn. 2021. “A Multidimensional Pairwise Comparison Model for Heterogeneous Perceptions with an Application to Modeling the Perceived Truthfulness of Public Statements on COVID-19.” University of Michigan Working Paper.

Andrew D. Martin, Kevin M. Quinn, and Jong Hee Park. 2011. “MCMCpack: Markov Chain Monte Carlo in R.” Journal of Statistical Software. 42(9): 1-21. doi:10.18637/jss.v042.i09.

Daniel Pemstein, Kevin M. Quinn, and Andrew D. Martin. 2007. Scythe Statistical Library 1.0. http://scythe.wustl.edu.s3-website-us-east-1.amazonaws.com/.

Martyn Plummer, Nicky Best, Kate Cowles, and Karen Vines. 2006. “Output Analysis and Diagnostics for MCMC (CODA)”, R News. 6(1): 7-11. https://CRAN.R-project.org/doc/Rnews/Rnews_2006-1.pdf.

See Also

plot.mcmc, summary.mcmc, MCMCpaircompare, MCMCpaircompare2dDP

Examples


  ## Not run: 
## a synthetic data example
set.seed(123)

I <- 65  ## number of raters
J <- 50 ## number of items to be compared


## raters 1 to 5 put most weight on dimension 1
## raters 6 to 10 put most weight on dimension 2
## raters 11 to I put substantial weight on both dimensions
gamma.true <- c(runif(5, 0, 0.1),
             runif(5, 1.47, 1.57),
             runif(I-10, 0.58, 0.98) )
theta1.true <- rnorm(J, mean=0, sd=1)
theta2.true <- rnorm(J, mean=0, sd=1)
theta1.true[1] <- 2
theta2.true[1] <- 2
theta1.true[2] <- -2
theta2.true[2] <- -2
theta1.true[3] <-  2
theta2.true[3] <- -2



n.comparisons <- 125 ## number of pairwise comparisons for each rater

## generate synthetic data according to the assumed model
rater.id <- NULL
item.1.id <- NULL
item.2.id <- NULL
choice.id <- NULL
for (i in 1:I){
    for (k in 1:n.comparisons){ ## k avoids masking the function c()
        rater.id <- c(rater.id, i+100)
        item.numbers <- sample(1:J, size=2, replace=FALSE)
        item.1 <- item.numbers[1]
        item.2 <- item.numbers[2]
        item.1.id <- c(item.1.id, item.1)
        item.2.id <- c(item.2.id, item.2)
        z <- c(cos(gamma.true[i]), sin(gamma.true[i]))
        eta <- z[1] * (theta1.true[item.1] - theta1.true[item.2])  +
            z[2] * (theta2.true[item.1] - theta2.true[item.2])
        prob.item.1.chosen <- pnorm(eta)
        u <- runif(1)
        if (u <= prob.item.1.chosen){
            choice.id <- c(choice.id, item.1)
        }
        else{
            choice.id <- c(choice.id, item.2)
        }
    }
}
item.1.id <- paste("item", item.1.id+100, sep=".")
item.2.id <- paste("item", item.2.id+100, sep=".")
choice.id <- paste("item", choice.id+100, sep=".")

sim.data <- data.frame(rater.id, item.1.id, item.2.id, choice.id)


## fit the model
posterior <- MCMCpaircompare2d(pwc.data=sim.data,
                             theta.constraints=list(item.101=list(1,2),
                                                    item.101=list(2,2),
                                                    item.102=list(1,-2),
                                                    item.102=list(2,-2),
                                                    item.103=list(1,"+"),
                                                    item.103=list(2,"-")),
                             verbose=1000,
                             burnin=500, mcmc=20000, thin=10,
                             store.theta=TRUE, store.gamma=TRUE, tune=0.5)





theta1.draws <- posterior[, grep("theta1", colnames(posterior))]
theta2.draws <- posterior[, grep("theta2", colnames(posterior))]
gamma.draws <- posterior[, grep("gamma", colnames(posterior))]

theta1.post.med <- apply(theta1.draws, 2, median)
theta2.post.med <- apply(theta2.draws, 2, median)
gamma.post.med <- apply(gamma.draws, 2, median)

theta1.post.025 <- apply(theta1.draws, 2, quantile, probs=0.025)
theta1.post.975 <- apply(theta1.draws, 2, quantile, probs=0.975)
theta2.post.025 <- apply(theta2.draws, 2, quantile, probs=0.025)
theta2.post.975 <- apply(theta2.draws, 2, quantile, probs=0.975)
gamma.post.025 <- apply(gamma.draws, 2, quantile, probs=0.025)
gamma.post.975 <- apply(gamma.draws, 2, quantile, probs=0.975)



## compare estimates to truth
par(mfrow=c(2,2))
plot(theta1.true, theta1.post.med, xlim=c(-2.5, 2.5), ylim=c(-2.5, 2.5),
     col=rgb(0,0,0,0.3))
segments(x0=theta1.true, x1=theta1.true,
         y0=theta1.post.025, y1=theta1.post.975,
         col=rgb(0,0,0,0.3)) 
abline(0, 1, col=rgb(1,0,0,0.5))

plot(theta2.true, theta2.post.med, xlim=c(-2.5, 2.5), ylim=c(-2.5, 2.5),
     col=rgb(0,0,0,0.3))
segments(x0=theta2.true, x1=theta2.true,
         y0=theta2.post.025, y1=theta2.post.975,
         col=rgb(0,0,0,0.3)) 
abline(0, 1, col=rgb(1,0,0,0.5))

plot(gamma.true, gamma.post.med, xlim=c(0, 1.6), ylim=c(0, 1.6),
     col=rgb(0,0,0,0.3))
segments(x0=gamma.true, x1=gamma.true,
         y0=gamma.post.025, y1=gamma.post.975,
         col=rgb(0,0,0,0.3)) 
abline(0, 1, col=rgb(1,0,0,0.5))


## plot point estimates 
plot(theta1.post.med, theta2.post.med,
     xlim=c(-2.5, 2.5), ylim=c(-2.5, 2.5),
     col=rgb(0,0,0,0.3))
for (i in 1:length(gamma.post.med)){
    arrows(x0=0, y0=0,
           x1=cos(gamma.post.med[i]),
           y1=sin(gamma.post.med[i]),
           col=rgb(1,0,0,0.2), length=0.05, lwd=0.5)
}

## End(Not run) 

MCMCpack documentation built on Sept. 11, 2024, 8:13 p.m.