knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Aggregate scores refer to rating best-worst scaling scores across the entire sample. Sometimes, that is what researchers want to know: What are the most-preferred options? The bwsTools
package provides two functions for doing so: ae_mnl
and elo
. First, we load needed packages.
library(bwsTools) library(dplyr) library(tidyr) library(ggplot2)
First, I aggregate up my individual-level data, vdata
, to the format needed for ae_mnl
. See more on the tidying vignette for this.
dat <- vdata %>% group_by(issue) %>% summarise( totl = n(), best = sum(value == 1), wrst = sum(value == -1) )
The data look like:
dat
Such that abortion was picked as the most important issue facing our country ("best") 302 times and the least important issue ("worst") 352 times. This goes into the ae_mnl
function, which stands for analytical estimation of the multinomial logistic model. See the documentation for references on how this is calculated.
res1 <- ae_mnl(dat, "totl", "best", "wrst")
This result shows the regression coefficient for the multinomial model, the standard error, lower and upper bounds, and the choice probability for each item.
res1
I like to bind these columns back to my original data so that I can get the item labels. Then, I usually plot them in ggplot
the following way:
dat %>% bind_cols(res1) %>% arrange(b) %>% mutate(issue = factor(issue, issue)) %>% ggplot(aes(x = issue, y = b)) + geom_point() + geom_errorbar(aes(ymin = lb, ymax = ub), width = 0) + coord_flip()
From here, it is pretty easy to see that healthcare and economy are what people said they cared about most. There's a bunch in the middle, and then drugs, foreign affairs, and bias in the media all fall in the least important range.
The ae_mnl
function assumes a balanced incomplete block design. But sometimes that is not possible. What if you wanted to compare, say, 150 different items? A balanced incomplete block design would be too much for any one participant to realistically be able to do. One way is to calculate Elo scores (again, see the documentation for references).
This time, the function looks like an individual-level scoring function, even though it does aggregate ratings. You give the elo
function the participant identification, the block number, the item, and the value (1 for best, 0 for not chosen, -1 for worst).
set.seed(1839) res2 <- elo(vdata, "id", "block", "issue", "value")
The result looks similar to the above. All Elo scores start at 1000, so we are looking for how above or below an item is relative to 1000.
res2 <- structure(list(item = c("abortion", "biasmedia", "corruption", "crime", "drugs", "economy", "education", "foreignaffairs", "guns", "healthcare", "natsecurity", "race", "taxes"), elo = c(983.604707872469, 775.029578108074, 987.095165229659, 1005.70809857833, 908.116241829412, 1166.78863775417, 1065.4369480057, 858.26118119395, 1022.80855081941, 1194.37125023744, 1043.98840666976, 974.595472655393, 1013.59357525509 )), row.names = c(NA, -13L), class = c("tbl_df", "tbl", "data.frame" ))
res2
Since these data both came from a balanced incomplete block design, we would expect the two methods to agree. And they do:
cor(res1$b, res2$elo) plot(res1$b, res2$elo)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.