View source: R/compute_mallows.R
compute_mallows  R Documentation 
Compute the posterior distributions of the parameters of the Bayesian Mallows rank model, given rankings or preferences stated by a set of assessors.

The BayesMallows package uses the following parametrization of the Mallows rank model (Mallows 1957):

  p(r | alpha, rho) = (1 / Z_n(alpha)) * exp{-(alpha / n) * d(r, rho)}

where r is a ranking, alpha is a scale parameter, rho is the latent consensus ranking, Z_n(alpha) is the partition function (normalizing constant), and d(r, rho) is a distance function measuring the distance between r and rho. Note that some authors use a Mallows model without division by n in the exponent; this includes the PerMallows package, whose scale parameter theta corresponds to alpha/n in the BayesMallows package. We refer to Vitelli et al. (2018) for further details of the Bayesian Mallows model.

compute_mallows always returns posterior distributions of the latent consensus ranking rho and the scale parameter alpha. Several distance measures are supported, and the preferences can take the form of complete or incomplete rankings, as well as pairwise preferences. compute_mallows can also compute mixtures of Mallows models, for clustering of assessors with similar preferences.
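The correspondence between the two parametrizations described above can be illustrated with a short base-R sketch. The function and variable names here are illustrative only, not part of the BayesMallows package:

```r
# Footrule distance between two rankings (illustrative helper)
footrule <- function(r, rho) sum(abs(r - rho))

r     <- c(2, 1, 3, 5, 4)  # an observed ranking
rho   <- 1:5               # consensus ranking
n     <- length(rho)       # number of items
alpha <- 2                 # BayesMallows scale parameter

# Unnormalized BayesMallows kernel: exp{-(alpha / n) * d(r, rho)}
k_bayesmallows <- exp(-(alpha / n) * footrule(r, rho))

# PerMallows-style kernel: exp{-theta * d(r, rho)}, with theta = alpha / n
theta <- alpha / n
k_permallows <- exp(-theta * footrule(r, rho))

stopifnot(isTRUE(all.equal(k_bayesmallows, k_permallows)))
```

The two kernels agree by construction, which is exactly the theta = alpha/n correspondence noted in the text.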
compute_mallows(
rankings = NULL,
preferences = NULL,
obs_freq = NULL,
metric = "footrule",
error_model = NULL,
n_clusters = 1L,
clus_thin = 1L,
nmc = 2000L,
leap_size = max(1L, floor(n_items/5)),
swap_leap = 1L,
rho_init = NULL,
rho_thinning = 1L,
alpha_prop_sd = 0.1,
alpha_init = 1,
alpha_jump = 1L,
lambda = 0.001,
alpha_max = 1e+06,
psi = 10L,
include_wcd = (n_clusters > 1),
save_aug = FALSE,
aug_thinning = 1L,
logz_estimate = NULL,
verbose = FALSE,
validate_rankings = TRUE,
na_action = "augment",
constraints = NULL,
save_ind_clus = FALSE,
seed = NULL,
cl = NULL
)
rankings
A matrix of ranked items, of size n_assessors x n_items, with one row per assessor.

preferences
A data frame of pairwise comparisons, with 3 columns, named assessor, bottom_item, and top_item, and one row per pairwise preference.

obs_freq
A vector of observation frequencies (weights) to apply to each row in rankings. Defaults to NULL, which means that each row has weight 1.

metric
A character string specifying the distance metric to use in the Bayesian Mallows model. Available options are "footrule", "spearman", "cayley", "hamming", "kendall", and "ulam". Defaults to "footrule".

error_model
Character string specifying which model to use for inconsistent rankings. Defaults to NULL, which means that inconsistent rankings are not allowed. The other available option is "bernoulli".

n_clusters
Integer specifying the number of clusters, i.e., the number of mixture components to use. Defaults to 1L, which means no clustering is performed.

clus_thin
Integer specifying the thinning to be applied to cluster assignments and cluster probabilities. Defaults to 1L.

nmc
Integer specifying the number of iterations of the Metropolis-Hastings algorithm to run. Defaults to 2000L.

leap_size
Integer specifying the step size of the leap-and-shift proposal distribution. Defaults to max(1L, floor(n_items / 5)).

swap_leap
Integer specifying the step size of the Swap proposal. Only used when error_model is not NULL. Defaults to 1L.

rho_init
Numeric vector specifying the initial value of the latent consensus ranking rho. Defaults to NULL, which means that the initial value is set randomly.

rho_thinning
Integer specifying the thinning of rho: every rho_thinning-th value of rho is saved. Defaults to 1L.

alpha_prop_sd
Numeric value specifying the standard deviation of the lognormal proposal distribution used for alpha. Defaults to 0.1.

alpha_init
Numeric value specifying the initial value of the scale parameter alpha. Defaults to 1.

alpha_jump
Integer specifying how many times to sample rho between each sampling of alpha, i.e., alpha is sampled every alpha_jump-th iteration. Defaults to 1L.

lambda
Strictly positive numeric value specifying the rate parameter of the truncated exponential prior distribution of alpha. Defaults to 0.001.

alpha_max
Maximum value of alpha in the truncated exponential prior distribution. Defaults to 1e+06.

psi
Integer specifying the concentration parameter psi of the Dirichlet prior distribution on the cluster probabilities. Only used when n_clusters > 1. Defaults to 10L.

include_wcd
Logical indicating whether to store the within-cluster distances computed during the Metropolis-Hastings algorithm. Defaults to (n_clusters > 1).

save_aug
Logical specifying whether or not to save the augmented rankings every aug_thinning-th iteration. Defaults to FALSE.

aug_thinning
Integer specifying the thinning for saving augmented data. Only used when save_aug = TRUE. Defaults to 1L.

logz_estimate
Estimate of the partition function, computed with estimate_partition_function. Defaults to NULL.

verbose
Logical specifying whether to print out the progress of the Metropolis-Hastings algorithm. Defaults to FALSE.

validate_rankings
Logical specifying whether the rankings provided (or generated from preferences) should be validated as proper rankings. Defaults to TRUE.

na_action
Character specifying how to deal with NA values in the rankings matrix. Defaults to "augment", which means that missing ranks are imputed by data augmentation.

constraints
Optional constraint set returned from generate_constraints. Defaults to NULL.

save_ind_clus
Whether or not to save the individual cluster probabilities in each step. This results in csv files being saved in the calling directory. Defaults to FALSE.

seed
Optional integer to be used as random number seed. Defaults to NULL, which means that a seed is not set.

cl
Optional computing cluster used for parallelization, returned from parallel::makeCluster. Defaults to NULL.
A list of class BayesMallows.
compute_mallows_mixtures
for a function that computes
separate Mallows models for varying numbers of clusters.
Other modeling:
compute_mallows_mixtures(), smc_mallows_new_item_rank(), smc_mallows_new_users()
# ANALYSIS OF COMPLETE RANKINGS
# The example datasets potato_visual and potato_weighing contain complete
# rankings of 20 items, by 12 assessors. We first analyse these using the Mallows
# model:
model_fit <- compute_mallows(potato_visual)
# We study the trace plot of the parameters
assess_convergence(model_fit, parameter = "alpha")
## Not run: assess_convergence(model_fit, parameter = "rho")
# Based on these plots, we set burnin = 1000.
model_fit$burnin <- 1000
# Next, we use the generic plot function to study the posterior distributions
# of alpha and rho
plot(model_fit, parameter = "alpha")
## Not run: plot(model_fit, parameter = "rho", items = 10:15)
# We can also compute the CP consensus posterior ranking
compute_consensus(model_fit, type = "CP")
# And we can compute the posterior intervals:
# First we compute the interval for alpha
compute_posterior_intervals(model_fit, parameter = "alpha")
# Then we compute the interval for all the items
## Not run: compute_posterior_intervals(model_fit, parameter = "rho")
# ANALYSIS OF PAIRWISE PREFERENCES
## Not run:
# The example dataset beach_preferences contains pairwise
# preferences between beaches stated by 60 assessors. There
# is a total of 15 beaches in the dataset.
# In order to use it, we first generate all the orderings
# implied by the pairwise preferences.
beach_tc <- generate_transitive_closure(beach_preferences)
# We also generate an initial ranking
beach_rankings <- generate_initial_ranking(beach_tc, n_items = 15)
# We then run the Bayesian Mallows rank model
# We save the augmented data for diagnostics purposes.
model_fit <- compute_mallows(rankings = beach_rankings,
preferences = beach_tc,
save_aug = TRUE,
verbose = TRUE)
# We can assess the convergence of the scale parameter
assess_convergence(model_fit)
# We can assess the convergence of latent rankings. Here we
# show beaches 1-5.
assess_convergence(model_fit, parameter = "rho", items = 1:5)
# We can also look at the convergence of the augmented rankings for
# each assessor.
assess_convergence(model_fit, parameter = "Rtilde",
items = c(2, 4), assessors = c(1, 2))
# Notice how, for assessor 1, the lines cross each other, while
# beach 2 consistently has a higher rank value (lower preference) for
# assessor 2. We can see why by looking at the implied orderings in
# beach_tc
subset(beach_tc, assessor %in% c(1, 2) &
bottom_item %in% c(2, 4) & top_item %in% c(2, 4))
# Assessor 1 has no implied ordering between beach 2 and beach 4,
# while assessor 2 has the implied ordering that beach 4 is preferred
# to beach 2. This is reflected in the trace plots.
## End(Not run)
# CLUSTERING OF ASSESSORS WITH SIMILAR PREFERENCES
## Not run:
# The example dataset sushi_rankings contains 5000 complete
# rankings of 10 types of sushi
# We start by computing a 3-cluster solution
model_fit <- compute_mallows(sushi_rankings, n_clusters = 3,
nmc = 10000, verbose = TRUE)
# We then assess convergence of the scale parameter alpha
assess_convergence(model_fit)
# Next, we assess convergence of the cluster probabilities
assess_convergence(model_fit, parameter = "cluster_probs")
# Based on this, we set burnin = 1000
# We now plot the posterior density of the scale parameters alpha in
# each mixture:
model_fit$burnin <- 1000
plot(model_fit, parameter = "alpha")
# We can also compute the posterior density of the cluster probabilities
plot(model_fit, parameter = "cluster_probs")
# We can also plot the posterior cluster assignment. In this case,
# the assessors are sorted according to their maximum a posteriori cluster estimate.
plot(model_fit, parameter = "cluster_assignment")
# We can also assign each assessor to a cluster
cluster_assignments <- assign_cluster(model_fit, soft = FALSE)
## End(Not run)
# DETERMINING THE NUMBER OF CLUSTERS
## Not run:
# Continuing with the sushi data, we can determine the number of clusters.
# Let us look at any number of clusters from 1 to 10
# We use the convenience function compute_mallows_mixtures
n_clusters <- seq(from = 1, to = 10)
models <- compute_mallows_mixtures(n_clusters = n_clusters, rankings = sushi_rankings,
nmc = 6000, alpha_jump = 10, include_wcd = TRUE)
# models is a list in which each element is an object of class BayesMallows,
# returned from compute_mallows
# We can create an elbow plot
plot_elbow(models, burnin = 1000)
# We then select the number of clusters at a point where this plot has
# an "elbow", e.g., at 6 clusters.
## End(Not run)
# SPEEDING UP COMPUTATION WITH OBSERVATION FREQUENCIES
# With a large number of assessors taking on a relatively low number of unique rankings,
# the obs_freq argument allows providing a rankings matrix with the unique set of rankings,
# and the obs_freq vector giving the number of assessors with each ranking.
# This is illustrated here for the potato_visual dataset
#
# assume each row of potato_visual corresponds to between 1 and 5 assessors, as
# given by the obs_freq vector
set.seed(1234)
obs_freq <- sample.int(n = 5, size = nrow(potato_visual), replace = TRUE)
m <- compute_mallows(rankings = potato_visual, obs_freq = obs_freq)
# See the separate help page for more examples, with the following code
help("obs_freq")
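The obs_freq weighting is equivalent to repeating each unique ranking row as many times as its frequency. A base-R sketch of that equivalence, using toy data rather than a package dataset:

```r
# Two unique rankings of 3 items (toy data for illustration)
rankings <- matrix(c(1, 2, 3,
                     2, 1, 3), nrow = 2, byrow = TRUE)
obs_freq <- c(3, 2)  # 3 assessors gave row 1, 2 assessors gave row 2

# Expanding the unique rankings by their frequencies reproduces the
# full data set that obs_freq summarizes
expanded <- rankings[rep(seq_len(nrow(rankings)), times = obs_freq), , drop = FALSE]
stopifnot(nrow(expanded) == sum(obs_freq))
```

Passing the compact matrix with obs_freq should give the same posterior as passing the expanded matrix, while requiring fewer data-augmentation steps per iteration.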