View source: R/compute_mallows.R
compute_mallows {BayesMallows}    R Documentation
Compute the posterior distributions of the parameters of the Bayesian Mallows Rank Model, given rankings or preferences stated by a set of assessors.
The BayesMallows package uses the following parametrization of the Mallows rank model (Mallows 1957):

p(r | \alpha, \rho) = (1 / Z_n(\alpha)) exp(-(\alpha / n) d(r, \rho))

where r is a ranking, \alpha is a scale parameter, \rho is the latent consensus ranking, Z_n(\alpha) is the partition function (normalizing constant), and d(r, \rho) is a distance function measuring the distance between r and \rho. Note that some authors use a Mallows model without division by n in the exponent; this includes the PerMallows package, whose scale parameter \theta corresponds to \alpha / n in the BayesMallows package. We refer to Vitelli et al. (2018) for further details of the Bayesian Mallows model.
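For a small number of items, the parametrization above can be illustrated by brute force. The sketch below is plain base R and is not part of the BayesMallows API; it computes the footrule distance, obtains the partition function Z_n(\alpha) by enumerating all n! permutations, and notes the \theta = \alpha / n correspondence with PerMallows.

```r
# Illustration only; these helpers are not part of BayesMallows.

# Footrule distance d(r, rho) = sum_i |r_i - rho_i|
footrule <- function(r, rho) sum(abs(r - rho))

# All permutations of 1..n as rows of a matrix (feasible only for small n)
all_perms <- function(n) {
  if (n == 1L) return(matrix(1L, 1L, 1L))
  sub <- all_perms(n - 1L)
  do.call(rbind, lapply(seq_len(n), function(i) {
    cbind(i, sub + (sub >= i))  # prepend i, shift values >= i up by one
  }))
}

# p(r | alpha, rho) = exp(-(alpha / n) d(r, rho)) / Z_n(alpha)
mallows_pmf <- function(r, rho, alpha) {
  n <- length(rho)
  d_all <- apply(all_perms(n), 1, footrule, rho = rho)
  z_n <- sum(exp(-alpha / n * d_all))  # partition function by enumeration
  exp(-alpha / n * footrule(r, rho)) / z_n
}

rho <- 1:4
mallows_pmf(r = c(2, 1, 3, 4), rho = rho, alpha = 2)
# The probabilities over all 4! = 24 rankings sum to 1:
sum(apply(all_perms(4), 1, mallows_pmf, rho = rho, alpha = 2))
# The PerMallows scale parameter theta corresponds to alpha / n, here 2 / 4.
```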
compute_mallows always returns posterior distributions of the latent consensus ranking \rho and the scale parameter \alpha. Several distance measures are supported, and the preferences can take the form of complete or incomplete rankings, as well as pairwise preferences. compute_mallows can also compute mixtures of Mallows models, for clustering of assessors with similar preferences.
compute_mallows(
rankings = NULL,
preferences = NULL,
obs_freq = NULL,
metric = "footrule",
error_model = NULL,
n_clusters = 1L,
clus_thin = 1L,
nmc = 2000L,
leap_size = max(1L, floor(n_items/5)),
swap_leap = 1L,
rho_init = NULL,
rho_thinning = 1L,
alpha_prop_sd = 0.1,
alpha_init = 1,
alpha_jump = 1L,
lambda = 0.001,
alpha_max = 1e+06,
psi = 10L,
include_wcd = (n_clusters > 1),
save_aug = FALSE,
aug_thinning = 1L,
logz_estimate = NULL,
verbose = FALSE,
validate_rankings = TRUE,
na_action = "augment",
constraints = NULL,
save_ind_clus = FALSE,
seed = NULL,
cl = NULL
)
Arguments:

rankings: A matrix of ranked items, of size n_assessors x n_items. Defaults to NULL. If preferences is provided, rankings is an optional initial value of the rankings.

preferences: A dataframe with pairwise comparisons, with 3 columns, named assessor, bottom_item, and top_item, and one row for each stated preference. Defaults to NULL.

obs_freq: A vector of observation frequencies (weights) to apply to each row in rankings. Defaults to NULL, which means that each row of rankings is counted once.

metric: A character string specifying the distance metric to use in the Bayesian Mallows Model. Available options are "footrule", "spearman", "cayley", "hamming", "kendall", and "ulam". Defaults to "footrule".

error_model: Character string specifying which model to use for inconsistent rankings. Defaults to NULL, which means that inconsistent rankings are not allowed. The other available option is "bernoulli".

n_clusters: Integer specifying the number of clusters, i.e., the number of mixture components to use. Defaults to 1L, which means no clustering is performed.

clus_thin: Integer specifying the thinning to be applied to cluster assignments and cluster probabilities. Defaults to 1L.

nmc: Integer specifying the number of iterations of the Metropolis-Hastings algorithm to run. Defaults to 2000L.

leap_size: Integer specifying the step size of the leap-and-shift proposal distribution. Defaults to max(1L, floor(n_items / 5)).

swap_leap: Integer specifying the step size of the Swap proposal. Only used when error_model = "bernoulli". Defaults to 1L.

rho_init: Numeric vector specifying the initial value of the latent consensus ranking \rho. Defaults to NULL, which means the initial value is set randomly.

rho_thinning: Integer specifying the thinning of \rho in the Metropolis-Hastings algorithm; every rho_thinning'th value of \rho is saved. Defaults to 1L.

alpha_prop_sd: Numeric value specifying the standard deviation of the lognormal proposal distribution used for \alpha in the Metropolis-Hastings algorithm. Defaults to 0.1.

alpha_init: Numeric value specifying the initial value of the scale parameter \alpha. Defaults to 1.

alpha_jump: Integer specifying how many times to sample \rho between each sampling of \alpha. Defaults to 1L.

lambda: Strictly positive numeric value specifying the rate parameter of the truncated exponential prior distribution of \alpha. Defaults to 0.001.

alpha_max: Maximum value of \alpha in the truncated exponential prior distribution. Defaults to 1e+06.

psi: Integer specifying the concentration parameter \psi of the Dirichlet prior distribution used for the cluster probabilities. Defaults to 10L.

include_wcd: Logical indicating whether to store the within-cluster distances computed during the Metropolis-Hastings algorithm. Defaults to n_clusters > 1.

save_aug: Logical specifying whether or not to save the augmented rankings every aug_thinning'th iteration, for the case of missing data or pairwise preferences. Defaults to FALSE.

aug_thinning: Integer specifying the thinning for saving augmented data. Only used when save_aug = TRUE. Defaults to 1L.

logz_estimate: Estimate of the partition function, computed with estimate_partition_function. Defaults to NULL.

verbose: Logical specifying whether to print out the progress of the Metropolis-Hastings algorithm. Defaults to FALSE.

validate_rankings: Logical specifying whether the rankings provided (or generated from preferences) should be validated as proper permutations. Defaults to TRUE.

na_action: Character specifying how to deal with NA values in the rankings matrix, if provided. Defaults to "augment", which means that missing values are filled in by data augmentation.

constraints: Optional constraint set returned from generate_constraints. Defaults to NULL, which means the constraint set is computed internally.

save_ind_clus: Whether or not to save the individual cluster probabilities in each step. This results in one csv file per iteration, and can be time- and memory-consuming. Defaults to FALSE.

seed: Optional integer to be used as random number seed.

cl: Optional computing cluster for parallel processing, e.g., returned from parallel::makeCluster. Defaults to NULL.
Value: A list of class BayesMallows.
See also: compute_mallows_mixtures for a function that computes separate Mallows models for varying numbers of clusters.

Other modeling functions: compute_mallows_mixtures(), smc_mallows_new_item_rank(), smc_mallows_new_users()
# ANALYSIS OF COMPLETE RANKINGS
# The example datasets potato_visual and potato_weighing contain complete
# rankings of 20 items, by 12 assessors. We first analyse these using the Mallows
# model:
model_fit <- compute_mallows(potato_visual)
# We study the trace plot of the parameters
assess_convergence(model_fit, parameter = "alpha")
## Not run: assess_convergence(model_fit, parameter = "rho")
# Based on these plots, we set burnin = 1000.
model_fit$burnin <- 1000
# Next, we use the generic plot function to study the posterior distributions
# of alpha and rho
plot(model_fit, parameter = "alpha")
## Not run: plot(model_fit, parameter = "rho", items = 10:15)
# We can also compute the CP consensus posterior ranking
compute_consensus(model_fit, type = "CP")
# And we can compute the posterior intervals:
# First we compute the interval for alpha
compute_posterior_intervals(model_fit, parameter = "alpha")
# Then we compute the interval for all the items
## Not run: compute_posterior_intervals(model_fit, parameter = "rho")
# ANALYSIS OF PAIRWISE PREFERENCES
## Not run:
# The example dataset beach_preferences contains pairwise
# preferences between beaches stated by 60 assessors. There
# is a total of 15 beaches in the dataset.
# In order to use it, we first generate all the orderings
# implied by the pairwise preferences.
beach_tc <- generate_transitive_closure(beach_preferences)
# We also generate an initial ranking
beach_rankings <- generate_initial_ranking(beach_tc, n_items = 15)
# We then run the Bayesian Mallows rank model
# We save the augmented data for diagnostics purposes.
model_fit <- compute_mallows(rankings = beach_rankings,
preferences = beach_tc,
save_aug = TRUE,
verbose = TRUE)
# We can assess the convergence of the scale parameter
assess_convergence(model_fit)
# We can assess the convergence of latent rankings. Here we
# show beaches 1-5.
assess_convergence(model_fit, parameter = "rho", items = 1:5)
# We can also look at the convergence of the augmented rankings for
# each assessor.
assess_convergence(model_fit, parameter = "Rtilde",
items = c(2, 4), assessors = c(1, 2))
# Notice how, for assessor 1, the lines cross each other, while
# beach 2 consistently has a higher rank value (lower preference) for
# assessor 2. We can see why by looking at the implied orderings in
# beach_tc
subset(beach_tc, assessor %in% c(1, 2) &
bottom_item %in% c(2, 4) & top_item %in% c(2, 4))
# Assessor 1 has no implied ordering between beach 2 and beach 4,
# while assessor 2 has the implied ordering that beach 4 is preferred
# to beach 2. This is reflected in the trace plots.
## End(Not run)
# CLUSTERING OF ASSESSORS WITH SIMILAR PREFERENCES
## Not run:
# The example dataset sushi_rankings contains 5000 complete
# rankings of 10 types of sushi
# We start with computing a 3-cluster solution
model_fit <- compute_mallows(sushi_rankings, n_clusters = 3,
nmc = 10000, verbose = TRUE)
# We then assess convergence of the scale parameter alpha
assess_convergence(model_fit)
# Next, we assess convergence of the cluster probabilities
assess_convergence(model_fit, parameter = "cluster_probs")
# Based on this, we set burnin = 1000
# We now plot the posterior density of the scale parameters alpha in
# each mixture:
model_fit$burnin <- 1000
plot(model_fit, parameter = "alpha")
# We can also compute the posterior density of the cluster probabilities
plot(model_fit, parameter = "cluster_probs")
# We can also plot the posterior cluster assignment. In this case,
# the assessors are sorted according to their maximum a posteriori cluster estimate.
plot(model_fit, parameter = "cluster_assignment")
# We can also assign each assessor to a cluster
cluster_assignments <- assign_cluster(model_fit, soft = FALSE)
## End(Not run)
# DETERMINING THE NUMBER OF CLUSTERS
## Not run:
# Continuing with the sushi data, we can determine the number of clusters.
# Let us look at any number of clusters from 1 to 10
# We use the convenience function compute_mallows_mixtures
n_clusters <- seq(from = 1, to = 10)
models <- compute_mallows_mixtures(n_clusters = n_clusters, rankings = sushi_rankings,
nmc = 6000, alpha_jump = 10, include_wcd = TRUE)
# models is a list in which each element is an object of class BayesMallows,
# returned from compute_mallows
# We can create an elbow plot
plot_elbow(models, burnin = 1000)
# We then select the number of clusters at a point where this plot has
# an "elbow", e.g., at 6 clusters.
## End(Not run)
# SPEEDING UP COMPUTATION WITH OBSERVATION FREQUENCIES
# With a large number of assessors taking on a relatively low number of unique rankings,
# the obs_freq argument allows providing a rankings matrix with the unique set of rankings,
# and the obs_freq vector giving the number of assessors with each ranking.
# This is illustrated here for the potato_visual dataset
#
# assume each row of potato_visual corresponds to between 1 and 5 assessors, as
# given by the obs_freq vector
set.seed(1234)
obs_freq <- sample.int(n = 5, size = nrow(potato_visual), replace = TRUE)
m <- compute_mallows(rankings = potato_visual, obs_freq = obs_freq)
# See the separate help page for more examples, with the following code
help("obs_freq")