compute_mallows: Preference Learning with the Mallows Rank Model

View source: R/compute_mallows.R

compute_mallowsR Documentation

Preference Learning with the Mallows Rank Model


Compute the posterior distributions of the parameters of the Bayesian Mallows Rank Model, given rankings or preferences stated by a set of assessors.

The BayesMallows package uses the following parametrization of the Mallows rank model \insertCitemallows1957BayesMallows:

p(r|\alpha,\rho) = (1/Z_{n}(\alpha)) \exp{-\alpha/n d(r,\rho)}

where r is a ranking, \alpha is a scale parameter, \rho is the latent consensus ranking, Z_{n}(\alpha) is the partition function (normalizing constant), and d(r,\rho) is a distance function measuring the distance between r and \rho. Note that some authors use a Mallows model without division by n in the exponent; this includes the PerMallows package, whose scale parameter \theta corresponds to \alpha/n in the BayesMallows package. We refer to \insertCitevitelli2018BayesMallows for further details of the Bayesian Mallows model.

compute_mallows always returns posterior distributions of the latent consensus ranking \rho and the scale parameter \alpha. Several distance measures are supported, and the preferences can take the form of complete or incomplete rankings, as well as pairwise preferences. compute_mallows can also compute mixtures of Mallows models, for clustering of assessors with similar preferences.


  rankings = NULL,
  preferences = NULL,
  obs_freq = NULL,
  metric = "footrule",
  error_model = NULL,
  n_clusters = 1L,
  clus_thin = 1L,
  nmc = 2000L,
  leap_size = max(1L, floor(n_items/5)),
  swap_leap = 1L,
  rho_init = NULL,
  rho_thinning = 1L,
  alpha_prop_sd = 0.1,
  alpha_init = 1,
  alpha_jump = 1L,
  lambda = 0.001,
  alpha_max = 1e+06,
  psi = 10L,
  include_wcd = (n_clusters > 1),
  save_aug = FALSE,
  aug_thinning = 1L,
  logz_estimate = NULL,
  verbose = FALSE,
  validate_rankings = TRUE,
  na_action = "augment",
  constraints = NULL,
  save_ind_clus = FALSE,
  seed = NULL,
  cl = NULL



A matrix of ranked items, of size n_assessors x n_items. See create_ranking if you have an ordered set of items that need to be converted to rankings. If preferences is provided, rankings is an optional initial value of the rankings, generated by generate_initial_ranking. If rankings has column names, these are assumed to be the names of the items. NA values in rankings are treated as missing data and automatically augmented; to change this behavior, see the na_action argument.


A dataframe with pairwise comparisons, with 3 columns, named assessor, bottom_item, and top_item, and one row for each stated preference. Given a set of pairwise preferences, generate a transitive closure using generate_transitive_closure. This will give preferences the class "BayesMallowsTC". If preferences is not of class "BayesMallowsTC", compute_mallows will call generate_transitive_closure on preferences before computations are done. In the current version, the pairwise preferences are assumed to be mutually compatible.


A vector of observation frequencies (weights) to apply do each row in rankings. This can speed up computation if a large number of assessors share the same rank pattern. Defaults to NULL, which means that each row of rankings is multiplied by 1. If provided, obs_freq must have the same number of elements as there are rows in rankings, and rankings cannot be NULL. See obs_freq for more information and rank_freq_distr for a convenience function for computing it.


A character string specifying the distance metric to use in the Bayesian Mallows Model. Available options are "footrule", "spearman", "cayley", "hamming", "kendall", and "ulam". The distance given by metric is also used to compute within-cluster distances, when include_wcd = TRUE.


Character string specifying which model to use for inconsistent rankings. Defaults to NULL, which means that inconsistent rankings are not allowed. At the moment, the only available other option is "bernoulli", which means that the Bernoulli error model is used. See \insertCitecrispino2019;textualBayesMallows for a definition of the Bernoulli model.


Integer specifying the number of clusters, i.e., the number of mixture components to use. Defaults to 1L, which means no clustering is performed. See compute_mallows_mixtures for a convenience function for computing several models with varying numbers of mixtures.


Integer specifying the thinning to be applied to cluster assignments and cluster probabilities. Defaults to 1L.


Integer specifying the number of iteration of the Metropolis-Hastings algorithm to run. Defaults to 2000L. See assess_convergence for tools to check convergence of the Markov chain.


Integer specifying the step size of the leap-and-shift proposal distribution. Defaults floor(n_items / 5).


Integer specifying the step size of the Swap proposal. Only used when error_model is not NULL.


Numeric vector specifying the initial value of the latent consensus ranking \rho. Defaults to NULL, which means that the initial value is set randomly. If rho_init is provided when n_clusters > 1, each mixture component \rho_{c} gets the same initial value.


Integer specifying the thinning of rho to be performed in the Metropolis- Hastings algorithm. Defaults to 1L. compute_mallows save every rho_thinningth value of \rho.


Numeric value specifying the standard deviation of the lognormal proposal distribution used for \alpha in the Metropolis-Hastings algorithm. Defaults to 0.1.


Numeric value specifying the initial value of the scale parameter \alpha. Defaults to 1. When n_clusters > 1, each mixture component \alpha_{c} gets the same initial value. When chains are run in parallel, by providing an argument cl = cl, then alpha_init can be a vector of of length length(cl), each element of which becomes an initial value for the given chain.


Integer specifying how many times to sample \rho between each sampling of \alpha. In other words, how many times to jump over \alpha while sampling \rho, and possibly other parameters like augmented ranks \tilde{R} or cluster assignments z. Setting alpha_jump to a high number can speed up computation time, by reducing the number of times the partition function for the Mallows model needs to be computed. Defaults to 1L.


Strictly positive numeric value specifying the rate parameter of the truncated exponential prior distribution of \alpha. Defaults to 0.1. When n_cluster > 1, each mixture component \alpha_{c} has the same prior distribution.


Maximum value of alpha in the truncated exponential prior distribution.


Integer specifying the concentration parameter \psi of the Dirichlet prior distribution used for the cluster probabilities \tau_{1}, \tau_{2}, \dots, \tau_{C}, where C is the value of n_clusters. Defaults to 10L. When n_clusters = 1, this argument is not used.


Logical indicating whether to store the within-cluster distances computed during the Metropolis-Hastings algorithm. Defaults to TRUE if n_clusters > 1 and otherwise FALSE. Setting include_wcd = TRUE is useful when deciding the number of mixture components to include, and is required by plot_elbow.


Logical specifying whether or not to save the augmented rankings every aug_thinningth iteration, for the case of missing data or pairwise preferences. Defaults to FALSE. Saving augmented data is useful for predicting the rankings each assessor would give to the items not yet ranked, and is required by plot_top_k.


Integer specifying the thinning for saving augmented data. Only used when save_aug = TRUE. Defaults to 1L.


Estimate of the partition function, computed with estimate_partition_function. Be aware that when using an estimated partition function when n_clusters > 1, the partition function should be estimated over the whole range of \alpha values covered by the prior distribution for \alpha with high probability. In the case that a cluster \alpha_c becomes empty during the Metropolis-Hastings algorithm, the posterior of \alpha_c equals its prior. For example, if the rate parameter of the exponential prior equals, say \lambda = 0.001, there is about 37 % (or exactly: 1 - pexp(1000, 0.001)) prior probability that \alpha_c > 1000. Hence when n_clusters > 1, the estimated partition function should cover this range, or \lambda should be increased.


Logical specifying whether to print out the progress of the Metropolis-Hastings algorithm. If TRUE, a notification is printed every 1000th iteration. Defaults to FALSE.


Logical specifying whether the rankings provided (or generated from preferences) should be validated. Defaults to TRUE. Turning off this check will reduce computing time with a large number of items or assessors.


Character specifying how to deal with NA values in the rankings matrix, if provided. Defaults to "augment", which means that missing values are automatically filled in using the Bayesian data augmentation scheme described in \insertCitevitelli2018;textualBayesMallows. The other options for this argument are "fail", which means that an error message is printed and the algorithm stops if there are NAs in rankings, and "omit" which simply deletes rows with NAs in them.


Optional constraint set returned from generate_constraints. Defaults to NULL, which means the the constraint set is computed internally. In repeated calls to compute_mallows, with very large datasets, computing the constraint set may be time consuming. In this case it can be beneficial to precompute it and provide it as a separate argument.


Whether or not to save the individual cluster probabilities in each step. This results in csv files cluster_probs1.csv, cluster_probs2.csv, ..., being saved in the calling directory. This option may slow down the code considerably, but is necessary for detecting label switching using Stephen's algorithm. See label_switching for more information.


Optional integer to be used as random number seed.


Optional cluster.


A list of class BayesMallows.



See Also

compute_mallows_mixtures for a function that computes separate Mallows models for varying numbers of clusters.

Other modeling: compute_mallows_mixtures(), smc_mallows_new_item_rank(), smc_mallows_new_users()


# The example datasets potato_visual and potato_weighing contain complete
# rankings of 20 items, by 12 assessors. We first analyse these using the Mallows
# model:
model_fit <- compute_mallows(potato_visual)

# We study the trace plot of the parameters
assess_convergence(model_fit, parameter = "alpha")
## Not run: assess_convergence(model_fit, parameter = "rho")

# Based on these plots, we set burnin = 1000.
model_fit$burnin <- 1000
# Next, we use the generic plot function to study the posterior distributions
# of alpha and rho
plot(model_fit, parameter = "alpha")
## Not run: plot(model_fit, parameter = "rho", items = 10:15)

# We can also compute the CP consensus posterior ranking
compute_consensus(model_fit, type = "CP")

# And we can compute the posterior intervals:
# First we compute the interval for alpha
compute_posterior_intervals(model_fit, parameter = "alpha")
# Then we compute the interval for all the items
## Not run: compute_posterior_intervals(model_fit, parameter = "rho")

## Not run: 
  # The example dataset beach_preferences contains pairwise
  # preferences between beaches stated by 60 assessors. There
  # is a total of 15 beaches in the dataset.
  # In order to use it, we first generate all the orderings
  # implied by the pairwise preferences.
  beach_tc <- generate_transitive_closure(beach_preferences)
  # We also generate an inital rankings
  beach_rankings <- generate_initial_ranking(beach_tc, n_items = 15)
  # We then run the Bayesian Mallows rank model
  # We save the augmented data for diagnostics purposes.
  model_fit <- compute_mallows(rankings = beach_rankings,
                               preferences = beach_tc,
                               save_aug = TRUE,
                               verbose = TRUE)
  # We can assess the convergence of the scale parameter
  # We can assess the convergence of latent rankings. Here we
  # show beaches 1-5.
  assess_convergence(model_fit, parameter = "rho", items = 1:5)
  # We can also look at the convergence of the augmented rankings for
  # each assessor.
  assess_convergence(model_fit, parameter = "Rtilde",
                     items = c(2, 4), assessors = c(1, 2))
  # Notice how, for assessor 1, the lines cross each other, while
  # beach 2 consistently has a higher rank value (lower preference) for
  # assessor 2. We can see why by looking at the implied orderings in
  # beach_tc
  subset(beach_tc, assessor %in% c(1, 2) &
           bottom_item %in% c(2, 4) & top_item %in% c(2, 4))
  # Assessor 1 has no implied ordering between beach 2 and beach 4,
  # while assessor 2 has the implied ordering that beach 4 is preferred
  # to beach 2. This is reflected in the trace plots.

## End(Not run)

## Not run: 
  # The example dataset sushi_rankings contains 5000 complete
  # rankings of 10 types of sushi
  # We start with computing a 3-cluster solution
  model_fit <- compute_mallows(sushi_rankings, n_clusters = 3,
                               nmc = 10000, verbose = TRUE)
  # We then assess convergence of the scale parameter alpha
  # Next, we assess convergence of the cluster probabilities
  assess_convergence(model_fit, parameter = "cluster_probs")
  # Based on this, we set burnin = 1000
  # We now plot the posterior density of the scale parameters alpha in
  # each mixture:
  model_fit$burnin <- 1000
  plot(model_fit, parameter = "alpha")
  # We can also compute the posterior density of the cluster probabilities
  plot(model_fit, parameter = "cluster_probs")
  # We can also plot the posterior cluster assignment. In this case,
  # the assessors are sorted according to their maximum a posteriori cluster estimate.
  plot(model_fit, parameter = "cluster_assignment")
  # We can also assign each assessor to a cluster
  cluster_assignments <- assign_cluster(model_fit, soft = FALSE)
## End(Not run)

## Not run: 
  # Continuing with the sushi data, we can determine the number of cluster
  # Let us look at any number of clusters from 1 to 10
  # We use the convenience function compute_mallows_mixtures
  n_clusters <- seq(from = 1, to = 10)
  models <- compute_mallows_mixtures(n_clusters = n_clusters, rankings = sushi_rankings,
                                     nmc = 6000, alpha_jump = 10, include_wcd = TRUE)
  # models is a list in which each element is an object of class BayesMallows,
  # returned from compute_mallows
  # We can create an elbow plot
  plot_elbow(models, burnin = 1000)
  # We then select the number of cluster at a point where this plot has
  # an "elbow", e.g., at 6 clusters.

## End(Not run)

# With a large number of assessors taking on a relatively low number of unique rankings,
# the obs_freq argument allows providing a rankings matrix with the unique set of rankings,
# and the obs_freq vector giving the number of assessors with each ranking.
# This is illustrated here for the potato_visual dataset
# assume each row of potato_visual corresponds to between 1 and 5 assessors, as
# given by the obs_freq vector
obs_freq <- = 5, size = nrow(potato_visual), replace = TRUE)
m <- compute_mallows(rankings = potato_visual, obs_freq = obs_freq)

# See the separate help page for more examples, with the following code

BayesMallows documentation built on Nov. 25, 2023, 5:09 p.m.