generate_initial_ranking: Generate Initial Ranking

View source: R/generate_initial_ranking.R


Generate Initial Ranking

Description

Given a consistent set of pairwise preferences, generate a complete ranking of items which is consistent with the preferences.

Usage

generate_initial_ranking(
  tc,
  n_items = max(tc[, c("bottom_item", "top_item")]),
  cl = NULL,
  shuffle_unranked = FALSE,
  random = FALSE,
  random_limit = 8L
)

Arguments

tc

A dataframe with pairwise comparisons of S3 subclass BayesMallowsTC, returned from generate_transitive_closure.

n_items

The total number of items. If not provided, it is assumed to equal the largest item index found in tc, i.e., max(tc[, c("bottom_item", "top_item")]).

cl

Optional computing cluster used for parallelization, returned from parallel::makeCluster. Defaults to NULL.

shuffle_unranked

Logical specifying whether or not to randomly permute unranked items in the initial ranking. When shuffle_unranked=TRUE and random=FALSE, all unranked items for each assessor are randomly permuted. Otherwise, the first ordering returned by igraph::topo_sort() is used. Defaults to FALSE.

random

Logical specifying whether or not to use a random initial ranking. Defaults to FALSE. Setting this to TRUE means that all possible orderings consistent with the stated pairwise preferences are generated for each assessor, and one of them is picked at random.

random_limit

Integer specifying the maximum number of items allowed when all possible orderings are computed, i.e., when random=TRUE. Defaults to 8L.

Value

A matrix of rankings which can be given in the rankings argument to compute_mallows.

Note

Setting random=TRUE means that all possible orderings of each assessor's preferences are generated, and one of them is picked at random. This can be useful when experiencing convergence issues, e.g., if the MCMC algorithm does not mix properly. However, finding all possible orderings is a combinatorial problem, which may be computationally very hard. The result may not even fit in memory, which may cause the R session to crash. When using this option, please increase the size of the problem incrementally, starting with smaller subsets of the complete data. An example is given below.
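To get a feel for why this option is limited by random_limit, consider the worst case of an assessor with no pairwise constraints, where every permutation of the items is a consistent ordering. This sketch (not part of the package API) just tabulates the factorial growth:

```r
# Worst case: with no pairwise constraints, every permutation of the
# n items is a consistent ordering, so the count grows factorially.
n <- 1:10
data.frame(n_items = n, max_orderings = factorial(n))
# At the default random_limit of 8 items there are already
# factorial(8) = 40320 candidate orderings per assessor.
```

Pairwise constraints prune this set, but the enumeration can still be very large, which is why the default cap is 8 items.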

As detailed in the documentation to generate_transitive_closure, it is assumed that the items are labeled starting from 1. For example, if a single comparison of the following form is provided, it is assumed that there is a total of 30 items (n_items=30), and the initial ranking is a permutation of these 30 items consistent with the preference 29<30.

  assessor bottom_item top_item
1        1          29       30

If in reality there are only two items, they should be relabeled to 1 and 2, as follows:

  assessor bottom_item top_item
1        1           1        2
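One way to perform such relabeling (a sketch using base R, not part of the package API) is to map the observed item labels to consecutive integers with match():

```r
# Hypothetical example data with items labeled 29 and 30.
prefs <- data.frame(assessor = 1, bottom_item = 29, top_item = 30)

# Map the observed item labels to consecutive integers starting at 1.
items <- sort(unique(c(prefs$bottom_item, prefs$top_item)))
prefs$bottom_item <- match(prefs$bottom_item, items)
prefs$top_item <- match(prefs$top_item, items)
prefs
#   assessor bottom_item top_item
# 1        1           1        2
```

Keeping the items vector around makes it easy to translate rankings back to the original labels afterwards.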

See Also

Other preprocessing: estimate_partition_function(), generate_constraints(), generate_transitive_closure(), obs_freq, prepare_partition_function()

Examples

# The example dataset beach_preferences contains pairwise preferences of beaches.
# We must first generate the transitive closure.
beach_tc <- generate_transitive_closure(beach_preferences)

# Next, we generate an initial ranking
beach_init <- generate_initial_ranking(beach_tc)

# Look at the first few rows:
head(beach_init)

# We can add more informative column names in order
# to get nicer posterior plots:
colnames(beach_init) <- paste("Beach", seq(from = 1, to = ncol(beach_init), by = 1))
head(beach_init)

# By default, the algorithm for generating the initial ranking is deterministic.
# It is possible to randomly permute the unranked items with the argument shuffle_unranked,
# as demonstrated below. This algorithm is computationally efficient, but the argument
# defaults to FALSE for backward compatibility.
set.seed(2233)
beach_init <- generate_initial_ranking(beach_tc, shuffle_unranked = TRUE)
head(beach_init)

# It is also possible to pick a random sample among all topological sorts.
# This requires first enumerating all possible sorts, and might hence be computationally
# demanding. Here is an example, where we reduce the data considerably to speed up computation.
small_tc <- beach_tc[beach_tc$assessor %in% 1:6 &
                       beach_tc$bottom_item %in% 1:4 & beach_tc$top_item %in% 1:4, ]
set.seed(123)
init_small <- generate_initial_ranking(tc = small_tc, n_items = 4, random = TRUE)
# Look at the initial rankings generated
init_small

# For this small dataset, we can also study the effect of setting shuffle_unranked=TRUE
# in more detail. We consider assessors 1 and 2 only.
# First is the deterministic ordering. This one is equal for each run.
generate_initial_ranking(tc = small_tc[small_tc$assessor %in% c(1, 2), ],
                         n_items = 4, shuffle_unranked = FALSE, random = FALSE)
# Next we shuffle the unranked, setting the seed for reproducibility.
# For assessor 1, item 2 is unranked, and by rerunning the code multiple times,
# we see that element (1, 2) indeed changes ranking randomly.
# For assessor 2, item 3 is unranked, and we can also see that this item changes
# ranking randomly when rerunning the function multiple times.
# The ranked items also change their ranking from one random realization to another,
# but their relative ordering is constant.
set.seed(123)
generate_initial_ranking(tc = small_tc[small_tc$assessor %in% c(1, 2), ],
                         n_items = 4, shuffle_unranked = TRUE, random = FALSE)


## Not run: 
  # We now give beach_init and beach_tc to compute_mallows. We tell compute_mallows
  # to save the augmented data, in order to study the convergence.
  model_fit <- compute_mallows(rankings = beach_init,
                               preferences = beach_tc,
                               nmc = 2000,
                               save_aug = TRUE)

  # We can study the acceptance rate of the augmented rankings
  assess_convergence(model_fit, parameter = "Rtilde")

  # We can also study the posterior distribution of the consensus rank of each beach
  model_fit$burnin <- 500
  plot(model_fit, parameter = "rho", items = 1:15)

## End(Not run)

## Not run: 
  # The computations can also be done in parallel
  library(parallel)
  cl <- makeCluster(detectCores() - 1)
  beach_tc <- generate_transitive_closure(beach_preferences, cl = cl)
  beach_init <- generate_initial_ranking(beach_tc, cl = cl)
  stopCluster(cl)

## End(Not run)

BayesMallows documentation built on Nov. 25, 2023, 5:09 p.m.