data_description: Descriptive summaries for partial rankings
In MSmix: Finite Mixtures of Mallows Models with Spearman Distance for Full and Partial Rankings

data_description

R Documentation

Description

Compute various data summaries for a partial ranking dataset. Unlike analogous functions in other R packages, data_description supports partial observations with arbitrary censoring patterns.

print method for class "data_descr".

Usage

data_description(
  rankings,
  marg = TRUE,
  borda_ord = FALSE,
  paired_comp = TRUE,
  subset = NULL,
  item_names = NULL
)

## S3 method for class 'data_descr'
print(x, ...)

Arguments

`rankings`	Integer `N \times n` matrix or data frame with partial rankings in each row. Missing positions must be coded as `NA`.
`marg`	Logical: whether to compute the first-order marginals. Defaults to `TRUE`.
`borda_ord`	Logical: whether the items in the summary statistics are ordered according to the Borda ranking (i.e., the ordering of the mean rank vector). Defaults to `FALSE`.
`paired_comp`	Logical: whether to compute the pairwise comparison matrix. Defaults to `TRUE`.
`subset`	Optional logical or integer vector specifying the subset of observations, i.e. rows of `rankings`, to be kept. Missing values are taken as `FALSE`. Defaults to `NULL`, meaning that all rows are considered.
`item_names`	Character vector with the names to be used for the items. Defaults to `NULL`, meaning that `colnames(rankings)` is used and, if not available, `item_names` is set equal to `"Item1","Item2",...`.
`x`	An object of class `"data_descr"` returned by `data_description`.
`...`	Further arguments passed to or from other methods (not used).
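
As a rough illustration of the input conventions above, the following base R sketch builds a small partial ranking matrix with `NA` for missing positions and mimics the documented rule that missing values in a logical `subset` are taken as `FALSE`. The toy data and the names `keep`, `kept_rows` and `kept` are hypothetical, and the use of `which()` is only an assumption about how such a rule can be reproduced, not the package's internal code.

```r
# Toy input (hypothetical): 4 partial rankings of 4 items.
# NA marks a position that was not observed.
rankings <- matrix(
  c( 1,  2,  3,  4,   # full ranking
     1,  2, NA, NA,   # top-2 ranking
    NA,  3, NA,  1,   # arbitrary censoring pattern
     2, NA,  1, NA),  # top-2 ranking
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, paste0("Item", 1:4))
)

# Logical subset with a missing value: which() drops NA,
# so the NA entry behaves like FALSE.
keep <- c(TRUE, NA, FALSE, TRUE)
kept_rows <- which(keep)

# Rows that would be retained under this rule.
kept <- rankings[kept_rows, , drop = FALSE]
```

Here only the first and fourth rows are retained; the row whose `subset` entry is `NA` is dropped together with the one marked `FALSE`.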

Details

The implementation of data_description is similar to that of rank_summaries from the PLMIX package. Unlike the latter, data_description works with any kind of partial ranking (not only top rankings) and allows subsamples to be summarized through the additional subset argument.

The Borda ranking, obtained from the ordering of the mean rank vector, corresponds to the MLE of the consensus ranking of the Mallows model with Spearman distance. If mean_rank contains NAs, the corresponding items occupy the bottom positions of borda_ordering, in the order in which they appear in item_names.
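
The Borda ordering and its treatment of NAs can be sketched in base R as follows. This is a minimal illustration on hypothetical toy data, assuming that the mean rank of each item is taken over its observed entries only; it is not the package's own implementation.

```r
# Toy data (hypothetical): 4 partial rankings of 4 items.
# Item4 is never ranked, so its mean rank is undefined.
rankings <- matrix(
  c( 1,  2,  3, NA,
     2,  1, NA, NA,
     1,  3,  2, NA,
    NA,  1,  2, NA),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, paste0("Item", 1:4))
)

# Mean rank per item over the observed entries (assumption).
mean_rank <- colMeans(rankings, na.rm = TRUE)
mean_rank[is.nan(mean_rank)] <- NA   # an all-NA column yields NaN

# order() sorts increasingly, puts NAs last by default and keeps
# tied or missing entries in their original (item_names) order.
borda_ordering <- names(mean_rank)[order(mean_rank)]
```

In this toy case the mean ranks of Item1, Item2 and Item3 are 4/3, 7/4 and 7/3, so they fill the first three positions in that order, while Item4 (mean rank NA) is placed at the bottom.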

Value

An object of class "data_descr", which is a list with the following named components:

`n_ranked`	Integer vector of length `N` with the number of items ranked in each partial sequence.
`n_ranked_distr`	Frequency distribution of the `n_ranked` vector.
`n_ranks_by_item`	Integer `3 \times n` matrix with the number of times that each item has been ranked or not. The last row contains the total by column, i.e. the sample size `N`.
`mean_rank`	Mean rank vector.
`borda_ordering`	Character vector corresponding to the Borda ordering. This is obtained from the ranking of the mean rank vector.
`marginals`	Integer `n \times n` matrix of the first-order marginals in each column: the `(j,i)`-th entry indicates the number of times that item `i` is ranked in position `j`.
`pc`	Integer `n \times n` pairwise comparison matrix: the `(i,i')`-th entry indicates the number of times that item `i` is preferred to item `i'`.
`rankings`	When `borda_ord = TRUE`, an integer `N \times n` matrix corresponding to `rankings` with columns rearranged according to the Borda ordering, otherwise the input `rankings`.
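
To make the definitions of `n_ranked`, `marginals` and `pc` concrete, the sketch below recomputes analogous quantities in base R on hypothetical toy data. It assumes that a pairwise comparison is counted only when both items are observed in the same row; the package may treat missing positions differently, so the code is an illustration of the definitions rather than a description of the actual computation.

```r
# Toy data (hypothetical): 3 partial rankings of 3 items.
rankings <- matrix(
  c( 1,  2,  3,
     2,  1, NA,
    NA,  1,  2),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("A", "B", "C"))
)
n <- ncol(rankings)
items <- colnames(rankings)

# Number of items ranked in each partial sequence.
n_ranked <- rowSums(!is.na(rankings))

# First-order marginals: entry (j, i) counts how often item i
# is observed in position j.
marginals <- t(sapply(seq_len(n), function(j) colSums(rankings == j, na.rm = TRUE)))
rownames(marginals) <- paste0("Rank", seq_len(n))

# Pairwise comparisons: entry (i, k) counts how often item i has a
# smaller rank than item k, using only rows where both are observed
# (assumption).
pc <- matrix(0L, n, n, dimnames = list(items, items))
for (i in seq_len(n)) {
  for (k in seq_len(n)) {
    pc[i, k] <- sum(rankings[, i] < rankings[, k], na.rm = TRUE)
  }
}
```

With these data, item B is preferred to item C in two rows (the first and the third), whereas the second row contributes nothing to that pair because the rank of C is missing.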

References

Mollica C and Tardella L (2020). PLMIX: An R package for modelling and clustering partially ranked data. Journal of Statistical Computation and Simulation, 90(5), pages 925–959, ISSN: 0094-9655, DOI: 10.1080/00949655.2020.1711909.

Marden JI (1995). Analyzing and modeling rank data. Monographs on Statistics and Applied Probability (64). Chapman & Hall, London. ISBN: 0-412-99521-2.

Examples


## Example 1. Sample statistics for the Antifragility dataset.
r_antifrag <- ranks_antifragility[, 1:7]
descr <- data_description(rankings = r_antifrag)
descr

## Example 2. Sample statistics for the Sports dataset.
r_sports <- ranks_sports[, 1:8]
descr <- data_description(rankings = r_sports, borda_ord = TRUE)
descr

## Example 3. Sample statistics for the Sports dataset by gender.
r_sports <- ranks_sports[, 1:8]
desc_f <- data_description(rankings = r_sports, subset = (ranks_sports$Gender == "Female"))
desc_m <- data_description(rankings = r_sports, subset = (ranks_sports$Gender == "Male"))
desc_f
desc_m

MSmix documentation built on April 3, 2025, 9:29 p.m.