data_censoring: Censoring of full rankings

View source: R/MSmix_functions_package.R

data_censoring    R Documentation

Description

Convert full rankings into either top-k rankings or into partial rankings with missing data in arbitrary positions.

Usage

data_censoring(
  rankings,
  topk = TRUE,
  nranked = NULL,
  probs = rep(1, ncol(rankings) - 1)
)

Arguments

rankings

Integer N × n matrix or data frame with full rankings in each row.

topk

Logical: whether the full rankings must be converted into top-k rankings (TRUE) or into partial rankings with missing data in arbitrary positions (FALSE). Defaults to TRUE.

nranked

Integer vector of length N with the desired number of positions to be retained in each partial sequence after censoring. If nranked = NULL (default), the number of positions is randomly generated according to the probabilities in the probs argument.

probs

Numeric vector of the (n-1) probabilities for the random generation of the number of positions to be retained in each partial sequence after censoring (normalization is not necessary). Used only if nranked = NULL. Defaults to equal probabilities.

Details

Both forms of partial rankings can be obtained in two ways: (i) by specifying, in the nranked argument, the number of positions to be retained in each partial ranking; (ii) by setting nranked = NULL (default) and specifying, in the probs argument, the probabilities of retaining respectively 1, 2, ..., (n-1) positions in the partial rankings (recall that a partial sequence with (n-1) observed entries corresponds to a full ranking, because the single missing entry is determined by the others).
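As a rough illustration of route (ii), the base-R sketch below draws the number of retained positions from unnormalized weights. It does not call MSmix: draw_nranked is a hypothetical helper, and sampling the values 1, ..., (n-1) directly is an assumption about how the outcomes are labeled, so the package's internal handling may differ.

```r
# Hypothetical helper (not part of MSmix): for each of N rankings, draw how
# many positions to retain, using the unnormalized weights in `probs`.
draw_nranked <- function(N, n, probs = rep(1, n - 1)) {
  stopifnot(length(probs) == n - 1, all(probs >= 0), any(probs > 0))
  # sample() rescales `prob` internally, so the weights need not sum to 1.
  sample(seq_len(n - 1), size = N, replace = TRUE, prob = probs)
}

set.seed(1)
n <- 7
k <- draw_nranked(N = 1000, n = n, probs = 1:(n - 1))

# Empirical frequencies should be close to the normalized weights.
round(prop.table(table(k)), 2)
round(prop.table(1:(n - 1)), 2)
```

Passing weights such as 1:(n-1) rather than normalized probabilities mirrors the remark in the probs argument that normalization is not necessary.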

When topk = FALSE, the positions retained in each partial sequence after censoring are drawn uniformly at random, regardless of how the nranked argument is specified.
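The difference between the two censoring schemes can be sketched on a single full ranking in base R. The helpers censor_topk and censor_arbitrary are hypothetical and are not the package's implementation; the sketch assumes that r[i] is the rank assigned to item i and that a censored entry is stored as NA.

```r
# Hypothetical helpers (not part of MSmix) acting on one full ranking `r`,
# where r[i] is assumed to be the rank assigned to item i.

# Top-k censoring: keep the items ranked 1, ..., k and blank out the rest.
censor_topk <- function(r, k) {
  r[r > k] <- NA
  r
}

# Arbitrary censoring: keep k entries chosen uniformly at random.
censor_arbitrary <- function(r, k) {
  n <- length(r)
  drop <- sample.int(n, size = n - k)  # indices of the entries to blank out
  r[drop] <- NA
  r
}

r <- c(3L, 1L, 5L, 2L, 4L)
censor_topk(r, k = 2)        # NA 1 NA 2 NA
set.seed(1)
censor_arbitrary(r, k = 2)   # two non-NA entries in randomly chosen places
```

In the top-k case the retained entries are fully determined by k, whereas in the arbitrary case only their number is fixed and their location is random.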

Value

A list of two named objects:

part_rankings

Integer N × n matrix with partial (censored) rankings in each row. Missing positions are coded as NA.

nranked

Integer vector of length N with the actual number of items ranked in each partial sequence after censoring.
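One way to sanity-check a result of this shape is to compare nranked with the count of non-missing entries in each row of part_rankings. The list below is a hand-built stand-in with the two documented components, not actual output of data_censoring, so it only illustrates the expected relationship between them.

```r
# Hand-built stand-in for the returned list (not produced by MSmix here).
out <- list(
  part_rankings = rbind(
    c(NA, 1L, NA, 2L),
    c(3L, 1L, 4L, 2L),
    c(NA, NA, 1L, NA)
  ),
  nranked = c(2L, 4L, 1L)
)

# Number of observed (non-NA) entries in each row.
observed <- rowSums(!is.na(out$part_rankings))

# TRUE when every row has as many observed entries as nranked reports.
all(observed == out$nranked)
```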

Examples


library(MSmix)  # provides data_censoring(), rMSmix() and the ranks_antifragility data

## Example 1. Censoring the Antifragility dataset into partial top rankings
# Top-3 censoring (assigned number of top positions to be retained)
n <- 7
r_antifrag <- ranks_antifragility[, 1:n]
data_censoring(r_antifrag, topk = TRUE, nranked = rep(3, nrow(r_antifrag)))
# Random top-k censoring with assigned probabilities
set.seed(12345)
data_censoring(r_antifrag, topk = TRUE, probs = 1:(n-1))

## Example 2. Simulate full rankings from a basic Mallows model with Spearman distance
n <- 10
N <- 100
set.seed(12345)
rankings <- rMSmix(sample_size = N, n_items = n)$samples
# Censoring in arbitrary positions with assigned number of ranks to be retained
set.seed(12345)
nranked <- round(runif(N,0.5,1)*n)
set.seed(12345)
arbitr_ranks1 <- data_censoring(rankings, topk = FALSE, nranked = nranked)
arbitr_ranks1
identical(arbitr_ranks1$nranked, nranked)
# Censoring in arbitrary positions with random number of ranks to be retained
set.seed(12345)
probs <- runif(n-1, 0, 0.5)
set.seed(12345)
arbitr_ranks2 <- data_censoring(rankings, topk = FALSE, probs = probs)
arbitr_ranks2
prop.table(table(arbitr_ranks2$nranked))
round(prop.table(probs), 2)


MSmix documentation built on April 3, 2025, 9:29 p.m.