MallowsCV: Compute cross-validated likelihood for Mallows mixture models


Description

Assess model performance by cross-validated (CV) Mallows likelihood. Do NOT run this for a large number of ranked alternatives n.

Usage

MallowsCV(datas, G, weights = NULL, ..., seed = 26921332, nfolds = 5,
  nrepeats = 10, ntry = 3, logsumexp.trick = TRUE)

Arguments

datas

Matrix of dimension N x n with sequences in rows.

G

Number of modes, 2 or greater.

weights

Integer vector of length N giving the observed frequency of each permutation in datas. By default each observation is counted once. It must not contain 0, and its length must equal nrow(datas).

...

Arguments passed to Mallows.

seed

Random seed for reproducible results when creating the CV splits of the data. Set to NULL to skip setting the seed.

nfolds

Number of folds. An nfolds-fold CV split is created on each repeat.

nrepeats

Number of times the CV is repeated.

ntry

Number of random initializations (restarts) for each CV run. The fit with the highest likelihood is reported.

logsumexp.trick

Logical. Whether or not to use log-sum-exp trick to compute log-likelihood.
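
As an illustration of what logsumexp.trick refers to, here is a minimal base-R sketch of the log-sum-exp trick. The helper log_sum_exp and the values in log_terms are hypothetical and are not taken from kernrank; the sketch only shows why summing mixture-component likelihoods on the log scale is numerically safer.

```r
# Numerically stable log(sum(exp(x))): subtract the maximum before exponentiating.
# log_sum_exp is a hypothetical helper for illustration, not a kernrank function.
log_sum_exp <- function(x) {
  m <- max(x)
  if (!is.finite(m)) return(m)  # all terms -Inf, or some term +Inf
  m + log(sum(exp(x - m)))
}

# Made-up per-component log terms for one observation,
# e.g. log(mixing proportion) + log(component density)
log_terms <- c(-1000, -1001, -1002)

naive  <- log(sum(exp(log_terms)))  # exp() underflows to 0, so this is -Inf
stable <- log_sum_exp(log_terms)    # finite, close to -1000
```

With very small component likelihoods, as can happen when many permutations are observed, the naive form loses the value entirely while the shifted form keeps it.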

Value

List of length nfolds x nrepeats, with one entry per fold of each repeat. Each entry contains:

...

See output of Mallows

cv.loglik

Likelihood value assessed on the test fold, with the mixture model trained on the training folds.

Note

The CV split is done by partitioning weights, so weights must be integers.
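
To see why integer weights are needed, the following base-R sketch shows one way a weight vector could be partitioned into folds. It is an assumption for illustration only: split_weights is a hypothetical helper, and the actual splitting code inside MallowsCV may differ.

```r
# Hypothetical helper: spread each row's integer weight across nfolds folds.
# Returns a length(weights) x nfolds integer matrix of per-fold counts.
split_weights <- function(weights, nfolds, seed = NULL) {
  stopifnot(all(weights > 0), all(weights == round(weights)), nfolds >= 2)
  if (!is.null(seed)) set.seed(seed)
  # One entry per unit of weight, recording the row it came from
  row_id <- rep(seq_along(weights), times = weights)
  # Assign units to folds of near-equal size, in random order
  fold_id <- sample(rep_len(seq_len(nfolds), length(row_id)))
  # Count units per (row, fold)
  unclass(table(factor(row_id, levels = seq_along(weights)),
                factor(fold_id, levels = seq_len(nfolds))))
}

w <- c(3, 1, 4, 2)                      # made-up frequencies
fold_w  <- split_weights(w, nfolds = 2, seed = 1)
test_w  <- fold_w[, 1]                  # weights held out in fold 1
train_w <- w - test_w                   # weights left for training
```

Each unit of weight lands in exactly one fold, so the per-fold counts of a row sum back to its original weight. Under this scheme a row can receive weight 0 in a given fold and would have to be dropped before fitting, since weights must not contain 0.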

Author(s)

Yunlong Jiao

References

Thomas Brendan Murphy, Donal Martin. "Mixtures of distance-based models for ranking data." Computational Statistics & Data Analysis, vol. 41, no. 3, pp. 645-655, 2003. DOI:10.1016/S0167-9473(02)00165-2

Yunlong Jiao, Jean-Philippe Vert. "The Kendall and Mallows Kernels for Permutations." IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 40, no. 7, pp. 1755-1769, 2018. DOI:10.1109/TPAMI.2017.2719680

See Also

Mallows

Examples

datas <- do.call('rbind', combinat::permn(1:5))
G <- 3
weights <- rbinom(nrow(datas), 100, 0.5) # positive integers

# Cross validate Mallows mixture model
cv.model <- MallowsCV(datas, G, weights, key = 'bordaMallows', nfolds = 3, nrepeats = 1)
# Averaged cv.loglik over all CV folds
mean(sapply(cv.model, function(model) model$cv.loglik))

YunlongJiao/kernrank documentation built on May 10, 2019, 1:13 a.m.