#' Fit a Plackett-Luce Model
#'
#' Fit a Plackett-Luce model to a set of rankings. The rankings may be partial
#' (each ranking completely ranks a subset of the items) and include ties of
#' arbitrary order.
#'
#' @section Model definition:
#'
#' A single ranking is given by
#' \deqn{R = \{C_1, C_2, \ldots, C_J\}}{R = {C_1, C_2, \ldots, C_J}}
#' where the items in set \eqn{C_1} are ranked higher than (better than) the
#' items in \eqn{C_2}, and so on. If there are multiple objects in set \eqn{C_j}
#' these items are tied in the ranking.
#'
#' For a set if items \eqn{S}, let
#' \deqn{f(S) = \delta_{|S|}
#' \left(\prod_{i \in S} \alpha_i \right)^\frac{1}{|S|}}{
#' f(S) = delta_{|S|} * (prod_{i in S} alpha_i)^(1/|S|)}
#' where \eqn{|S|} is the cardinality (size) of the set, \eqn{\delta_n}{delta_n}
#' is a parameter related to the prevalence of ties of order \eqn{n}
#' (with \eqn{\delta_1 \equiv 1}), and \eqn{\alpha_i}{alpha_i} is a
#' parameter representing the worth of item \eqn{i}.
#' Then under an extension of the Plackett-Luce model allowing ties up to order
#' \eqn{D}, the probability of the ranking \eqn{R} is given by
#' \deqn{\prod_{j = 1}^J \frac{f(C_j)}{
#' \sum_{k = 1}^{\min(D_j, D)} \sum_{S \in {A_j \choose k}} f(S)}}{
#' prod_{j = 1}^J f(C_j)/
#' (sum_{k = 1}^{min(D_j, D)} sum_{S in choose(A_j, k)} f(S))}
#' where \eqn{D_j} is the cardinality of \eqn{A_j}, the set of
#' alternatives from which \eqn{C_j} is chosen, and
#' \eqn{A_j \choose k}{choose(A_j, k)} is all the possible choices of \eqn{k}
#' items from \eqn{A_j}. The value of \eqn{D} can be set to the maximum number
#' of tied items observed in the data, so that \eqn{\delta_n = 0}{delta_n = 0}
#' for \eqn{n > D}.
#'
#' When the worth parameters are constrained to sum to one, they represent the
#' probability that the corresponding item comes first in a ranking of all
#' items, given that first place is not tied.
#'
#' The 2-way tie prevalence parameter \eqn{\delta_2}{delta_2} is related to
#' the probability that two items \emph{of equal worth} tie for
#' first place, given that the first place is not a 3-way or higher tie.
#' Specifically, that probability is
#' \eqn{\delta_2/(2 + \delta_2)}{delta_2/(2 + delta_2}.
#'
#' The 3-way and higher tie-prevalence parameters are similarly interpretable,
#' in terms of tie probabilities among equal-worth items.
#'
#' When intermediate tie orders are not observed (e.g. ties of order 2
#' and order 4 are observed, but no ties of order 3), the maximum
#' likelihood estimate of the corresponding tie prevalence parameters
#' is zero, so these parameters are excluded from the model.
#'
#' @section Pseudo-rankings:
#'
#' In order for the maximum likelihood estimate of an object's worth to be
#' defined, the network of rankings must be strongly connected. This means that
#' in every possible partition of the objects into two nonempty subsets, some
#' object in the second set is ranked higher than some object in the first set
#' at least once.
#'
#' If the network of rankings is not strongly connected then pseudo-rankings
#' may be used to connect the network. This approach posits a hypothetical
#' object with log-worth 0 and adds \code{npseudo} wins and \code{npseudo}
#' losses to the set of rankings.
#'
#' The parameter \code{npseudo} is the prior strength. With \code{npseudo = 0}
#' the MLE is the posterior mode. As \code{npseudo} approaches
#' infinity the log-worth estimates all shrink towards 0. The default,
#' \code{npseudo = 0.5}, is sufficient to connect the network and has a weak
#' shrinkage effect. Even for networks that are already connected, adding
#' pseudo-rankings typically reduces both the bias and variance of the
#' estimators of the worth parameters.
#'
#' @section Incorporating prior information on log-worths:
#'
#' Prior information can be incorporated by using \code{normal} to specify a
#' multivariate normal prior on the \emph{log}-worths. The log-worths are then
#' estimated by maximum a posteriori (MAP) estimation. Model summaries
#' (deviance, AIC, standard errors) are based on the log-likelihood evaluated
#' at the MAP estimates, resulting in a finite sample bias that should
#' disappear as the number of rankings increases. Inference based on these
#' model summaries is valid as long as the prior is considered fixed and not
#' tuned as part of the model.
#'
#' Incorporating a prior is an alternative method of penalization, therefore
#' \code{npseudo} is set to zero when a prior is specified.
#'
#' @section Incorporating ranker adherence parameters:
#'
#' When rankings come from different rankers, the model can be extended to
#' allow for varying reliability of the rankers, as proposed by Raman and
#' Joachims (2014). In particular, replacing \eqn{f(S)} by
#' \deqn{h(S) = \delta_{|S|}
#' \left(\prod_{i \in S} \alpha_i \right)^\frac{\eta_g}{|S|}}{
#' h(S) = delta_{|S|} * (prod_{i in S} alpha_i)^(eta_g/|S|)}
#' where \eqn{\eta_g > 0}{eta_g > 0} is the adherence parameter for ranker
#' \eqn{g}. In the standard model, all rankers are assumed to have equal
#' reliability, so \eqn{\eta_g = 1}{eta_g = 1} for all rankers.
#' Higher \eqn{\eta_g = 1}{eta_g = 1} increases the distance between item
#' worths, giving greater weight' to the ranker's choice. Conversely, lower
#' \eqn{\eta_g = 1}{eta_g = 1} shrinks the item worths towards equality so the
#' ranker's choice is less relevant.
#'
#' The adherence parameters are not estimable by maximum likelihood, since
#' for given item worths the maximum likelihood estimate of adherence would be
#' infinity for rankers that give rankings consistent with the items ordered by
#' worth and zero for all other rankers. Therefore it is essential to include a
#' prior on the adherence parameters when these are estimated rather than fixed.
#' Setting \code{gamma = TRUE} specifies the default
#' \eqn{\Gamma(10,10)}{Gamma(10,10)} prior, which has a mean of
#' 1 and a probability of 0.99 that the adherence is between 0.37 and 2.
#' Alternative parameters can be specified by a list with elements \code{shape}
#' and \code{rate}. Setting scale and rate to a common value \eqn{\theta}{theta}
#' specifies a mean of 1; \eqn{\theta \ge}{theta >=} 2 will give low prior
#' probability to near-zero adherence; as \eqn{\theta}{theta} increases the
#' density becomes more concentrated (and more symmetrical) about 1.
#'
#' Since the number of adherence parameters will typically be large and it is
#' assumed the worth and tie parameters are of primary interest, the adherence
#' parameters are not included in model summaries, but are included in the
#' returned object.
#'
#' @section Controlling the fit:
#'
#' For models without priors, using \code{nspseudo = 0} will use standard
#' maximum likelihood, if the network is connected (and throw an error
#' otherwise).
#'
#' The fitting algorithm is set by the \code{method} argument. The default
#' method \code{"iterative scaling"} is a slow but reliable approach. In
#' addition, this has the most control on the accuracy of the final fit, since
#' convergence is determined by direct comparison of the observed and expected
#' values of the sufficient statistics for the worth parameters, rather than a
#' tolerance on change in the log-likelihood.
#'
#' The \code{"iterative scaling"} algorithm is slow because it is a first order
#' method (does not use derivatives of the likelihood). From a set of starting
#' values that are 'close enough' to the final solution, the algorithm can be
#' accelerated using
#' \href{https://en.wikipedia.org/wiki/Steffensen's_method}{Steffensen's method}.
#' \code{PlackettLuce} attempts to apply Steffensen's acceleration when all
#' differences between the observed and expected values of the sufficient
#' statistics are less than \code{steffensen}. This is an ad-hoc rule defining
#' 'close enough' and in some cases the acceleration may produce negative
#' worth parameters or decrease the log-likelihood. \code{PlackettLuce} will
#' only apply the update when it makes an improvement.
#'
#' The \code{"BFGS"} and \code{"L-BFGS"} algorithms are second order methods,
#' therefore can be quicker than the default method. Control parameters can be
#' passed on to \code{\link[stats]{optim}} or \code{\link[lbfgs]{lbfgs}}.
#'
#' For models with priors, the iterative scaling method cannot be used, so BFGS
#' is used by default.
#'
#' @seealso
#'
#' Handling rankings: \code{\link{rankings}}, \code{\link{aggregate}},
#' \code{\link{group}}, \code{\link{choices}},
#' \code{\link{adjacency}}, \code{\link{connectivity}}.
#'
#' Inspect fitted Plackett-Luce models: \code{\link{coef}}, \code{deviance},
#' \code{\link{fitted}}, \code{\link{itempar}}, \code{logLik}, \code{print},
#' \code{\link{qvcalc}}, \code{\link{summary}}, \code{\link{vcov}}.
#'
#' Fit Plackett-Luce tree: \code{pltree}.
#'
#' Example data sets: \code{\link{beans}}, \code{\link{nascar}},
#' \code{\link{pudding}}, \code{\link{preflib}}.
#'
#' Vignette: \code{vignette("Overview", package = "PlackettLuce")}.
#'
#' @note As the maximum tie order increases, the number of possible choices for
#' each rank increases rapidly, particularly when the total number of items is
#' high. This means that the model will be slower to fit with higher \eqn{D}.
#' In addition, due to the current implementation of the `vcov()` method,
#' computation of the standard errors (as by `summary()`) can take almost as
#' long as the model fit and may even become infeasible due to memory limits.
#' As a rule of thumb, for > 10 items and > 1000 rankings, we recommend
#' `PlackettLuce()` for ties up to order 4. For higher order ties, a
#' rank-ordered logit model, see [ROlogit::rologit()] or
#' generalized Mallows Model as in [BayesMallows::compute_mallows()] may be
#' more suitable, as they do not model tied events explicitly.
#'
#' @param rankings a \code{"\link{rankings}"} object, or an object that can be
#' coerced by \code{as.rankings}. An [`"aggregated_rankings"`][aggregate()]
#' object can be used to specify rankings and weights simultaneously.
#' A \code{"\link{grouped_rankings}"} object should be used when estimating
#' adherence for rankers with multiple rankings per ranker.
#' @param npseudo when using pseudodata: the number of wins and losses to add
#' between each object and a hypothetical reference object.
#' @param normal a optional list with elements named \code{mu} and \code{Sigma}
#' specifying the mean and covariance matrix of a multivariate normal prior on
#' the \emph{log} worths.
#' @param gamma a optional list with elements named \code{shape} and \code{rate}
#' specifying parameters of a gamma prior on adherence parameters for each
#' ranker (use \code{grouped_rankings} to group multiple rankings by ranker).
#' The short-cut \code{TRUE} may be used to specify a Gamma(10, 10) prior. If
#' \code{NULL} (or \code{FALSE}), adherence is fixed to \code{adherence} for
#' all rankers.
#' @param adherence an optional vector of adherence values for each ranker. If
#' missing, adherence is fixed to 1 for all rankers. If \code{gamma} is not
#' \code{NULL}, this specifies the starting values for the adherence.
#' @param weights an optional vector of weights for each ranking.
#' @param na.action a function to handle any missing rankings, see
#' [na.omit()].
#' @param start starting values for the worth parameters and the tie parameters
#' on the raw scale (worth parameters need not be scaled to sum to 1). If
#' \code{normal} is specified, \code{exp(normal$mu)} is used as starting values
#' for the worth parameters. Coefficients from a previous fit can be passed as
#' the result of a call to \code{coef.PlackettLuce}, or the \code{coefficients}
#' element of a \code{"PlackettLuce"} object.
#' @param method the method to be used for fitting: \code{"iterative scaling"}
#' (iterative scaling to sequentially update the parameter values),
#' \code{"BFGS"} (the BFGS optimisation algorithm through the
#' \code{\link{optim}} interface), \code{"L-BFGS"} (the limited-memory BFGS
#' optimisation algorithm as implemented in the \code{\link[lbfgs]{lbfgs}}
#' package). Iterative scaling is used by default, unless a prior is specified
#' by \code{normal} or \code{gamma}, in which case the default is \code{"BFGS"}.
#' @param epsilon the maximum absolute difference between the observed and
#' expected sufficient statistics for the ability parameters at convergence.
#' @param steffensen a threshold defined as for \code{epsilon} after which to
#' apply Steffensen acceleration to the iterative scaling updates.
#' @param maxit a vector specifying the maximum number of iterations. If
#' \code{gamma} is \code{NULL}, only the first element is used and specifies the
#' maximum number of iterations of the algorithm specified by \code{method}. If
#' \code{gamma} is not \code{NULL}, a second element may be supplied to specify
#' the maximum number of iterations of an alternating algorithm, where
#' the adherence parameters are updated alternately with the other parameters.
#' The default is to use 10 outer iterations.
#' @param trace logical, if \code{TRUE} show trace of iterations.
#' @param verbose logical, if \code{TRUE} show messages from validity checks on
#' the rankings.
#' @param ... additional arguments passed to \code{optim} or \code{lbfgs}.
#' In particular the convergence tolerance may be adjusted using e.g.
#' \code{control = list(reltol = 1e-10)}.
#'
#' @return An object of class \code{"PlackettLuce"}, which is a list containing
#' the following elements:
#' \item{call}{ The matched call. }
#' \item{coefficients}{ The model coefficients. }
#' \item{loglik}{ The maximized log-likelihood. }
#' \item{null.loglik}{ The maximized log-likelihood for the null model (all
#' alternatives including ties have equal probability). }
#' \item{df.residual}{ The residual degrees of freedom. }
#' \item{df.null}{ The residual degrees of freedom for the null model. }
#' \item{rank}{ The rank of the model. }
#' \item{logposterior}{ If a prior was specified, the maximised log posterior.}
#' \item{gamma}{ If a gamma prior was specified, the list of parameters. }
#' \item{normal}{ If a normal prior was specified, the list of parameters. }
#' \item{iter}{ The number of iterations run. }
#' \item{rankings}{ The rankings passed to \code{rankings}, converted to a
#' \code{"rankings"} object if necessary. }
#' \item{weights}{ The weights applied to each ranking in the fitting. }
#' \item{adherence}{ The fixed or estimated adherence per ranker. }
#' \item{ranker}{ The ranker index mapping rankings to rankers (the
#' \code{"index"} attribute of \code{rankings} if specified as a
#' \code{"grouped_rankings"} object.)}
#' \item{ties}{ The observed tie orders corresponding to the estimated tie
#' parameters. }
#' \item{conv}{ The convergence code: 0 for successful convergence; 1 if reached
#' \code{maxit} (outer) iterations without convergence; 2 if Steffensen
#' acceleration cause log-likelihood to increase; negative number if L-BFGS
#' algorithm failed for other reason.}
#' @references
#' Raman, K. and Joachims, T. (2014) Methods for Ordinal Peer Grading.
#' \href{https://arxiv.org/abs/1404.3656}{arXiv:1404.3656}.
#' @examples
#' # Six partial rankings of four objects, 1 is top rank, e.g
#' # first ranking: item 1, item 2
#' # second ranking: item 2, item 3, item 4, item 1
#' # third ranking: items 2, 3, 4 tie for first place, item 1 second
#' R <- matrix(c(1, 2, 0, 0,
#' 4, 1, 2, 3,
#' 2, 1, 1, 1,
#' 1, 2, 3, 0,
#' 2, 1, 1, 0,
#' 1, 0, 3, 2), nrow = 6, byrow = TRUE)
#' colnames(R) <- c("apple", "banana", "orange", "pear")
#'
#' # create rankings object
#' R <- as.rankings(R)
#'
#' # Standard maximum likelihood estimates
#' mod_mle <- PlackettLuce(R, npseudo = 0)
#' coef(mod_mle)
#'
#' # Fit with default settings
#' mod <- PlackettLuce(R)
#' # log-worths are shrunk towards zero
#' coef(mod)
#'
#' # independent N(0, 9) priors on log-worths, as in Raman and Joachims
#' prior <- list(mu = rep(0, ncol(R)),
#' Sigma = diag(rep(9, ncol(R))))
#' mod_normal <- PlackettLuce(rankings = R, normal = prior)
#' # slightly weaker shrinkage effect vs pseudo-rankings,
#' # with less effect on tie parameters (but note small number of rankings here)
#' coef(mod_normal)
#'
#' # estimate adherence assuming every ranking is from a separate ranker
#' mod_separate <- PlackettLuce(rankings = R, normal = prior, gamma = TRUE)
#' coef(mod_separate)
#' # gives more weight to rankers 4 & 6 which rank apple first,
#' # so worth of apple increased relative to banana
#' mod_separate$adherence
#'
#' # estimate adherence based on grouped rankings
#' # - assume two rankings from each ranker
#' G <- group(R, rep(1:3, each = 2))
#' mod_grouped <- PlackettLuce(rankings = G, normal = prior, gamma = TRUE)
#' coef(mod_grouped)
#' # first ranker is least consistent so down-weighted
#' mod_grouped$adherence
#'
#' @importFrom igraph as_adj graph_from_edgelist
#' @importFrom RSpectra eigs
#' @importFrom stats optim
#' @export
PlackettLuce <- function(rankings,
npseudo = 0.5,
normal = NULL,
gamma = NULL,
adherence = NULL,
weights = freq(rankings),
na.action = getOption("na.action"),
start = NULL,
method = c("iterative scaling", "BFGS", "L-BFGS"),
epsilon = 1e-7, steffensen = 0.1, maxit = c(500, 10),
trace = FALSE, verbose = TRUE, ...){
call <- match.call()
# check rankings object
grouped_rankings <- inherits(rankings, "grouped_rankings")
if (inherits(rankings, "aggregated_rankings")){
force(weights)
rankings$freq <- NULL
rankings <- as.rankings(rankings)
}
if (!grouped_rankings & !inherits(rankings, "rankings")){
rankings <- suppressWarnings(as.rankings(rankings, verbose = verbose))
}
# define ranker
if (grouped_rankings){
ranker <- attr(rankings, "index")
# if weights are per group id expand to be per ranking
if (!is.null(weights) & length(weights) == max(ranker)) {
weights <- weights[ranker]
}
}else if (!is.null(adherence)| !is.null(gamma)){
ranker <- seq_len(nrow(rankings))
} else ranker <- NULL
if (!is.null(na.action)) {
rankings <- match.fun(na.action)(rankings)
na.action <- attr(rankings, "na.action")
if (!is.null(na.action)) {
if (!is.null(weights)) weights <- weights[-na.action]
if (!is.null(adherence)) adherence <- adherence[-na.action]
if (!is.null(ranker)) ranker <- ranker[-na.action]
}
}
# unpack grouped rankings
if (grouped_rankings){
R <- attr(rankings, "R")
S <- attr(rankings, "S")
id <- attr(rankings, "id")
rankings <- attr(rankings, "rankings")
}
# attributes
items <- colnames(rankings)
N <- ncol(rankings) # total number of objects
nr <- nrow(rankings) # number of rankings
# weights
if (is.null(weights)){
weights <- rep.int(1L, nr)
}
stopifnot(length(weights) == nr)
stopifnot(all(weights > 0L))
# check gamma prior specification
if (is.logical(gamma) && length(gamma) == 1L && !is.na(gamma) && !gamma)
gamma <- NULL # in case specified as FALSE
if (!is.null(gamma)) { # specified or TRUE
if (isTRUE(gamma)) gamma <- list(shape = 10, rate = 10)
stopifnot(names(gamma) == c("shape", "rate"))
stopifnot(all(lengths(gamma) == 1L))
if (gamma$shape != gamma$rate)
warning("mean of adherence prior is not 1")
}
# adherence should be per ranker
if (!is.null(adherence)) {
stopifnot(length(adherence) == max(ranker))
stopifnot(all(adherence > 0L))
a <- adherence[ranker]
} else if (!is.null(gamma)){
adherence <- rep.int(1L, max(ranker))
a <- adherence[ranker]
wa <- rowsum(weights, ranker)[,1]
} else a <- NULL
if (!grouped_rankings){
# items ranked from last to 1st place
R <- t(apply(rankings, 1L, order, decreasing = TRUE))
mode(R) <-"integer"
# adjacency matrix: wins over rest
# N.B. need even if `start` specified; used to check connectivity
X <- adjacency(rankings, weights = weights)
# sizes of selected sets
set <- apply(rankings, 1L, function(x){
last <- which(x == max(x))
ind <- which(x > 0L)
# exclude untied last place
if (length(last) == 1L) ind <- setdiff(ind, last)
list(size = tabulate(x[ind])[x[ind]], item_id = ind)
})
item_id <- unlist(lapply(set, `[[`, "item_id"))
S <- lapply(set, `[[`, "size")
ns <- lengths(S)
ws <- rep.int(weights, ns)
if (!is.null(adherence)) ranker_id <- rep.int(ranker, ns)
S <- unlist(S)
rm(set)
} else {
# adjacency matrix: wins over rest
# (id extracted from grouped_rankings object)
X <- matrix(0.0, N, N)
X[sort(unique(unlist(id)))] <-
rowsum(rep(weights, lengths(id)), unlist(id))
class(X) <- c("adjacency", "matrix")
# (S extracted from grouped_rankings object)
# N.B. here unlist indifferent order than ungrouped
ws <- (weights * as.logical(S))[as.logical(S)]
if (!is.null(adherence)) {
ranker_id <- (ranker * as.logical(S))[as.logical(S)]
}
item_id <- R[as.logical(S)]
S <- S[as.logical(S)]
}
# sufficient statistics
# for delta d, (number of sets with cardinality d)/cardinality
d <- sort(unique(S))
D <- length(d)
B <- rowsum(ws, S)[,1L]
B <- B/d
# from now only only need weight/size per set, so replace S with this
S <- ws/S
rm(ws)
# for alpha
A <- numeric(N)
item <- sort(unique(item_id))
if (is.null(adherence)){
# sum over all sets st object i is in selected set weight/size
A[item] <- unname(rowsum(S, item_id)[,1L])
} else {
# now (adherence * weight)/size
A[item] <- unname(rowsum(adherence[ranker_id]*S, item_id)[,1L])
}
if (!is.null(normal)){
# set npseudo to 0
npseudo <- 0L
# check normal prior specification
stopifnot(names(normal) == c("mu", "Sigma"))
if (length(normal$mu) != N)
stop("`length(normal$mu)` is not equal to the number of items")
if (!identical(dim(normal$Sigma), c(N, N)))
stop("`normal$Sigma` is not a square matrix with number of rows ",
"equal to the number of items")
# compute inverse of variance covariance matrix (used in score function)
Rinv <- solve(chol(normal$Sigma))
Kinv <- Rinv %*% t(Rinv)
} else Rinv <- Kinv <- NULL
# check connectivity if npseudo = 0 and normal prior not specified
if (npseudo == 0L & is.null(normal)){
out <- connectivity(X, verbose = FALSE)
if (out$no > 1L)
stop("Network is not fully connected - cannot estimate all ",
"item parameters with npseudo = 0")
}
# unique rows with >= 1 element for logical matrix
# case where p is fixed
uniquerow <- function(M){
len <- ncol(M)
if (len == 1L) return(1L)
pattern <- M[,1L]
max_exp <- .Machine$double.digits
for (k in seq_len(len - 1L)){
# N.B. can't use integer here, can get integer overflow
pattern <- 2*pattern + M[,k]
if (k == len - 1L ||
k %% max_exp == (max_exp - 1L)){
pattern <- match(pattern, unique(pattern))
max_exp <- .Machine$double.digits - ceiling(log2(max(pattern)))
}
}
as.integer(pattern)
}
# max number of objects in set
nc <- max(rowSums(rankings > 0L))
# create W so cols are potential set sizes and value > 0 indicates ranking
# aggregate common sets
# - incorporate ranking weights by weighted count vs simple tabulate
W <- G <- list() # weight (currently rep count); group of rankings
P <- logical(nc) # set sizes present in rankings
minrank <- rep.int(1L, nr)
for (i in seq_len(nc)){
p <- (nc - i + 1L)
set <- rankings >= minrank
r <- which(rowSums(set) == p)
P[p] <- p != 1L && length(r)
if (!P[p]) next
# separate adherence per ranking
if (!is.null(gamma) & !grouped_rankings){
W[[p]] <- weights[r]
G[[p]] <- r
} else {
g <- uniquerow(set[r, , drop = FALSE])
# combine within ranker (adherence likely different for each ranker)
if (!is.null(adherence)){
x <- ranker[r]/10^ceiling(log10(max(ranker[r]))) + g
g <- match(x, x)
}
W[[p]] <- as.vector(unname(rowsum(weights[r], g)))
G[[p]] <- r[!duplicated(g)]
}
minrank[r] <- minrank[r] + 1L
}
P <- which(P)
# if npseudo > 0 add npseudo wins and losses with hypothetical item
stopifnot(npseudo >= 0L)
if (npseudo > 0L){
# update R with paired comparisons with hypothetical item (item N + 1)
R <- cbind(R, "NULL" = 0L)
pseudo <- matrix(0L, nrow = N, ncol = N + 1L)
pseudo[, 1L] <- N + 1L
pseudo[, 2L] <- seq_len(N)
R <- rbind(R, pseudo)
# update X with npseudo wins and losses with hypothetical item
X <- cbind(X, npseudo)
X <- rbind(X, c(rep.int(npseudo, N), 0L))
# update adherence: set to 1 for ghost rater
if (!is.null(adherence)) a <- c(a, rep.int(1L, N))
# update weights: 2*npseudo comparisons of each pair
W[[2L]] <- c(W[[2L]], rep.int(2L*npseudo, N))
# update indices
G[[2L]] <- c(G[[2L]], (nr + 1L):(nr + N))
# add set size of 2 to observed set sizes, if necessary
P <- unique(c(2L, P))
# update A: npseudo wins for item N + 1 against each other item;
# npseudo wins for other items
A <- c(A + npseudo, N*npseudo)
# update B: 2*npseudo untied choices per item
B[1L] <- B[1L] + 2L*npseudo*N
}
if (is.null(start)){
if (!is.null(normal)) {
alpha <- exp(normal$mu)
} else {
# (scaled, un-damped) PageRank on underlying paired comparisons
ncv <- min(nrow(X), 10L)
alpha <- drop(abs(eigs(X/colSums(X), 1L,
opts = list(ncv = ncv))$vectors))
}
delta <- c(1.0, rep.int(0.1, D - 1L))
} else {
# if not "coef.PlackettLuce" object, assume on raw scale
# (with ability of hypothetical item 1), i.e. as returned coefficients)
if (!length(start) == N + D - 1)
stop(paste("`start` does not specify enough values for", N, "item",
paste("and", D - 1, "tie parameters")[D > 1]))
alpha <- start[seq_len(N)]
delta <- start[-seq_len(N)]
if (inherits(start, "coef.PlackettLuce")){
# reverse identifiability constraints and put on raw scale
if (attr(start, "log")){
alpha <- exp(alpha + attr(start, "const"))
delta <- exp(delta)
} else {
alpha <- alpha*attr(start, "const")
}
}
delta <- c(1.0, delta)
if (npseudo > 0L) {
# set alpha for hypothetical item to 1 as in original fit
if (inherits(start, "coef.PlackettLuce")){
alpha <- c(alpha, 1.0)
} else {
# set to geometric mean as neutral position
alpha <- c(alpha, exp(mean(log(alpha))))
}
}
}
if (npseudo) {
N <- N + 1L
alpha <- alpha/alpha[N]
}
method <- match.arg(method, c("iterative scaling", "BFGS", "L-BFGS"))
if ((!is.null(normal) | !is.null(gamma)) && method == "iterative scaling"){
if (method %in% names(call)){
stop("when a prior is specified, `method` cannot be ",
"`\"iterative scaling\"`")
} else method <- "BFGS"
}
if (method == "BFGS"){
opt <- function(par, obj, gr, ...){
dts <- list(...)
do.call("optim",
c(list(par, obj, gr, method = "BFGS",
control = c(eval(dts$control),
list(maxit = maxit[1L]))),
dts[setdiff(names(dts), "control")]))
}
}
if (method == "L-BFGS"){
if (trace) message("No trace implemented for L-BFGS")
opt <- function(par, obj, gr, ...){
lbfgs::lbfgs(obj, gr, par, invisible = 1L,
max_iterations = maxit[1L], ...)
}
}
# Optimization of log-worths and log-tie parameters via
obj_common <- function(par) {
alpha <- exp(par[1L:N])
delta <- exp(par[-c(1L:N)])
# assign to parent environment so can use further quantities in score
assign("fit", expectation("all", alpha, c(1.0, delta),
a, N, d, P, R, G, W),
envir = parent.env(environment()))
-loglik_common(c(alpha, delta), N, normal$mu, Rinv, A, B, fit)
}
gr_common <- function(par) {
alpha <- exp(par[1L:N])
delta <- exp(par[-c(1L:N)])
-score_common(c(alpha, delta), N, normal$mu, Kinv, A, B, fit) *
c(alpha, delta)
}
# Optimization of log-adherence (for non-ghost raters) via
obj_adherence <- function(par){
adherence <- exp(par)
# assign to parent environment so can use further quantities in score
assign("fit", normalization(alpha, c(1.0, delta),
adherence[ranker], d, P, R, G, W),
envir = parent.env(environment()))
-loglik_adherence(adherence, gamma$shape, gamma$rate, wa, Z, fit)
}
gr_adherence <- function(par) {
adherence <- exp(par)
-score_adherence(adherence, ranker, gamma$shape, gamma$rate,
wa, Z, fit) * adherence
}
logp <- NULL
if (method != "iterative scaling"){
i <- 0L
conv <- 1L
if (!is.null(gamma)){
mode(maxit) <- "integer"
if (length(maxit) != 2L) maxit <- c(maxit, 10L)
}
repeat{
# fit model with fixed adherence (a)
res <- opt(log(c(alpha, delta[-1L])), obj_common, gr_common, ...)
if (i == maxit[2L] | is.null(gamma)) break
if (!is.null(gamma)){
# update alpha, delta & sufficient statistics for adherence
alpha <- exp(res$par[1L:N])
delta <- c(1.0, exp(res$par[-(1L:N)]))
Z <- unname(rowsum(S*log(alpha[item_id]), ranker_id)[,1L])
if (i > 0L) logp_old <- logp
# log-posterior after worth update
logp <- -res$value + sum(wa*((gamma$shape - 1L)*log(adherence) -
gamma$rate*adherence))
if (i > 0L && abs(logp_old - logp) <=
epsilon * (abs(logp) + epsilon)) {
conv <- 0L
break
}
i <- i + 1L
# fit model with fixed worth/tie parameters
# ignore rankings involving ghost item (adherence fixed to 1)
res2 <- opt(log(adherence), obj_adherence, gr_adherence, ...)
# update adherence & sufficient stats for alpha (real items)
adherence <- exp(res2$par)
a[1L:nr] <- adherence[ranker]
A[item] <- unname(rowsum(adherence[ranker_id]*S, item_id)[,1L])
if (npseudo) A[item] <- A[item] + npseudo
# log-posterior after gamma update
# -res2$value + sum(B[-1]*log(delta)) -
# 0.5*tcrossprod((log(alpha) - normal$mu) %*% Rinv)[1]
}
}
if (!is.null(gamma)){
iter <- i
} else {
conv <- res$convergence
iter <- res$counts # NULL for L-BFGS
}
res <- list(alpha = exp(res$par[1L:N]),
delta = c(1.0, exp(res$par[-(1L:N)])),
logl = -res$value)
} else {
res <- list(alpha = alpha, delta = delta)
res[c("expA", "expB", "theta")] <-
expectation("all", alpha, delta, a, N, d, P, R, G, W)
res$logl <- sum(B[-1L]*log(res$delta)[-1L]) + sum(A*log(res$alpha)) -
sum(res$theta)
oneUpdate <- function(res){
# update all alphas
res$alpha <- res$alpha*A/res$expA
if (npseudo) res$alpha <- res$alpha/res$alpha[N]
# update all deltas
if (D > 1L) {
res$delta[-1L] <- B[-1L]/
expectation("delta", res$alpha, res$delta,
a, N, d, P, R, G, W)$expB
}
res[c("expA", "expB", "theta")] <-
expectation("all", res$alpha, res$delta,
a, N, d, P, R, G, W)
res$logl <- sum(B[-1L]*log(res$delta)[-1L]) +
sum(A*log(res$alpha)) - sum(res$theta)
res
}
accelerate <- function(p, p1, p2){
# only accelerate if parameter has changed in last iteration
z <- p2 != p1
p2[z] <- p[z] - (p1[z] - p[z])^2L / (p2[z] - 2L * p1[z] + p[z])
p2
}
# stopping rule: compare observed & expected sufficient stats
checkConv <- function(res){
eps <- abs(c(A, B[-1L]) - c(res$expA, res$delta[-1L]*res$expB))
assign("eps", eps, envir = parent.env(environment()))
ifelse(all(eps < epsilon), 0L, 1L)
}
iter <- 0L
if ((conv <- checkConv(res)) == 0L) maxit <- 0L
updateIter <- function(res){
if (trace) message("iter ", iter, ", loglik: ", res$logl)
assign("iter", iter + 1L,
envir = parent.env(environment()))
}
eps <- c(A, B[-1L])
doSteffensen <- FALSE
while(iter < maxit[1L]){
updateIter(res)
res <- oneUpdate(res)
if ((conv <- checkConv(res)) == 0L) break
if (all(eps < steffensen & !doSteffensen)) doSteffensen <- TRUE
# steffensen
if (doSteffensen){
res1 <- oneUpdate(res)
if ((conv <- checkConv(res1)) == 0L) {
res <- res1
break
}
res2 <- oneUpdate(res1)
if ((conv <- checkConv(res2)) == 0L) {
res <- res2
break
}
# if negative worth or log-likelihood decreased,
# don't apply Steffensen
res$alpha <-
accelerate(res$alpha, res1$alpha, res2$alpha)
if (all(res$alpha > 0L)) {
res$delta[-1L] <-
accelerate(res$delta, res1$delta, res2$delta)[-1L]
res[c("expA", "expB", "theta")] <-
expectation("all", res$alpha, res$delta,
a, N, d, P, R, G, W)
res$logl <-
sum(B[-1L]*log(res$delta)[-1L]) +
sum(A*log(res$alpha)) - sum(res$theta)
if (res$logl < res2$logl) {
res <- res2
} else if ((conv <- checkConv(res)) == 0L) break
} else res <- res2
}
}
if (trace) message("iter ", iter, ", loglik: ", res$logl)
}
if (conv[1L] == 1L) warning("Iterations have not converged.")
res$delta <- structure(res$delta, names = paste0("tie", names(B)))[-1L]
if (npseudo > 0L) {
# drop hypothetical object
res$alpha <- res$alpha[-N]
N <- N - 1L
# drop weights and indices for pseudodata
i <- seq_len(length(W[[2L]]) - N)
W[[2L]] <- W[[2L]][i]
G[[2L]] <- G[[2L]][i]
if (!length(W[[2L]])) P <- setdiff(P, 2L)
# remove contribution to A and B
A <- A[-(N + 1L)] - npseudo
B[1L] <- B[1L] - 2L*npseudo*N
}
names(res$alpha) <- items
rank <- N + D - 2L
cf <- c(res$alpha, res$delta)
# save normal prior info
if (!is.null(normal)) {
normal_prior <- list(mu = normal$mu, invSigma = Kinv)
} else normal_prior <- NULL
# recompute log-likelihood excluding pseudo-observations/priors
if (npseudo > 0L | !is.null(normal) | !is.null(gamma)) {
if (!is.null(normal) & is.null(gamma)) {
# logp not yet assigned
logp <- res$logl
}
normal <- NULL
logl <- -obj_common(log(cf))
} else logl <- res$logl
# null log-likelihood - set adherence to 1 if estimated
if (!is.null(gamma)) a <- rep.int(1L, length(a))
null.loglik <- -obj_common(rep.int(0L, length(cf)))
# frequencies of sets selected from, for sizes 2 to max observed
freq <- vapply(W[P], sum, 1.0)
# number of possible selections overall
n <- sum(vapply(P, choose, numeric(D), k = d) %*% freq)
df.residual <- n - sum(freq) - rank
fit <- list(call = call,
coefficients = cf,
loglik = logl,
null.loglik = null.loglik,
df.residual = df.residual,
df.null = n - sum(freq), #naming consistent with glm
rank = rank,
logposterior = logp,
gamma = gamma,
normal = normal_prior,
iter = iter,
rankings = rankings,
weights = weights,
adherence = adherence,
ranker = ranker,
ties = d,
conv = conv,
na.action = na.action)
class(fit) <- "PlackettLuce"
fit
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.