btfit | R Documentation |
btfit
fits the Bradley-Terry model on (potentially) large and sparse datasets.
btfit( btdata, a, MAP_by_component = FALSE, subset = NULL, maxit = 10000, epsilon = 0.001 )
btdata |
An object of class "btdata", typically the result ob of ob <- btdata(..). See |
a |
Must be >= 1. When |
MAP_by_component |
Logical. Only considered if a > 1. Then, if FALSE, the MAP estimate will be found on the full dataset. If TRUE, the MAP estimate will be found separately for each fully-connected component. |
subset |
A condition for selecting a subset of the components. This can either be a character vector of names of the components, a single predicate function (that takes a component as its argument), or a logical vector of the same length as the number of components). |
maxit |
The maximum number of iterations for the algorithm. If returning π by component, this will be the maximum number of iterations for each component. |
epsilon |
Determines when the algorithm is deemed to have converged. (See Details.) |
Let there be K items, let π_k be the Bradley-Terry strength parameter of item k, for k = 1, …, K and let π be the vector of all the π_k. Let w_{ij} be the number of times item i wins against item j, let n_{ij} = w_{ij} + w_{ji} be the number of times they play, with w_{ii} = 0 by convention and let W_i = ∑_{j=1}^K w_{ij}. Then the Bradley-Terry model states that the probability of item i beating item j, p_{ij}, is:
p_{ij} = \frac{π_i}{π_i + π_j}.
The comparison graph, G_W, has the K players as the nodes and a directed edge from node i to node j whenever item i has beaten item j at least once. The MLE of the Bradley-Terry model exists and is finite if and only if the comparison graph is fully-connected (i.e. if there is a directed path from node i to node j for all items i and j).
Assuming that the comparison graph of the data is fully-connected, the MLE of the Bradley-Terry model can be found using the MM-algorithm (Hunter, 2004).
If the comparison graph of the data is not fully-connected, there are two principled options for fitting the Bradley-Terry model. One is to find the MLE within each fully-connected component. The other is to find the Bayesian MAP estimate, as suggested by Caron & Doucet (2012), where a Gamma(a,b) gamma prior is placed on each π_i, and the product of these is taken as a prior on π. The MAP estimate can then be found with an EM-algorithm. When a = 1 and b = 0, the EM and MM-algorithms are equivalent and the MAP estimate and MLE are identical. The rate parameter of the Gamma prior, b, is not likelihood identifiable. When a > 1, b is set to aK - 1, where K is the number of items in the component; this choice of b minimises the number of iterations needed for the algorithm to converge.
The likelihood equations give
a - 1 + W_i = bπ_i + ∑_{j \neq i} \frac{n_{ij}π_i}{π_i + π_j},
for i = 1, …, K. For the algorithm to have converged, we want π to be such that the LHS and RHS of this equation are close for all i. Therefore, we set the convergence criteria as
≤ft|\frac{a - 1 + W_i}{bπ_i + ∑_{j \neq i} \frac{n_{ij}π_i}{π_i + π_j}} - 1\right| < ε,
for all i.
Since the equations do not typeset well within the R help window, we recommend reading this section online: https://ellakaye.github.io/BradleyTerryScalable/reference/btfit.html.
btfit
returns an S3 object of class "btfit". It is a list containing the following components:
call |
The matched call |
pi |
A list of length M, where M is the number of fully-connected components of the comparison graph G_W (or the requested subset) of two or more items. The m-th list item is a named vector π, the strength parameter, for the items in the m-th fully connected component, m = 1, …, M. These are sorted in descending order. |
iters |
A vector of length M. The m-th entry is the number of iterations it took for the algorithm to converge for the m-th component, for m = 1, …, M. Note that if the algorithm has not converged in any component, a warning will be produced. |
converged |
A logical vector of length M, indicating whether the algorithm has converged for the m-th component in |
N |
A list of length M. The m-th list item is a matrix where each element n_{ij} is the number of times item i played against item j, for the items in the m-th component. The rows and columns are arranged in the same order as the ordered pi vector(s). |
diagonal |
A list of length M. The m-th item is a vector of the diagonal elements of the |
names_dimnames |
The names of the dimnames of the original |
Ella Kaye, David Firth
Caron, F. and Doucet, A. (2012) Efficient Bayesian Inference for Generalized Bradley-Terry Models. Journal of Computational and Graphical Statistics, 21(1), 174-196.
Hunter, D. R. (2004) MM Algorithms for Generalized Bradley-Terry Models. The Annals of Statistics, 32(1), 384-406.
btdata
, summary.btfit
, coef.btfit
, fitted.btfit
, btprob
, vcov.btfit
, simulate.btfit
citations_btdata <- btdata(BradleyTerryScalable::citations) fit1 <- btfit(citations_btdata, 1) summary(fit1) toy_df_4col <- codes_to_counts(BradleyTerryScalable::toy_data, c("W1", "W2", "D")) toy_btdata <- btdata(toy_df_4col) fit2a <- btfit(toy_btdata, 1) summary(fit2a) fit2b <- btfit(toy_btdata, 1.1) summary(fit2b) fit2c <- btfit(toy_btdata, 1, subset = function(x) length(x) > 3) summary(fit2c)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.