The bpcs
package performs Bayesian estimation of Paired Comparison
models utilizing Stan, such as variations of the Bradley-Terry (Bradley
and Terry 1952) and the Davidson models (Davidson 1970).
Package documentation and vignette articles can be found at: https://davidissamattos.github.io/bpcs/
For the bpcs
package to work, we rely upon the Stan software and the
rstan
package (Stan Development Team 2020).
To install the development version of the bpcs package, install directly from the Github repository.
remotes::install_github('davidissamattos/bpcs')
After installing, we load the package with:
library(bpcs)
The main function of the package is the bpc
function. For the simple
Bradley-Terry model, this function requires a specific type of data
frame that contains:
We will utilize the tennis dataset available (Agresti 2003). The dataset
can be seen below and is available as data(tennis_agresti)
:
dplyr::sample_n(tennis_agresti,10) %>%
knitr::kable()
player0
player1
y
id
Seles
Sanchez
0
14
Sabatini
Sanchez
1
42
Seles
Graf
0
1
Graf
Sabatini
1
22
Sabatini
Sanchez
1
41
Seles
Navratilova
1
12
Graf
Sanchez
0
32
Sabatini
Navratilova
0
35
Sabatini
Sanchez
0
39
Seles
Graf
1
4
Based on the scores of each contestant, the bpc
function computes
automatically who won the contest. Alternatively, you can provide a
vector of who won if that is already available (for more information see
?bpc
.
For the simple Bradley Terry Model we specify the model type as 'bt'
.
Here we hide the MCMC sampler chain messages for simplicity in the
output.
m<-bpc(data = tennis_agresti, #datafrane
player0 = 'player0', #name of the column for player 0
player1 = 'player1', #name of the column for player 1
result_column = 'y', #name of the column for the result of the match
model_type = 'bt', #bt = Simple Bradley Terry model
solve_ties = 'none' #there are no ties in the dataset so we can choose none here
)
#>
#> SAMPLING FOR MODEL 'bt' NOW (CHAIN 1).
#> Chain 1:
#> Chain 1: Gradient evaluation took 8.7e-05 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.87 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1:
#> Chain 1:
#> Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
#> Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
#> Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
#> Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
#> Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
#> Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
#> Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
#> Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
#> Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
#> Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
#> Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
#> Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
#> Chain 1:
#> Chain 1: Elapsed Time: 0.299618 seconds (Warm-up)
#> Chain 1: 0.312838 seconds (Sampling)
#> Chain 1: 0.612456 seconds (Total)
#> Chain 1:
#>
#> SAMPLING FOR MODEL 'bt' NOW (CHAIN 2).
#> Chain 2:
#> Chain 2: Gradient evaluation took 3.9e-05 seconds
#> Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.39 seconds.
#> Chain 2: Adjust your expectations accordingly!
#> Chain 2:
#> Chain 2:
#> Chain 2: Iteration: 1 / 2000 [ 0%] (Warmup)
#> Chain 2: Iteration: 200 / 2000 [ 10%] (Warmup)
#> Chain 2: Iteration: 400 / 2000 [ 20%] (Warmup)
#> Chain 2: Iteration: 600 / 2000 [ 30%] (Warmup)
#> Chain 2: Iteration: 800 / 2000 [ 40%] (Warmup)
#> Chain 2: Iteration: 1000 / 2000 [ 50%] (Warmup)
#> Chain 2: Iteration: 1001 / 2000 [ 50%] (Sampling)
#> Chain 2: Iteration: 1200 / 2000 [ 60%] (Sampling)
#> Chain 2: Iteration: 1400 / 2000 [ 70%] (Sampling)
#> Chain 2: Iteration: 1600 / 2000 [ 80%] (Sampling)
#> Chain 2: Iteration: 1800 / 2000 [ 90%] (Sampling)
#> Chain 2: Iteration: 2000 / 2000 [100%] (Sampling)
#> Chain 2:
#> Chain 2: Elapsed Time: 0.284584 seconds (Warm-up)
#> Chain 2: 0.344625 seconds (Sampling)
#> Chain 2: 0.629209 seconds (Total)
#> Chain 2:
#>
#> SAMPLING FOR MODEL 'bt' NOW (CHAIN 3).
#> Chain 3:
#> Chain 3: Gradient evaluation took 4.1e-05 seconds
#> Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.41 seconds.
#> Chain 3: Adjust your expectations accordingly!
#> Chain 3:
#> Chain 3:
#> Chain 3: Iteration: 1 / 2000 [ 0%] (Warmup)
#> Chain 3: Iteration: 200 / 2000 [ 10%] (Warmup)
#> Chain 3: Iteration: 400 / 2000 [ 20%] (Warmup)
#> Chain 3: Iteration: 600 / 2000 [ 30%] (Warmup)
#> Chain 3: Iteration: 800 / 2000 [ 40%] (Warmup)
#> Chain 3: Iteration: 1000 / 2000 [ 50%] (Warmup)
#> Chain 3: Iteration: 1001 / 2000 [ 50%] (Sampling)
#> Chain 3: Iteration: 1200 / 2000 [ 60%] (Sampling)
#> Chain 3: Iteration: 1400 / 2000 [ 70%] (Sampling)
#> Chain 3: Iteration: 1600 / 2000 [ 80%] (Sampling)
#> Chain 3: Iteration: 1800 / 2000 [ 90%] (Sampling)
#> Chain 3: Iteration: 2000 / 2000 [100%] (Sampling)
#> Chain 3:
#> Chain 3: Elapsed Time: 0.305793 seconds (Warm-up)
#> Chain 3: 0.292243 seconds (Sampling)
#> Chain 3: 0.598036 seconds (Total)
#> Chain 3:
#>
#> SAMPLING FOR MODEL 'bt' NOW (CHAIN 4).
#> Chain 4:
#> Chain 4: Gradient evaluation took 4.2e-05 seconds
#> Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.42 seconds.
#> Chain 4: Adjust your expectations accordingly!
#> Chain 4:
#> Chain 4:
#> Chain 4: Iteration: 1 / 2000 [ 0%] (Warmup)
#> Chain 4: Iteration: 200 / 2000 [ 10%] (Warmup)
#> Chain 4: Iteration: 400 / 2000 [ 20%] (Warmup)
#> Chain 4: Iteration: 600 / 2000 [ 30%] (Warmup)
#> Chain 4: Iteration: 800 / 2000 [ 40%] (Warmup)
#> Chain 4: Iteration: 1000 / 2000 [ 50%] (Warmup)
#> Chain 4: Iteration: 1001 / 2000 [ 50%] (Sampling)
#> Chain 4: Iteration: 1200 / 2000 [ 60%] (Sampling)
#> Chain 4: Iteration: 1400 / 2000 [ 70%] (Sampling)
#> Chain 4: Iteration: 1600 / 2000 [ 80%] (Sampling)
#> Chain 4: Iteration: 1800 / 2000 [ 90%] (Sampling)
#> Chain 4: Iteration: 2000 / 2000 [100%] (Sampling)
#> Chain 4:
#> Chain 4: Elapsed Time: 0.30643 seconds (Warm-up)
#> Chain 4: 0.300302 seconds (Sampling)
#> Chain 4: 0.606732 seconds (Total)
#> Chain 4:
If rstan
is available and correctly working this function should
sample the posterior distribution and create a bpc
object.
To see a summary of the results we can run the summary function. Here we get three tables:
summary(m)
#> Estimated baseline parameters with HPD intervals:
#>
#>
#> Parameter Mean HPD_lower HPD_higher n_eff Rhat
#> -------------------- ------ ---------- ----------- ------- -----
#> lambda[Seles] 0.51 -2.37 3.34 663.50 1.01
#> lambda[Graf] 0.94 -1.85 3.71 622.34 1.01
#> lambda[Sabatini] -0.33 -3.12 2.50 662.40 1.01
#> lambda[Navratilova] 0.04 -2.87 2.91 637.48 1.01
#> lambda[Sanchez] -1.12 -4.04 1.64 670.86 1.01
#> NOTES:
#> * A higher lambda indicates a higher team ability
#>
#>
#> Posterior probabilities:
#> These probabilities are calculated from the predictive posterior distribution
#> for all player combinations
#>
#>
#> i j i_beats_j
#> ------------ ------------ ----------
#> Graf Navratilova 0.70
#> Graf Sabatini 0.77
#> Graf Sanchez 0.86
#> Graf Seles 0.60
#> Navratilova Sabatini 0.58
#> Navratilova Sanchez 0.74
#> Navratilova Seles 0.41
#> Sabatini Sanchez 0.66
#> Sabatini Seles 0.32
#> Sanchez Seles 0.20
#>
#>
#> Rank of the players' abilities:
#> The rank is based on the posterior rank distribution of the lambda parameter
#>
#>
#> Parameter MedianRank MeanRank StdRank
#> -------------------- ----------- --------- --------
#> lambda[Graf] 1 1.42 0.64
#> lambda[Seles] 2 2.08 0.88
#> lambda[Navratilova] 3 3.04 0.91
#> lambda[Sabatini] 4 3.65 0.83
#> lambda[Sanchez] 5 4.81 0.48
get_hpdi_parameters
functionget_rank_of_players
function.get_probabilities
function.expand_aggregated_data.
get_sample_posterior
function.predict
function.bt
) (Bradley and Terry 1952)davidson
) for handling ties (Davidson 1970)Options to add to the models:
-ordereffect
). E.g. for home advantage (Davidson and
Beaver 1977)-generalized
). When we have contestant
specific predictors (Springall 1973)-U
). For example, to compensate
clustering or repeated measures (Böckenholt 2001)E.g.:
bt
davidson-U
bt-generalized-ordereffect
Notes:
bt-U-ordereffect
is
equivalent to bt-ordereffect-U
-
is mandatoryexpand_aggregated_data
.This package provides a series of small and self contained vignettes that exemplify the use of each model. In the vignettes, we also provide examples of code for data transformation, tables and plots.
Below we list all our vignettes with a short description:
Getting Started: This vignette shows a basic example on tennis competition data, covering how to run a Bradley-Terry model, MCMC diagnostics, posterior predictive values, ranking, predict new matches
Ties and home advantage: This vignette covers a soccer example from the Brazilian soccer league. Here, we first model the results using a Bradley-Terry model and the Davidson model to handle ties. Then, we extend both models to include for order effects, this allows us to investigate the home advantage in and without the presence of ties.
Bradley-Terry with random effects: This vignette covers the problem of ranking black-box optimization algorithms based on benchmarks. Since in benchmarking we often run the same optimization algorithm more than once with the same benchmark problem, we need to compensate for the repeated measures effect. We deal with this utilizing a simple Bradley-Terry model with random effects.
If you are interested you are welcome to contribute to the repository through pull requests.
We have a short contributing guide vignette.
If you find bugs, please report it in https://github.com/davidissamattos/bpcs/issues
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.