knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )

library(bpcs) library(ggplot2) library(dplyr) library(tibble) library(kableExtra) library(bayesplot) library(knitr)

In this vignette, we provide an example of the usage of the `bpcs`

package along with the core concepts to use the package.

The package requires installation of the `rstan`

package [@rstan]. For more details see the `REAMDE.md`

document.

To install the latest version from Github:

remotes::install_github('davidissamattos/bpcs')

After install we load the package with:

```
library(bpcs)
```

The `bpcs`

package performs Bayesian estimation of Paired Comparison models utilizing Stan.
We provide a series of models and auxiliary functions to help in the analysis and evaluation of the models. However, this package have the philosophy of 'batteries not included' for plots, tables and data transformation. There are already many great packages capable of performing create high quality plots, tables and that provides tools for data transformation. Since each user can have their own preferences, customization needs and data cleaning and transformation workflows, we designed not to enforce any particular framework or package. Our functions were designed to return cleaned data frames that can be used almost directly, or with few transformations in those packages.

At the moment, the only exception to this is the `expand_aggregated_data`

function that receives a data frame with the number of wins for player 1 and the numbers of wins for player 2 and expand this aggregated data into a single match per row (that is required for our models). We include this function because this type of transformation is common since packages such as `BradleyTerry2`

[@turner2012bradley] receives this type of aggregated data and many available datasets are presented like that.

With that said, we provide in the vignettes the code we use to transform the data and generate the tables and plots. The user is free to use/copy/modify these codes for their own use. For those we rely on the collection of packages `tidyverse`

[@tidyverse2019], and the packages `knitr`

[@knitr2014] and `kableExtra`

[@kableExtra2020].

In this example, we will use the example from tennis players from Agresti [@agresti2003categorical]. The data `tennis_agresti`

contains the information regarding tennis matches between 5 players, and who won the match, 0 for player0 or 1 for player1.

knitr::kable(tennis_agresti) %>% kableExtra::kable_styling()

We can fit a Bayesian Bradley-Terry model using the `bpc`

function

m1 <- bpc(data = tennis_agresti, player0 = 'player0', player1 = 'player1', result_column = 'y', model_type = 'bt', solve_ties = 'none', #there are no ties show_chain_messages = T)

After the chain converges to find the result we can investigate if everything went right.
For that we can use the excellent tool provided in the `shinystan`

[@shinystan2018] package that helps to assess the convergence of the chains.

The `bpcs`

package provides a tiny wrapper to launch it automatically with some default parameters.

```
launch_shinystan(m1)
```

Alternatively, you can retrieve the stanfit object and launch it with your own parameters.

stanfit <- get_stanfit(m1) shinystan::launch_shinystan(stanfit)

If you prefer to investigate without `shinystan`

we can retrieve the stanfit object and investigate ourselves or with the help of the `bayesplot`

package [@bayesplot2019]. Here we need the stanfit and the stan posterior matrix to proceed. The indexes in Stan refer to the names and indexes available at the lookup table.

knitr::kable(m1$lookup_table)

stanfit <- get_stanfit(m1) posterior<-rstan::extract(stanfit,inc_warmup=T,permuted=F)

Getting the traceplots:

bayesplot::mcmc_trace(posterior,pars = c("lambda[1]","lambda[2]","lambda[3]","lambda[4]"), n_warmup=1000)

Verifying the Rhat and neff using the functions from `rstan`

rstan::summary(stanfit ,pars=c('lambda'))$summary

We first get the observed values and then the predictive values of the original dataframe. We can get predictive values with the predictive function and passing a data frame with the values we want to predict (in this case the original one). Note that we need to have the same column names in this new data frame

y<-as.vector(tennis_agresti$y) yrep<-predict(m1,tennis_agresti,n=100,return_matrix = T) yrep<-yrep[,1:46] #from column 47 we have if it was a tie or not. We just need to remove this

bayesplot::ppc_bars(y=y, yrep=yrep) + labs(title = 'Bar plot with medians and uncertainty\n intervals superimposed')

The plots indicate a good model as the predictive posterior and the observed values agree largely.

Now that we are confident that our model is correct, we can create some tables to report our results.

To see the results in the console the `summary`

function provides a good overview of the model. With parameters, probability of winning and a ranking.

```
summary(m1)
```

If we want to create nicer tables and export them to latex/html we can leverage this with the `kable`

function and the `kableExtra`

package. Note that for extensive customization (and examples) we refer to the packages documentation.

Parameter table with HPD intervals

knitr::kable(get_hpdi_parameters(m1), caption = 'Parameter distribution and the High Posterior Density intervals', digits = 2) %>% kable_styling()

Plot the HPD intervals of the strength

hpdi <- get_hpdi_parameters(m1) %>% dplyr::filter(startsWith(Parameter, "lambda")) ggplot2::ggplot(hpdi, aes(x = Parameter)) + ggplot2::geom_pointrange(aes(ymin = HPD_lower, ymax = HPD_higher, y = Mean)) + ggplot2::labs(y = "Estimate", x = "Player", title = "HPDI interval of the strength of the players") + ggplot2::coord_flip()

prob_table<-get_probabilities(m1)$Table knitr::kable(prob_table, caption = 'Probabilities of one player beating the other', digits = 2) %>% kableExtra::kable_styling()

We might also be interested in ranking the players based on their ability $lambda$. In the Bayesian case, we sample the posterior distribution of $lambda$ and rank them so we have posterior distribution of the ranks. This can be achieve with the function `get_rank_of_players`

.

ranking <- get_rank_of_players(m1)

We can produce a table with the values of this dataframe.

t <- ranking %>% dplyr::select(Parameter, MedianRank, StdRank) knitr::kable(t, caption = 'Rank of the players') %>% kable_styling()

If we want to visualize the histogram of the rank distribution of each player.

ggplot2::ggplot()+ ggplot2::geom_histogram(aes(x=ranking$PosteriorRank[1]$rank),bins = 5)+ ggplot2::labs(title = 'Posterior distribution of the rank for Graf', x='Rank')

To predict new results we need a data frame similar to the one used to fit the data. We use the same function as in the predicted posterior but now we provide the data we want to predict instead of the original data. Lets predict who is the winner for all games from Seles. Now we don't want to return the matrix but a data frame

tennis_new_games<- tibble::tribble(~player0, ~player1, 'Seles', 'Graf', 'Seles', 'Sabatini', 'Seles', 'Navratilova', 'Seles', 'Sanchez') y_seles<-predict(m1,tennis_new_games,n=100) #Now let's summarize the posterior y_seles <- dplyr::mutate(y_seles, avg_win_player1 = rowMeans(select(y_seles, starts_with('y_pred')))) y_seles %>% dplyr::select(player0, player1,avg_win_player1) %>% knitr::kable()

If the average number of wins of player 1 is higher than 0.5 than player 1 wins more times than player 0.

Note that this is consistent with the obtained ranking and the probabilities of beating.

**Any scripts or data that you put into this service are public.**

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.