Name sport
is an abbreviation for Sequential Pairwise Online Rating Techniques.
Package contains functions calculating ratings for two-player or multi-player
matchups. Methods included in package are able to estimate ratings
(players strengths) and their evolution in time, also able to predict output of
challenge. Algorithms are based on Bayesian Approximation Method, and they don't
involve any matrix inversions nor likelihood estimation. sport
incorporates
glicko algorithm, glicko2, bayesian Bradley-Terry and dynamic logistic
regression. Parameters are updated sequentially, and computation doesn't require
any additional RAM to make estimation feasible. Additionally, package is written
in c++
what makes computations even faster.
Before start, it's recommended to read theoretical foundations of algorithms in
other sport
vignette "The theory of the online update algorithms".
Package can be installed from CRAN or from github.
install.packages("sport") devtools::install_github("gogonzo/sport")
Package contains actual data from Speedway Grand-Prix. There are two data.frames:
gpheats
- results SGP heats. Column rank
is a numeric version of column
position
- rider position in race.
gpsquads
- summarized results of the events, with sum of point and final
position.
library(sport) data <- gpheats[1:1002, ] str(data)
Data used in sport
package must be in so called long format. Typically
data.frame
contains at least id
, name
of the player and rank
,
with one row for one player within specific match. Package allows for any number
of players within event and allows ties also.
In all methods, output variable needs to be expressed as a rank/position in event.
Don't mix up rank output with typical 1-win, 0-lost. In sport
package output
for two player game should be coded as 1=winner 2=looser. Below example of two
matches with 4 players each.
data[1:8, c("id","rider","rank")]
To compute ratings using each algorithms one has to specify formula.
- RHS of the formula have to be specified with player(player)
term or
player(player | team)
when players competes in team match. player(...)
is a
term function which helps identify column with player
names and/or team
names.
- LHS of the formula should contain rank
term which points to column where
results (ranks) are stored and id
(optional). RHS should rather be specified
by rank | id
to split matches - if id
is missing all data will be computed
under same event id.
glicko <- glicko_run(formula = rank | id ~ player(rider), data = data) glicko2 <- glicko2_run(formula = rank | id ~ player(rider), data = data) bbt <- bbt_run(formula = rank | id ~ player(rider), data = data) dbl <- dbl_run(formula = rank | id ~ player(rider), data = data) print(glicko)
Objects returned by <method>_run
are of class rating
and have their own
print
and summary
which provides simple overview. print.sport
shows
condensed informations about model performance like accuracy and consistency of
model predictions with observed probabilities. More precise overview are
given by summary
by showing ratings, ratings deviations and comparing model
win probabilities with observed.
summary(dbl)
To visualize top n ratings with their 95% confidence interval one can use
dedicated plot.rating
function. For dbl
method top coefficients are
presented which doesn't have to be player specific (ratings). It's also possible
to examine ratings evolution in time, by specifying players
argument.
plot(glicko, n = 15) plot(glicko, players = c("Greg HANCOCK","Tomasz GOLLOB","Tony RICKARDSSON"))
Except dedicated print
,summary
and plot
there is possibility to extract
more detailed information for analyses. rating
object contains following
elements:
names(glicko)
rating$final_r
and rating$final_rd
contains the last estimate of the
ratings and ratings deviations. For glicko2
there is also rating$final_sigma
.
r
contains data.table
with prior ratings estimations from first event
to the last. Number of rows in r
equals number of rows in input data.
pairs
pairwise combinations of players in analyzed events with prior
probability and result of a challenge.
tail(glicko$r) tail(glicko$pairs)
Examples presented in package overview might be sufficient in most cases, but sometimes it is necessary to adjust algorithms to fit data better. One characteristic of the online update algorithms is that variance of the parameters drops quickly to zero. Especially, when the number of events for the player is big ($\small n_i>100 $), after hundreds iterations rating parameters are very difficult to change, and output probabilities use to be extreme. To avoid these mistakes some additional controls should be applied, which is explained in this section with easy to learn examples.
In all methods formula must contain rank | id ~ player(player)
elements, to
correctly specify the model.
rank
denotes column with output (order).
id
denotes event id, within which update is computed.
player(...)
function helps to identify column in which names of the players
are stored. player(...)
can be specified in two ways:
player(player)
if results of the event are observed per player.
r
glicko2 <- glicko2_run(
data = data.frame(
id = c(1, 1, 1, 1),
player = c("a", "b", "c", "d"),
rank_player = c(3, 4, 1, 2)
),
formula = rank_player | id ~ player(player)
)
player(player | team)
when players competes within teams, and results are
observed per team. This option is not available in dbl_run
which requires only
formula for player matchups.
r
glicko2 <- glicko2_run(
data = data.frame(
id = c(1, 1, 1, 1),
team = c("A", "A", "B", "B"),
player = c("a", "b", "c", "d"),
rank_team = c(1, 1, 2, 2)
),
formula = rank_team | id ~ player(player | team)
)
other variables - available only in dbl_run
, which allows to specify other
factors in model.
```r dbl <- dbl_run( data = data.frame( id = c(1, 1, 1, 1), name = c("A", "B", "C", "D"), rank = c(3, 4, 1, 2), gate = c(1, 2, 3, 4), factor1 = c("a", "a", "b", "b"), factor2 = c("a", "b", "a", "b") ), formula = rank | id ~ player(name) + gate * factor1)
```
r
and rd
Main functionality which is common between all algorithms is to specify prior r
and rd
. Both parameters can be set by creating named vectors. Let's suppose we
have 4 players c("A","B","C","D")
competing in an event, and we have players
prior r
and rd
estimates. It's important to have r
and rd
names
corresponding with levels of name
variable. One can run algorithm, to obtain
new estimates.
model <- glicko_run(data = gpheats[1:16, ], rank | id ~ player(rider))
We can also run models re-using previously estimated parameters from model$final_r
and model$final_rd
in the future when new data appear.
glicko_run( formula = rank | id ~ player(rider), data = gpheats[17:20, ], r = model$final_r, rd = model$final_rd )$final_r
weight
All algorithms have a weight argument which increases or decreases update size.
Higher weight increasing impact of corresponding event. Effect of the weight on
update size can be expressed directly by following formula -
$\small R_i^{'} \leftarrow R_i \pm \omega_i * \Omega_i$.
To specify weight $\omega_i$ one needs to create additional column in input
data, and pass the name of the column to weight
argument. For example weight
could depend on importance of competition. In speedway Grand-Prix last three
heats determine event winner, thus they weight more.
library(dplyr) data <- mutate(data, weight = ifelse(heat >= (max(heat) - 3), 2, 1)) glicko <- glicko_run(formula = rank | id ~ player(rider), data = data, weight = "weight")
kappa
In situation when player plays games very frequently, rd
can quickly decrease
to zero, making further changes limited. Setting kappa
(single value) avoids
rating deviation decrease to be lower than specified fraction of rd
. In other
words final rd
can't be lower than initial RD
times kappa
$$\small RD' \geq RD * kappa$$
bbt1 <- bbt_run(formula = rank | id ~ player(rider), data = data, kappa = 0.99) # RD decreases at most 1% bbt2 <- bbt_run(formula = rank | id ~ player(rider), data = data, kappa = 0.8) # RD decreases at most 20% all(bbt1$final_rd >= bbt2$final_rd)
lambda
In some cases player ratings tend to be more uncertain. If scientist have prior
knowledge about higher risk of event or uncertainty of specific player
performance, then one might create another column with relevant values and pass
the column name to lambda
argument.
# bbt example data <- data %>% group_by(rider) %>% mutate(idle_30d = if_else(as.integer(date - lag(date)) > 30, 1.0, 2.0)) %>% filter(!is.na(idle_30d)) bbt <- bbt_run(rank | id ~ player(rider), data = data, lambda = "idle_30d")
In above examples players competes as individuals, and each is ranked at the
finish line. There are sports where players, competes in teams, and results
are reported per team. sport
is able to compute player ratings, and requires
only changing formula from player(player)
to player(player | team)
. data.frame
should always be a long format, with one player for each row. Ratings are updated
according to their contribution in team efforts. share
argument can be added
optionally if scientist have some knowledge about players contribution in match
(eg. minutes spent on the field from all possible minutes).
glicko2 <- glicko2_run( data = data.frame( id = c(1, 1, 1, 1), team = c("A", "A", "B", "B"), player = c("a", "b", "c", "d"), rank_team = c(1, 1, 2, 2), share = c(0.4, 0.6, 0.5, 0.5) ), formula = rank_team | id ~ player(player | team), share = "share" ) glicko2$final_r
Output object contains the same elements as normal, with one difference -
pairs
contains probability and output per team, and r
contains prior ratings
per individuals.
glicko2$pairs glicko2$r
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.