In dclaz/mELO: Multidimensional Elo Ratings

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

mELO

An R implementation of DeepMind's multidimensional Elo rating (mELO) approach for evaluating agents. The mELO rating system has the desirable properties of being able to handle cyclic, non-transitive interactions and is better behaved in the presence of redundant copies of agents or tasks.

Balduzzi, et al. (2018) proposed that a rating/evaluation method should have the following properties:

P1. Invariant: adding redundant copies of an agent or task to the data should make no difference.
P2. Continuous: the evaluation method should be robust to small changes in the data.
P3. Interpretable: hard to formalize, but the procedure should agree with intuition in basic cases.

Typical methods for performing pairwise comparisons such as Elo or Glicko violate P1 and can result in poor evaluations of agents and poor predictions of outcomes.

This package uses code directly from Alec Stephenson and Jeff Sonas's excellent PlayerRatings package (v1.0-3, 2019-02-22).

Installation

You can install the mELO package from github using:

devtools::install_github("dclaz/mELO")

and it can be loaded using:

library(mELO)

Vignettes

The following vignettes describe the functionality and utility of the package in more detail:

Introduction. An introduction to the mELO package with examples for evaluating agents with cyclic and other more complex non-transitive interaction properties.
Application to AFL matches. An investigation of whether there is any evidence of non-transitive relationships in AFL match outcomes.
Impact of noisy or probabilistic outcomes in mELO models. An investigation into how noise or probabilistic outcomes impacts the estimation abilities and/or the prediction accuracy of a mELO model.

Example

The example below demonstrates how a ratings model can be fit using the ELO() and mELO() functions.

We will fit models to predict the outcome of rock-paper-scissor matches. It contains r nrow(rps_df) matches.

# Inspect rock paper scissors data
head(rps_df)

# Fit ELO model
rps_ELO <- ELO(rps_df)

# Inspect modelled ratings
rps_ELO

# Get predictions
ELO_preds <- predict(
    rps_ELO,
    head(rps_df)
)

# Inspect outcomes and predictions
cbind(
    head(rps_df),
    ELO_preds = round(ELO_preds, 3)
)

We note that while the estimated ratings are roughly equal, the predicted probabilities are very poor. Elo cannot handle the cyclic, non-transitive nature of this system.

Let's fit a multidimensional Elo model using mELO():

# Fit mELO model
rps_mELO <- mELO(rps_df, k=1)

# Inspect modelled ratings
rps_mELO

# Get predictions
mELO_preds <- predict(
    rps_mELO,
    head(rps_df)
)

# Inspect outcomes and predictions
cbind(
    head(rps_df),
    mELO_preds = round(mELO_preds, 3)
)

A convenient helper function has been provided to generate predictions for all interactions:

model_pred_mat(
  rps_mELO,
  rps_df[[2]]
) %>% 
  round(3) %>%
  knitr::kable()

The mELO model with k=1 can handle the cyclic, non-transitive nature of this system which results in much more accurate predictions. The k parameter determines the complexity of the non-transitive interactions that can be modelled.

dclaz/mELO documentation built on May 17, 2021, 2:27 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com