The purpose of this package is primarily to generate historical data from Premier League football matches suitable for fitting goal prediction models. It is my first package, and the name ‘randy’ was the working title of the project it originated from.
I was inspired to build it to help with a long-running score prediction game I have played with friends.
An example of our completed points league from the 2020/21 season can be found here.
Previously I predicted scores largely on gut instinct, which returned modest results over the years: I would typically get 45-50% of results correct and about 40 of 380 exact scores correct per season. Thinking that a more methodical approach might do better, I started researching the methods countless others have developed for the same aim. I will not document all of that here, but see below a couple of excellent packages that really helped my learning in this area:
Whether I used the approaches documented above or built my own Poisson models, the first step was to pull together historical Premier League data sets to fit the models to.
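As an illustration of where such data ends up, one common way to turn expected goals into a score prediction is to treat home and away goals as independent Poisson variables. The sketch below is a minimal example of that idea, not the method used by randy or by any particular package: the expected-goals values (1.6 and 1.1) are made up, and `scoreline_probs()` is a hypothetical helper.

```r
# Minimal sketch of an independent-Poisson score prediction for one fixture.
# lambda_home and lambda_away are illustrative values, not model output.
scoreline_probs <- function(lambda_home, lambda_away, max_goals = 6) {
  goals <- 0:max_goals
  # Rows index home goals, columns index away goals; each cell is
  # P(home = i) * P(away = j) under the independence assumption.
  probs <- outer(dpois(goals, lambda_home), dpois(goals, lambda_away))
  dimnames(probs) <- list(home = goals, away = goals)
  probs
}

probs <- scoreline_probs(lambda_home = 1.6, lambda_away = 1.1)

# Most likely exact score (what a score prediction game rewards)
best <- which(probs == max(probs), arr.ind = TRUE)[1, ]
most_likely <- paste0(rownames(probs)[best["row"]], "-", colnames(probs)[best["col"]])

# Result probabilities; scores above max_goals are ignored, so these
# sum to slightly less than 1
p_home_win <- sum(probs[lower.tri(probs)])  # home goals > away goals
p_draw     <- sum(diag(probs))
p_away_win <- sum(probs[upper.tri(probs)])  # away goals > home goals

most_likely  # "1-1"
```

In practice the two rates would come from a model fitted to data like that produced below, rather than being typed in by hand.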
You can install the development version of randy from GitHub with:
```r
# install.packages("devtools")
devtools::install_github("dafyddhowells/randy")
library(randy)
```
`get_prem_history()` pulls in the results of Premier League fixtures up to and including the current season. The output below begins in August 1993 (the 1993/94 season); `fthg` and `ftag` are full-time home and away goals:
```r
head(get_prem_history())
```

| date | home_team | away_team | fthg | ftag | fixture |
|------------|-------------|----------------|------|------|----------------------------|
| 1993-08-14 | Arsenal | Coventry | 0 | 3 | Arsenal v Coventry |
| 1993-08-14 | Aston Villa | QPR | 4 | 1 | Aston Villa v QPR |
| 1993-08-14 | Chelsea | Blackburn | 1 | 2 | Chelsea v Blackburn |
| 1993-08-14 | Liverpool | Sheffield Weds | 2 | 0 | Liverpool v Sheffield Weds |
| 1993-08-14 | Man City | Leeds | 1 | 1 | Man City v Leeds |
| 1993-08-14 | Newcastle | Spurs | 0 | 1 | Newcastle v Spurs |
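The table above has one row per fixture, whereas modelling each team's goals needs one row per team per fixture. A minimal base-R sketch of that reshaping is below; `to_team_rows()`, `goals_for` and `goals_against` are hypothetical names used for illustration and are not part of randy.

```r
# Hypothetical helper (not part of randy): turn one-row-per-fixture results,
# with the columns shown above, into one row per team per fixture.
to_team_rows <- function(results) {
  side <- function(is_home) {
    data.frame(
      team          = if (is_home) results$home_team else results$away_team,
      fixture       = results$fixture,
      date          = results$date,
      home          = is_home,
      goals_for     = if (is_home) results$fthg else results$ftag,
      goals_against = if (is_home) results$ftag else results$fthg,
      stringsAsFactors = FALSE
    )
  }
  out <- rbind(side(TRUE), side(FALSE))
  # Result indicators from each team's own point of view
  out$win  <- as.integer(out$goals_for > out$goals_against)
  out$draw <- as.integer(out$goals_for == out$goals_against)
  out <- out[order(out$team, out$date), ]
  rownames(out) <- NULL
  out
}

# First three fixtures from the output above
results <- data.frame(
  date      = as.Date(rep("1993-08-14", 3)),
  home_team = c("Arsenal", "Aston Villa", "Chelsea"),
  away_team = c("Coventry", "QPR", "Blackburn"),
  fthg      = c(0, 4, 1),
  ftag      = c(3, 1, 2),
  stringsAsFactors = FALSE
)
results$fixture <- paste(results$home_team, "v", results$away_team)

team_rows <- to_team_rows(results)
team_rows
```

Each fixture now contributes two rows, so Coventry's 3-0 win at Arsenal appears once as a home defeat and once as an away win.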
To create a data set for fitting models, each team in a fixture needs to appear as its own observation, whether it is home or away, so that we can predict the number of goals that team scores in the fixture. We also need variables that show a team’s attacking and defending strength at home or away. `get_prem_form()` adds running averages of `form` (points per game), goals `scored` and goals `conceded`, along with `win` and `draw` indicators:
```r
prem_history <- get_prem_history()
prem_form_df <- get_prem_form(prem_history)
head(prem_form_df)
```

| team | fixture | date | fthg | ftag | form | scored | conceded | home | win | draw |
|---------|-----------------------|------------|------|------|------|--------|----------|------|-----|------|
| Arsenal | Arsenal v Coventry | 1993-08-14 | 0 | 3 | 0 | 0 | 0 | TRUE | 0 | 0 |
| Arsenal | Arsenal v Leeds | 1993-08-24 | 2 | 1 | 0 | 0 | 0 | TRUE | 1 | 0 |
| Arsenal | Arsenal v Everton | 1993-08-28 | 2 | 0 | 0 | 0 | 0 | TRUE | 1 | 0 |
| Arsenal | Arsenal v Ipswich | 1993-09-11 | 4 | 0 | 2.25 | 2 | 1 | TRUE | 1 | 0 |
| Arsenal | Arsenal v Southampton | 1993-09-25 | 1 | 0 | 3 | 2.25 | 0.25 | TRUE | 1 | 0 |
| Arsenal | Arsenal v Man City | 1993-10-16 | 0 | 0 | 2.5 | 1.75 | 0 | TRUE | 0 | 1 |
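The values in the table above can be reproduced with a 4-match running mean that includes the current fixture, computed over a team's home (or away) matches and left at 0 until four matches are available: `form` matches average points per game (3 for a win, 1 for a draw) and `scored`/`conceded` match average goals. The sketch below checks this for Arsenal's home rows. It is inferred from the example output, not taken from randy's source, and `running_mean()` is a hypothetical helper.

```r
# Running mean over the current match and the previous (window - 1) matches,
# set to 0 until a full window is available. Window size 4 is an assumption
# inferred from the example output above, not a documented setting.
running_mean <- function(x, window = 4) {
  out <- numeric(length(x))
  for (i in seq_along(x)) {
    if (i >= window) out[i] <- mean(x[(i - window + 1):i])
  }
  out
}

# Arsenal's first six home fixtures of 1993/94, from the table above
arsenal_home <- data.frame(
  fixture = c("Arsenal v Coventry", "Arsenal v Leeds", "Arsenal v Everton",
              "Arsenal v Ipswich", "Arsenal v Southampton", "Arsenal v Man City"),
  fthg = c(0, 2, 2, 4, 1, 0),
  ftag = c(3, 1, 0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Arsenal are the home side here, so their goals are fthg
pts <- with(arsenal_home, ifelse(fthg > ftag, 3, ifelse(fthg == ftag, 1, 0)))
arsenal_home$form     <- running_mean(pts)
arsenal_home$scored   <- running_mean(arsenal_home$fthg)
arsenal_home$conceded <- running_mean(arsenal_home$ftag)

arsenal_home[, c("fixture", "form", "scored", "conceded")]
```

On this reading, each average includes the match on its own row. If these columns are used to predict that same match's goals, lagging them by one match (for example with `c(0, head(x, -1))`) keeps the result being predicted out of the predictors.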
We may also want to consider the average goals scored and conceded, home or away, in the same fixture over a number of years: for example, how many goals Everton score on average at home to Arsenal, or how many goals West Ham concede on average away at Man Utd. `get_prem_fixture_history()` adds these as `sf_scored` and `sf_conceded`:
```r
prem_history <- get_prem_history()
prem_fixture_history_df <- get_prem_fixture_history(prem_history)
head(prem_fixture_history_df)
```

| team | fixture | date | fthg | ftag | sf_scored | sf_conceded | home |
|---------|-----------------------|------------|------|------|-----------|-------------|------|
| Arsenal | Arsenal v Aston Villa | 1993-11-06 | 1 | 2 | 0 | 0 | TRUE |
| Arsenal | Arsenal v Aston Villa | 1994-12-26 | 0 | 0 | 0 | 0 | TRUE |
| Arsenal | Arsenal v Aston Villa | 1995-10-21 | 2 | 0 | 0 | 0 | TRUE |
| Arsenal | Arsenal v Aston Villa | 1996-12-28 | 2 | 2 | 1.25 | 1 | TRUE |
| Arsenal | Arsenal v Aston Villa | 1997-10-26 | 0 | 0 | 1 | 0.5 | TRUE |
| Arsenal | Arsenal v Aston Villa | 1999-05-16 | 1 | 0 | 1.25 | 0.5 | TRUE |
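The head-to-head columns in the table above follow the same pattern: a running mean over the current and previous three meetings in the same fixture, for example 1.25 = (1 + 0 + 2 + 2) / 4 for Arsenal's fourth home game against Aston Villa. The base-R sketch below reproduces those values and applies the same calculation to Aston Villa's side of the fixture. It is inferred from the example output rather than randy's source; `roll_mean()`, `goals_for` and `goals_against` are hypothetical names. It uses `stats::filter()` for the rolling mean (with the `stats::` prefix because dplyr masks `filter()`) and `ave()` to apply it within each team and fixture.

```r
# Rolling mean of the current and previous (window - 1) values, 0 until a
# full window exists; the window of 4 is inferred from the output above.
roll_mean <- function(x, window = 4) {
  if (length(x) < window) return(numeric(length(x)))
  out <- as.numeric(stats::filter(x, rep(1 / window, window), sides = 1))
  out[is.na(out)] <- 0
  out
}

# Arsenal v Aston Villa at Arsenal, 1993-1999, from the table above,
# seen from both teams' points of view
dates <- as.Date(c("1993-11-06", "1994-12-26", "1995-10-21",
                   "1996-12-28", "1997-10-26", "1999-05-16"))
fthg <- c(1, 0, 2, 2, 0, 1)
ftag <- c(2, 0, 0, 2, 0, 0)

h2h <- data.frame(
  team          = rep(c("Arsenal", "Aston Villa"), each = 6),
  fixture       = "Arsenal v Aston Villa",
  date          = rep(dates, 2),
  home          = rep(c(TRUE, FALSE), each = 6),
  goals_for     = c(fthg, ftag),
  goals_against = c(ftag, fthg),
  stringsAsFactors = FALSE
)

# ave() applies roll_mean within each team/fixture group, keeping row order,
# so rows must already be in date order within each group
h2h$sf_scored   <- ave(h2h$goals_for, h2h$team, h2h$fixture, FUN = roll_mean)
h2h$sf_conceded <- ave(h2h$goals_against, h2h$team, h2h$fixture, FUN = roll_mean)

h2h[h2h$team == "Arsenal", c("date", "sf_scored", "sf_conceded")]
```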
Using expected goals (xG) data sourced from https://fbref.com, `get_xg_fixture_history()` calculates a team’s running average xG for and against, home or away, similarly to how `get_prem_form()` calculates running average goal form. These provide additional variables for a team’s attacking and defending strength:
```r
xg_fixture_history_df <- get_xg_fixture_history()
head(xg_fixture_history_df)
```

| date | team | fixture | actual_score | xg1 | xg2 | home | xg_for | xg_against |
|------------|---------|-----------------------|--------------|-----|-----|------|--------|------------|
| 2017-08-11 | Arsenal | Arsenal v Leicester | 4–3 | 2.3 | 1.3 | TRUE | 0 | 0 |
| 2017-09-09 | Arsenal | Arsenal v Bournemouth | 3–0 | 2 | 0.9 | TRUE | 0 | 0 |
| 2017-09-25 | Arsenal | Arsenal v West Brom | 2–0 | 2.4 | 0.7 | TRUE | 0 | 0 |
| 2017-10-01 | Arsenal | Arsenal v Brighton | 2–0 | 3 | 0.4 | TRUE | 2.425 | 0.825 |
| 2017-10-28 | Arsenal | Arsenal v Swansea | 2–1 | 1.6 | 0.6 | TRUE | 2.25 | 0.65 |
| 2017-11-18 | Arsenal | Arsenal v Spurs | 2–0 | 1.4 | 1.4 | TRUE | 2.1 | 0.775 |
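In the table above, `xg_for` on Arsenal's fourth home game is (2.3 + 2 + 2.4 + 3) / 4 = 2.425, i.e. a 4-match running mean of `xg1` that includes the current fixture, just like goal form. The sketch below reproduces the `xg_for` and `xg_against` values with a vectorised cumulative-sum running mean, and also pulls the goals out of `actual_score` (which uses an en dash rather than a hyphen). It assumes `xg1`/`xg2` are the home/away sides' xG, as the Arsenal rows suggest; it is not taken from randy's source, and `running_mean_cs()` is a hypothetical helper.

```r
# Arsenal's first six home fixtures of 2017/18, from the table above.
# "\u2013" is the en dash used in actual_score.
xg <- data.frame(
  actual_score = c("4\u20133", "3\u20130", "2\u20130", "2\u20130", "2\u20131", "2\u20130"),
  xg1  = c(2.3, 2.0, 2.4, 3.0, 1.6, 1.4),
  xg2  = c(1.3, 0.9, 0.7, 0.4, 0.6, 1.4),
  home = TRUE,
  stringsAsFactors = FALSE
)

# Pull out the two numbers either side of the en dash (home, then away goals)
score_parts <- do.call(rbind, regmatches(xg$actual_score,
                                         gregexpr("[0-9]+", xg$actual_score)))
xg$home_goals <- as.integer(score_parts[, 1])
xg$away_goals <- as.integer(score_parts[, 2])

# From the team's point of view: assumes xg1 is the home side's xG and
# xg2 the away side's, as in the Arsenal rows above
xg_for_match     <- ifelse(xg$home, xg$xg1, xg$xg2)
xg_against_match <- ifelse(xg$home, xg$xg2, xg$xg1)

# Vectorised running mean via cumulative sums: sum of the current and
# previous (window - 1) values, divided by window; 0 until a full window
running_mean_cs <- function(x, window = 4) {
  cs <- cumsum(x)
  lagged <- c(rep(0, window), cs)[seq_along(x)]
  out <- (cs - lagged) / window
  out[seq_along(x) < window] <- 0
  out
}

xg$xg_for     <- running_mean_cs(xg_for_match)
xg$xg_against <- running_mean_cs(xg_against_match)
xg
```

The cumulative-sum form avoids a loop over rows, which can matter when applying it to every team across many seasons.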
`get_model_data()` combines all three sets of variables into one data frame and adds some further variables, including `season`, `points`, `cum_points`, `game` and `league_pos`:
```r
model_data <- get_model_data()
head(model_data)
```

| team | fixture | date | fthg | ftag | form | scored | conceded | home | win | draw | sf_scored | sf_conceded | season | points | cum_points | game | league_pos | actual_score | xg1 | xg2 | xg_for | xg_against |
|----------------|---------------------------|------------|------|------|------|--------|----------|-------|-----|------|-----------|-------------|-----------|--------|------------|------|------------|--------------|-----|-----|--------|------------|
| Man Utd | Man Utd v Wolves | 2022-01-03 | 0 | 1 | 2.25 | 1.75 | 1 | TRUE | 0 | 0 | 1.25 | 0.75 | 2021/2022 | 0 | 31 | 19 | 4 | 0–1 | 0.8 | 0.7 | 1.45 | 0.975 |
| Wolves | Man Utd v Wolves | 2022-01-03 | 0 | 1 | 1.25 | 0.25 | 0.25 | FALSE | 1 | 0 | 0.75 | 0.75 | 2021/2022 | 3 | 28 | 19 | 5 | 0–1 | 0.8 | 0.7 | 0.625 | 1.725 |
| Aston Villa | Brentford v Aston Villa | 2022-01-02 | 2 | 1 | 2.25 | 2 | 1.25 | FALSE | 0 | 0 | 1.5 | 1.5 | 2021/2022 | 0 | 22 | 19 | 8 | 2–1 | 0.6 | 1.2 | 0.925 | 1.125 |
| Brentford | Brentford v Aston Villa | 2022-01-02 | 2 | 1 | 2.25 | 1.25 | 0.75 | TRUE | 1 | 0 | 2 | 0.75 | 2021/2022 | 3 | 23 | 19 | 7 | 2–1 | 0.6 | 1.2 | 1 | 0.875 |
| Brighton | Everton v Brighton | 2022-01-02 | 2 | 3 | 1 | 0.25 | 0.75 | FALSE | 1 | 0 | 1 | 1 | 2021/2022 | 3 | 27 | 19 | 6 | 2–3 | 1.7 | 1.5 | 1.225 | 1.45 |
| Burnley | Leeds v Burnley | 2022-01-02 | 3 | 1 | 1.75 | 1.25 | 1 | FALSE | 0 | 0 | 1 | 1 | 2021/2022 | 0 | 11 | 17 | 15 | 3–1 | 1.6 | 1 | 0.675 | 1.4 |
| Chelsea | Chelsea v Liverpool | 2022-01-02 | 2 | 2 | 1.5 | 1.75 | 1.5 | TRUE | 0 | 1 | 1.25 | 1.25 | 2021/2022 | 1 | 43 | 21 | 2 | 2–2 | 1.3 | 1.3 | 1.975 | 1 |
| Everton | Everton v Brighton | 2022-01-02 | 2 | 3 | 1 | 1.25 | 2 | TRUE | 0 | 0 | 2 | 1 | 2021/2022 | 0 | 19 | 18 | 12 | 2–3 | 1.7 | 1.5 | 0.95 | 1.4 |
| Leeds | Leeds v Burnley | 2022-01-02 | 3 | 1 | 1.75 | 1.75 | 1.75 | TRUE | 1 | 0 | 3.25 | 1.25 | 2021/2022 | 3 | 19 | 19 | 10 | 3–1 | 1.6 | 1 | 1.525 | 1.55 |
| Liverpool | Chelsea v Liverpool | 2022-01-02 | 2 | 2 | 1 | 1 | 1.25 | FALSE | 0 | 1 | 2 | 2 | 2021/2022 | 1 | 42 | 20 | 2 | 2–2 | 1.3 | 1.3 | 1.95 | 1.375 |
| Arsenal | Arsenal v Man City | 2022-01-01 | 1 | 2 | 2.25 | 2 | 0.5 | TRUE | 0 | 0 | 0.25 | 2.5 | 2021/2022 | 0 | 35 | 20 | 3 | 1–2 | 1 | 1.8 | 1.8 | 0.725 |
| Crystal Palace | Crystal Palace v West Ham | 2022-01-01 | 2 | 3 | 1.75 | 2.5 | 1.5 | TRUE | 0 | 0 | 1.75 | 1.75 | 2021/2022 | 0 | 23 | 20 | 5 | 2–3 | 2.2 | 2 | 1.725 | 1.3 |
| Man City | Arsenal v Man City | 2022-01-01 | 1 | 2 | 0.25 | 1 | 3.25 | FALSE | 1 | 0 | 3.25 | 3.25 | 2021/2022 | 3 | 53 | 21 | 1 | 1–2 | 1 | 1.8 | 2.325 | 0.6 |
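One way columns like `points`, `cum_points`, `game` and a league position could be derived from one-row-per-team results is sketched below in base R, using made-up results for three teams. It assumes 3 points for a win and 1 for a draw (consistent with the `points` column above). The league position here is a simple ranking by cumulative points and then goal difference after the last match, which may not be how randy defines `league_pos`; `cum_gd` is a hypothetical helper column.

```r
# Made-up results for three teams (one row per team per match), in date order
matches <- data.frame(
  team          = c("A", "B", "A", "C", "B", "C"),
  season        = "2021/2022",
  date          = as.Date(c("2021-08-14", "2021-08-14", "2021-08-21",
                            "2021-08-21", "2021-08-28", "2021-08-28")),
  goals_for     = c(2, 1, 0, 0, 3, 1),
  goals_against = c(1, 2, 0, 0, 1, 3),
  stringsAsFactors = FALSE
)

# 3 points for a win, 1 for a draw, from each team's point of view
win  <- as.integer(matches$goals_for > matches$goals_against)
draw <- as.integer(matches$goals_for == matches$goals_against)
matches$points <- 3 * win + draw

# Running totals within each team and season (rows must be in date order)
matches$cum_points <- ave(matches$points, matches$team, matches$season, FUN = cumsum)
matches$game       <- ave(matches$points, matches$team, matches$season, FUN = seq_along)
matches$cum_gd     <- ave(matches$goals_for - matches$goals_against,
                          matches$team, matches$season, FUN = cumsum)

# Simple table after the last match: latest row per team, ranked by
# points then goal difference (the real league uses further tie-breakers)
latest <- matches[!duplicated(matches$team, fromLast = TRUE), ]
latest <- latest[order(-latest$cum_points, -latest$cum_gd), ]
latest$league_pos <- seq_len(nrow(latest))
latest[, c("team", "game", "cum_points", "cum_gd", "league_pos")]
```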