README.md

epldata

Datasets of the English Premier League 1992-2018

This package is a repository of datasets relating to the English Premier League (football/soccer) from its inception in August 1992 through to the end of the 2017/18 season. The intention is to update it annually, shortly after each season ends in May

None of the data is official and there are sure to be a few, hopefully trivial, errors. Some data, e.g. transfer fees, are estimates

There are nine data sets, loosely structured around the idea of a relational SQL database, so there is no duplicated data and lots of joins are required to make full use of the figures. The data has been compiled over more than 25 years, so it has some bad practices built in, but these should not unduly detract from usage

  1. assists - ids of one or two players assisting each goal
  2. game - game id, date, attendance, referee
  3. game_team - team name, venue for each game id
  4. goals - player_game id, details of method, place and play
  5. manager_team - joining and leaving dates of managers at each team
  6. managers - manager name and id
  7. player_game - player_team and team_game ids, whether starter, time on and off, cautions, own-goals and missed penalties
  8. player_team - date of joining and leaving team, transfer fees involved, whether on-loan
  9. players - first and last name, place and date of birth, field position
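
Because the tables are normalised, most questions start with a join on an id column. The sketch below illustrates the idea with two small made-up data frames that mimic players and player_team. The column names (player_id, player_team_id, joined_year, last_name) and all values are assumptions for illustration, not the package's actual schema, so check the real names with str() or the help pages before adapting it:

```r
# Toy stand-ins for the players and player_team tables.
# Column names and values are hypothetical, not taken from epldata.
players_toy <- data.frame(
  player_id = c(1, 2),
  last_name = c("Smith", "Jones"),
  stringsAsFactors = FALSE
)

player_team_toy <- data.frame(
  player_team_id = c(10, 11, 12),
  player_id      = c(1, 1, 2),
  joined_year    = c(1995, 1999, 2001)
)

# One row per spell at a club, with the player's name attached
spells <- merge(player_team_toy, players_toy, by = "player_id")
print(spells)
```

The same pattern extends to the other tables: each join adds the descriptive columns that a bare id stands for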

Installation

Currently the package is not on CRAN

# Install from GitHub (requires the devtools package)
# install.packages("devtools")
devtools::install_github("pssguy/epldata")

# View datasets and functions
help(package = "epldata")

# Load a dataset
library(epldata)
data(players)

A lot of joins between tables are necessary, and you may find it useful to create derived data.frames if you plan to use the data extensively. Examples are covered in the vignette
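
As a rough illustration of a derived data.frame, the sketch below chains joins from goals through player_game and player_team to players, keeps the joined result, and then summarises it. It runs on miniature made-up tables: every column name and value here is hypothetical and will differ from the real datasets, so treat it as a pattern rather than working code for the package:

```r
# Hypothetical miniature versions of four epldata tables.
# Column names and values are invented for illustration only.
players_toy <- data.frame(player_id = c(1, 2),
                          last_name = c("Smith", "Jones"),
                          stringsAsFactors = FALSE)
player_team_toy <- data.frame(player_team_id = c(10, 11, 12),
                              player_id      = c(1, 1, 2))
player_game_toy <- data.frame(player_game_id = c(100, 101, 102, 103),
                              player_team_id = c(10, 11, 12, 12))
goals_toy <- data.frame(goal_id        = 1:5,
                        player_game_id = c(100, 100, 101, 102, 103))

# Join once, keep the result, and reuse it for later summaries
goal_details <- merge(goals_toy, player_game_toy, by = "player_game_id")
goal_details <- merge(goal_details, player_team_toy, by = "player_team_id")
goal_details <- merge(goal_details, players_toy, by = "player_id")

# Goals per player, counted from the derived table
goal_counts <- aggregate(goal_id ~ last_name, data = goal_details, FUN = length)
names(goal_counts)[2] <- "goals"
print(goal_counts)
```

Building goal_details once avoids repeating the three joins every time a new summary is needed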

I have included a couple of example functions within the package

Suggested packages

In order to make full use of the data you may want to consider the following packages which epldata does not depend on

There are many others - too many to mention - which I have used on a less frequent basis

Usage

Although the data itself is basic, the many rich packages available and the quantity of data make a wide range of output possible, in both form and content; what you produce really depends on the imagination of the developer. I have included some examples in a vignette, but there are far more output examples, with code, based on derived tables on the mytinyshinys blog

It can be used as a fun way to introduce students to coding in R and producing visualizations, using data related to what is probably the most popular sports league worldwide

Other uses might include

Here are some real-world examples of the output

Personal

Others

Please let me know of any interesting usage of the package and I will list them here

Comparable data

I am not aware of any comparable non-commercial data. I was collecting certain aspects of the data, including assists and goal descriptions, well before any official adoption

The engsoccerdata package, authored and maintained by James Curley, makes a good complement. It has a far broader scope, both temporally and geographically, as it provides league match results for many of the English divisions back into the 19th century as well as the leading leagues of many other nations. It also includes Cup data. However, it lacks the depth of this package, as it has no player or goal information

Examples of open-source datasets in other sports include the Lahman baseball database and the deuce tennis package

Future development

Acknowledgements

I would like to thank my brother, Stuart Clark, for providing all the goal and assist data for many years. The soccerbase web site has also been a great reference source

In addition, I thank the developers and maintainers of all the R packages I have used, with pride of place going to the RStudio team and to Carson Sievert for the rOpenSci plotly package



pssguy/epldata documentation built on May 12, 2019, 7:36 a.m.