devtools::load_all(".")
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
This package analyzes Serie A soccer (Calcio) data. It creates an accessible R data frame with match results, team stats, Elo ratings, and overall standings. This data frame is used to generate visualizations in a Shiny app: https://datavisr.shinyapps.io/calcior/
The data is sourced from https://github.com/openfootball, which contains the results of all Serie A matches since the 2013/14 season. The data is extracted using Ruby with the sportdb gem. Running the extraction creates a local SQLite database, sport.db, that can be read into R.
source_data <- dao$new()
source_data$get_data()

# Drop columns that contain only NA values
filter_na_cols <- function(df) df[, purrr::map_lgl(df, ~ !all(is.na(.x)))]

# Drop the created_at / updated_at bookkeeping columns
filter_at_cols <- function(df) df %>% select(-one_of("created_at", "updated_at"))

source_data$tables %>%
  purrr::map(filter_na_cols) %>%
  purrr::map(filter_at_cols) %>%
  glimpse()
The source data is transformed from a set of relational tables into a single data frame, serie_a, which contains list columns of data frames to maintain the relationship of teams and matches to match_days (rounds) and season. Summary data and Elo ratings are also calculated (details below).
serie_a
season
: Serie A seasons from 2013/14 to 2016/17.
match_days_complete
: The number of match days completed so far in each season.
teams
: The teams in Serie A for each season. The line-up changes each season because the bottom 3 teams are relegated to Serie B and the top 3 teams from Serie B are promoted.
serie_a %>%
  select(season, teams) %>%
  tidyr::unnest(teams) %>%
  glimpse()
results
: For every season, match_day and team (p_team, for primary team), it shows the team's score (p_score), the opponent's score (o_score), whether the team played at home (p_home), and how many points the p_team earned from the result.
serie_a %>%
  select(season, results) %>%
  tidyr::unnest(results) %>%
  tidyr::unnest(data) %>%
  glimpse()
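As a rough illustration of how the points column relates to the two score columns, the sketch below assumes the usual league scoring of 3 points for a win, 1 for a draw and 0 for a loss. The helper name match_points and the toy rows are hypothetical and are not taken from the package.

```r
# Hypothetical helper (not part of the package): league points earned by the
# primary team, assuming 3 for a win, 1 for a draw and 0 for a loss.
match_points <- function(p_score, o_score) {
  ifelse(p_score > o_score, 3L,
         ifelse(p_score == o_score, 1L, 0L))
}

# Made-up rows, for illustration only
toy_results <- data.frame(
  p_team  = c("Team A", "Team B", "Team C"),
  p_score = c(2, 1, 0),
  o_score = c(0, 1, 3),
  stringsAsFactors = FALSE
)
toy_results$points <- match_points(toy_results$p_score, toy_results$o_score)
toy_results$points  # 3 1 0
```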
ratings
: For every season, match_day and team (p_team), it shows the team's Elo rating (r).
The Elo calculations are mostly based on http://www.eloratings.net/system.html, with k = 20 and a season reverting factor of 0.25.
serie_a %>%
  select(season, ratings) %>%
  tidyr::unnest(ratings) %>%
  tidyr::unnest(data) %>%
  glimpse()
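To make the Elo parameters above more concrete, here is a minimal sketch of a basic Elo update with k = 20. It is not the package's implementation: the function names are hypothetical, home advantage and goal-difference weighting are left out, and the reading of the season reverting factor (each rating moves 25% of the way back toward a baseline of 1500 between seasons) is an assumption.

```r
# Expected result for the primary team, given both ratings (standard Elo curve).
elo_expected <- function(r_p, r_o) {
  1 / (1 + 10^((r_o - r_p) / 400))
}

# One rating update. `w` is the actual result for the primary team:
# 1 for a win, 0.5 for a draw, 0 for a loss. k = 20 as stated above.
elo_update <- function(r_p, r_o, w, k = 20) {
  r_p + k * (w - elo_expected(r_p, r_o))
}

# ASSUMPTION: between seasons, a rating is pulled 25% of the way back toward
# a baseline. Both this interpretation and the 1500 baseline are guesses.
elo_revert <- function(r, baseline = 1500, factor = 0.25) {
  r + factor * (baseline - r)
}

elo_update(1500, 1500, w = 1)  # equal teams, win: 1500 + 20 * 0.5 = 1510
elo_revert(1600)               # 1600 + 0.25 * (1500 - 1600) = 1575
```

In this simplified form the two teams' changes mirror each other: whatever the primary team gains from a match, the opponent loses.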
standings
: For every season, match_day and team (p_team), it shows the team's cumulative points, goals_for, goals_against and goal_diff, along with its position in comparison to the other teams.
serie_a %>%
  select(season, standings) %>%
  tidyr::unnest(standings) %>%
  tidyr::unnest(data) %>%
  glimpse()
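The cumulative columns described above can be sketched in base R as running totals per team, followed by a ranking within each match day. This is only an illustration on made-up results for four hypothetical teams. The tie-break order used here (points, then goal_diff, then goals_for) is an assumption and may differ from what the package does.

```r
# Made-up results: one row per match_day and team (illustration only)
toy <- data.frame(
  match_day = c(1, 1, 1, 1, 2, 2, 2, 2),
  p_team    = c("A", "B", "C", "D", "A", "C", "B", "D"),
  p_score   = c(2, 0, 1, 1, 0, 3, 2, 2),
  o_score   = c(0, 2, 1, 1, 3, 0, 2, 2),
  stringsAsFactors = FALSE
)
toy$points <- ifelse(toy$p_score > toy$o_score, 3,
                     ifelse(toy$p_score == toy$o_score, 1, 0))

# Running totals per team, accumulated in match_day order
toy <- toy[order(toy$p_team, toy$match_day), ]
standings <- data.frame(
  match_day     = toy$match_day,
  p_team        = toy$p_team,
  points        = ave(toy$points,  toy$p_team, FUN = cumsum),
  goals_for     = ave(toy$p_score, toy$p_team, FUN = cumsum),
  goals_against = ave(toy$o_score, toy$p_team, FUN = cumsum),
  stringsAsFactors = FALSE
)
standings$goal_diff <- standings$goals_for - standings$goals_against

# ASSUMPTION: rank by points, then goal_diff, then goals_for
standings <- standings[order(standings$match_day, -standings$points,
                             -standings$goal_diff, -standings$goals_for), ]
standings$position <- ave(seq_len(nrow(standings)), standings$match_day,
                          FUN = seq_along)

standings[standings$match_day == 2, c("p_team", "points", "goal_diff", "position")]
```

After match day 2 in this toy data, team C leads with 4 points, followed by A (3), D (2) and B (1).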