knitr::opts_chunk$set(echo = TRUE) library(sihfapi) library(httr) library(tidyverse) library(lubridate) library(stringr)
The website of the Swiss Ice Hocke Federation (SIHF) exposes an API for use by its "Gamecenter". The present package makes use of this API to download results and game details.
There does not seem to be any public documentation on the API and how to use it. Thus, the package has been developed through backwards engineering. This fact is still much noticable at some points and contributions are highly welcome.
"Results" are short tables that summarise game outcomes. They are particularly important as we can retrieve long lists of results using handy filters such season, league and date.
results <- fetch_results(c(Season = 2017)) %>% print()
Filters are not fully understood. The 'Date' filter has a somewhat particular behvior (see ?fetch_results()
).
fetch_results(c(Season = 2017, Date = "20.9.2016")) fetch_results(c(Season = 2017, Date = "1.10.2016-31.10.2016"))
Internally, the results object is a list of c("content", "path", "query", "response")
.
The content is provided by SIHF. It is a list object with multiple nested lists. At the top level, the list contains 14 elements:
str(results$content, max.level = 1)
Most of these elements are unintersting as they only contain rendering instructions for the SIHF website. By far the most interesting element $data
which contains the actual data content of the response. $header
provides meta data for each item of the data. This meta data is the name, description, and alias of each item as well as some item-specific rendering instructions.
header <- tibble( name = map_chr(results$content$header, 'name'), description = map_chr(results$content$header, 'description'), alias = map_chr(results$content$header, 'alias') ) %>% print()
$data
is a list of variable length, where each element of the list represents an individual game. Each of these games consists of 11 elements that correspond to the header items discussed above:
d <- results$content$data %>% map(~set_names(.x, header$alias)) d[[1]]
d <- d %>% map(~enframe(.x) %>% spread(name, value) ) %>% bind_rows() %>% mutate_at(vars(date, day, decision, status, time), unlist) %>% # Parse date_time mutate(date_time = dmy_hm(paste(date, time))) %>% select(-date, -day, -time) %>% print()
This simple parsing contains nested lists and may be useful for further processing, as it contains a maximum amount of data with minimum structure. Further parsing may be more useful for human readers:
# Collapse complex multi-team, multi-period scores into " | " separated string collapseScore <- function(score) { map2( score['homeTeam'], score['awayTeam'], ~paste(.x, .y, sep = ":") ) %>% unlist(use.names = FALSE) %>% paste(collapse = " | ") } results.games <- d %>% mutate( gameId = map_chr(details, 'gameId'), awayTeamAcronym = map_chr(awayTeam, 'acronym'), homeTeamAcronym = map_chr(homeTeam, 'acronym'), finalScore = map_chr(score, collapseScore), periodScore = map_chr(scorePerPeriod, collapseScore) ) %>% select( date_time, gameId, ends_with('TeamAcronym'), ends_with('Score', ignore.case = FALSE), decision ) %>% print()
Note that the list of games we retrieved is neither resticted to NLA games nor a full list of all games available in the API. Rather, it appears to be a combination of NLA, CHL and national team games.
We can also use the results table to compile a simple teams table that lists teamId, acronym and name.
results.teams <- d %>% select(ends_with('Team')) %>% gather("key", "value") %>% select(value) %>% mutate(value = map( value, ~enframe(.x) %>% spread(name, value) )) %>% unnest() %>% mutate_all(unlist) %>% select(id, acronym, name) %>% distinct() %>% arrange(id) %>% print()
One of the important use case of the results table is to retrieve gameIds which we require to retrieve game details.
Once we know the ID of a game we are interested in, we can easily fetch the detail of this game:
gameDetail <- results.games %>% slice(30) %$% gameId %>% fetch_gameDetail() %>% print()
Internally, the results object is a list of c("content", "path", "query", "response")
. These items are largely analogous to the ones in 'results'. Obviously, content is different.
Using functions of the apply
family or from the purrr
package, fechting details on multiple games is simple.
gameDetail <- results.games %>% slice(30:32) %$% gameId %>% map(fetch_gameDetail) %>% map(print)
The content of each game detail is a fairly large and complex list of lists:
str(gameDetail[[1]]$content, max.level = 1)
Upon closer inspection, we find a number of items in this list that are relatively easy to transform into a table where each row represents a game and is identified by gameId
. We divided these items into three classes that need separate parsing:
nm.game <- list( single = c("gameId", "season", "updateDateTime", "startDateTime", "endDateTime", "videoId", "info", "alias"), idName = c("league", "region", "group", "tournament", "qualification", "phase"), status = "status" ) %>% print()
single
is a group of items that consists only of a single value:
gameDetail.game <- list() gameDetail.game$single <- gameDetail %>% map('content') %>% map(`[`, nm.game$single) %>% map(enframe) %>% map(~spread(., name, value)) %>% bind_rows() %>% mutate_all(unlist) %>% mutate_at( vars(ends_with("DateTime")), ymd_hms ) %>% print()
idName
items consist of an integer ID and a matching name. For example league = list(id = 81, name = "CHL")
:
gameDetail.game$idName <- gameDetail %>% map('content') %>% set_names(gameDetail.game$single$gameId) %>% enframe("gameId", "data") %>% mutate( data = data %>% map(`[`, nm.game$idName) %>% map(enframe) ) %>% unnest() %>% rename(var = name) %>% mutate( value = value %>% map(bind_cols) ) %>% unnest() %>% gather("key", "value", -gameId, -var) %>% mutate(key = paste0(var, ut_firstToUpper(key))) %>% select(-var) %>% spread(key, value) %>% print()
That last item we will flatten to a table with one row per game is status
. This item is somewhat more complex, consisting of 7 fields. We prefix these seven fields as "statusX".
gameDetail.game$status <- gameDetail %>% map('content') %>% set_names(gameDetail.game$single$gameId) %>% enframe("gameId", "data") %>% mutate( data = data %>% map(`[`, "status") %>% map(bind_cols) ) %>% unnest() %>% mutate_at( vars(ends_with("DateTime")), ymd_hms ) %>% set_names(str_c("status", names(.) %>% ut_firstToUpper())) %>% print()
We join the three game items to a single data frame.
gameDetail.game <- gameDetail.game$single %>% full_join(gameDetail.game$idName, by = "gameId") %>% full_join(gameDetail.game$status, by = c(gameId = "statusGameId")) %>% print()
The remaining items of game detail content are more complex. Flattening them to a row-per-game table seems inappropriate. Doing so would substantially harm readability and handling of the data. We parse them into separate tables.
The summary
object contains event data on each period (fouls, goals, goalkeepers) and the shootout. We parse these data items into separate tables to take accound of the fact that we have vastly different information on the different events.
# Parse one summary$periods object parsePeriods <- function(periods) { ## Parse list tmp <- periods %>% enframe() %>% transmute( period = value %>% map_chr("name"), fouls = value %>% map("fouls") %>% map(~map_df(., bind_cols)), goals = value %>% map("goals") %>% map(~map_df(., bind_cols)), goalkeepers = value %>% map("goalkeepers") %>% map(~map_df(., bind_cols)) ) ## Prepare flat return list( fouls = tmp %>% select(period, fouls) %>% unnest(), goals = tmp %>% select(period, goals) %>% unnest(), goalkeepers = tmp %>% select(period, goalkeepers) %>% unnest() ) } tmp <- gameDetail %>% map('content') %>% set_names(gameDetail.game$gameId) %>% enframe("gameId", "data") %>% mutate( data = data %>% map("summary"), periods = data %>% map("periods") %>% map(parsePeriods), fouls = periods %>% map("fouls"), goals = periods %>% map("goals"), goalkeepers = periods %>% map("goalkeepers"), shootout = data %>% map("shootout") %>% map("shoots") %>% map(~map_df(., bind_cols)) ) %>% select(-data, -periods)
The fouls table provides IDs of the offending player and team, information on the penalty minutes awareded, the fould id and name (e.g., id = 9 representing 'Crosscheck'), and the start and effective end time of the penalty.
gameDetail.fouls <- tmp %>% select(gameId, fouls) %>% unnest() %>% mutate_at(vars(ends_with("time")), ms) %>% print()
The start and end time variables are parsed using lubridates ms()
. This function returns an S4 objct of class period. Differences can be obtained through simple subtraction. However, the differences are returned in somewhat unfriendly format (e.g. "2M -37S" instead of 83). These differences can easily be converted to seconds using as.numeric()
:
with(gameDetail.fouls, endTime - startTime) %>% as.numeric()
The sortOrder
field (also contained in the two other periods tables) is not yet understood. The field may contain rendering instructions for the SIHF website or some information that I have not yet decoded.
The goals table provides the id of the scoring and assisting players. In addition, it records all players on the ice as either plus or minuses.
gameDetail.goals <- tmp %>% select(gameId, goals) %>% unnest() %>% mutate_at(vars(ends_with("time")), ms) %>% print()
The goalkeepers
object is not fully understood yet. It appears to record goalkeeper substitution and pull events. It may be beneficial to parse the text
for IN / OUT. This necessicty would be somewhat surprising given that text fields of the SIHF API typically only render information contained in other fields.
gameDetail.goalkeepers <- tmp %>% select(gameId, goalkeepers) %>% unnest() %>% mutate_at(vars(ends_with("time")), ms) %>% print()
gameDetail.shootout <- tmp %>% select(gameId, shootout) %>% unnest() %>% print() rm(list = c("tmp", "parsePeriods"))
LineUp tables are recoreded in a somewhat cumbersome fashion. It is a nested list where:
- Home and away team lineups are separate nested lists
- Position is the next level of nesting. Values are:
- Defenders
- Forwarders
- Goalkeepers
- Other players
- Coach
- Captain
- Topscorer
- Different postions have unique types of data
- Defenders and forwards have line (1-4) and side (left, center, right)
- Goalkeepers and other players have line (if at all)
- Captain and topscorer is a single playerId
- Coach has a separate set of fields
Based on that, our ultimate target for parsing is a players table:
gameId teamId playerId position line side captain topscorer 2017xxx 96325 147852 defender 1 left TRUE FALSE 2017xxx 96325 258963 forwarder 2 right FALSE TRUE 2017xxx 96325 987521 goalkeeper 1 NA FALSE FALSE
Note that
- captain and topscorer are converted to logical
- line is filled by sequence of appearance to get a unique line id
- gameId is filled
- teamId is filled to complement "homeTeam" / "awayTeam" information
In addition, we parse "Coach" information separately since the items on coaches don't fit the players table.
The function parse.lineUp()
facilitates parsing the deeply nested list that consists the line up data. The function takes a single argument that is a gameDetail object (as returned by fetch_gameDetail()
) and returns a list of gameId, players and coaches line ups. Note that we extract a single element of gameDetail in the following code sample.
gameDetail[[1]] %>% parse.lineUp()
The parse.lineUp()
function can simply be mapped to lists with multiple gameDetail elements. The following block of code uses tibble::enframe()
to bind multiple games' gameDetail to a data frame. It then extracts flat tables of multi-game line ups for players and coaches separately.
tmp <- gameDetail %>% map(parse.lineUp) %>% enframe("idx", "data") %>% mutate( gameId = data %>% map_chr("gameId"), players = data %>% map("players"), coaches = data %>% map("coaches") ) gameDetail.lineUp.players <- tmp %>% select(gameId, players) %>% unnest() %>% print() gameDetail.lineUp.coaches <- tmp %>% select(gameId, coaches) %>% unnest() %>% print() rm(tmp)
To do: The position "otherPlayers" is not currently parsed because it is empty for all sample games currently considered.
parse.stats <- function(gameDetail) { # Parse stats tmp <- gd %$% content %$% stats %>% map( ~enframe(.) %>% spread(name, value) ) %>% bind_rows() %>% mutate_at( vars(alias, description, title), funs(map_chr(., unlist)) ) %>% mutate( colNames = header %>% map(~map_chr(., "alias")), data = map2( data, colNames, parse.stats.data ), team = alias %>% str_replace("(goalie|player)Stats", "") %>% tolower() %>% {sprintf("%sTeam", .)} ) playerStats <- tmp %>% filter(str_detect(alias, "^playerStats")) %>% select(team, data) %>% unnest() %>% mutate_at( vars( "playerNumber", "goals", "assists", "assists1", "assists2", "points", "pimTotal", "pimTotal", "plusMinus", "shotsOnGoal", "sogMissed", "sogPost", "blockings" ), parse_integer ) %>% mutate_at( vars("faceoffsPercent"), parse_percent, na = "-" ) %>% mutate_at( vars(starts_with("timeOnIce")), ms ) goalieStats <- tmp %>% filter(str_detect(alias, "^goalieStats")) %>% select(team, data) %>% unnest() %>% mutate_at( vars("playerNumber", "goalsAgainst", "saves", "shotsAgainst", "penaltyInMinutes"), parse_integer ) %>% mutate_at( vars("savesPercentage"), parse_percent, na = "-" ) %>% mutate_at( vars("goalsAgainstAverage"), parse_number, na = "-" ) %>% mutate_at( vars("secondsPlayed"), ms ) teams <- tmp %>% select(team, description) %>% distinct() teamStats <- tmp %>% filter(str_detect(alias, "^teamStats")) %>% select(data) %>% unnest() %>% gather("teamName", "value", -Statistic) %>% left_join(teams, by = c(teamName = "description")) %>% select(-teamName) %>% spread(Statistic, value) %>% mutate_at( vars(matches("%")), parse_percent ) %>% mutate_at( vars( starts_with("BkS"), matches("FO(W|L)? Total"), starts_with("PIM"), starts_with("PK "), "PPG", starts_with("PP "), starts_with("SH"), starts_with("SOG") ), parse_integer ) %>% mutate_at( vars("PKT", "PPT"), ms ) list( gameId = gameDetail$content$gameId, playerStats = playerStats, goalieStats = goalieStats, teamStats = teamStats ) } parse.stats(gameDetail[[1]]) # Todo: Some value parsing fails and throws a warning.
nm.complex <- c("summary", "lineUps", "players", "stats", "result", "details") gameDetail[[1]]$content %>% map(`[`, nm.complex) %>% map(str, max.level = 1)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.