poissb | R Documentation |
This function analyzes the provided data on basketball matches and, based on the number of points scored by each team, returns the winning probabilities of the home and away teams computed from the Poisson distribution.
poissb(df, season = NULL, year = NULL, team, opponent)
df |
A data frame. Contains the historical basketball data of interest. Team names and the final scores of games must be present in the data. |
season |
Boolean. Indicates whether the data spans several seasons. If TRUE, the data is subsetted so that the predictions rest on the outcomes of a particular season; that season is specified with the argument "year". If FALSE, the predictions are based on all observations in the data. |
year |
Optional. The season/year on whose data the predictions are based. Can be used only when "season" = TRUE; otherwise, running the function results in an error message. |
team |
A character string. The club name of the home team, as it appears in the data frame. |
opponent |
A character string. The club name of the away team, as it appears in the data frame. |
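The interplay of "season" and "year" described above can be sketched as follows. This is a minimal illustration, not the package's actual code: the helper name `subset_season` and the toy data are hypothetical, and the error messages are invented for the example.

```r
# Hypothetical helper illustrating the documented season/year rules.
subset_season <- function(df, season = NULL, year = NULL) {
  if (isTRUE(season)) {
    # A season must be named when season = TRUE
    if (is.null(year)) stop("`year` must be given when season = TRUE")
    df <- df[df$Year == year, , drop = FALSE]
  } else if (!is.null(year)) {
    # `year` is only meaningful together with season = TRUE
    stop("`year` can only be used when season = TRUE")
  }
  df
}

# Toy data (made up for illustration)
toy <- data.frame(
  Home_Team = c("A", "B", "A"),
  Away_Team = c("B", "A", "B"),
  Home_PTS  = c(101, 95, 110),
  Away_PTS  = c(99, 103, 104),
  Year      = c(22009, 22010, 22010)
)

one_season <- subset_season(toy, season = TRUE, year = 22010)
all_games  <- subset_season(toy, season = FALSE)
```

With season = FALSE (or left unset) and no year, the data is returned unchanged, which matches the documented behaviour of predicting from all observations.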
poissb() is designed to adapt to the structure of the supplied data frame. Nonetheless, for user convenience and to avoid a bulky number of arguments, the code has been simplified and restricts what can be passed to it.
The function processes the data only if 4 (without a Year column) or 5 (with one) columns are supplied. The columns of the data frame must be named "Home_Team", "Away_Team", "Home_PTS", "Away_PTS", and, optionally, "Year", and they must appear in exactly that order. If either requirement is violated, running the function results in an error message.
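A check along these lines would enforce the column requirements. It is only a sketch under the rules stated above; `check_columns` is a hypothetical name and the messages are not those produced by poissb().

```r
# Hypothetical validator for the documented column layout.
check_columns <- function(df) {
  required <- c("Home_Team", "Away_Team", "Home_PTS", "Away_PTS")
  if (!ncol(df) %in% c(4, 5)) {
    stop("df must have 4 or 5 columns")
  }
  # With 5 columns, the fifth one must be "Year"
  expected <- if (ncol(df) == 5) c(required, "Year") else required
  # identical() checks both the names and their order
  if (!identical(colnames(df), expected)) {
    stop("column names must be, in order: ", paste(expected, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy data (made up for illustration)
toy <- data.frame(
  Home_Team = c("A", "B"),
  Away_Team = c("B", "A"),
  Home_PTS  = c(101, 95),
  Away_PTS  = c(99, 103)
)

ok  <- check_columns(toy)
# iris has 5 columns but the wrong names, so the check fails
bad <- tryCatch(check_columns(iris), error = function(e) conditionMessage(e))
```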
The essence of the prediction is simple: at its core is a Generalized Linear Model (glm) of the "poisson" family. The regression estimates the mean number of points a team scores given that it plays a particular opponent; only the opponent and home advantage are accounted for in deriving the estimates.
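As a rough illustration of such a regression, the sketch below stacks each game into two rows (one per team) and fits a Poisson glm. The long-format layout, the formula `points ~ home + team + opponent`, and the toy scores are assumptions made for this example; the function's internal model may be specified differently.

```r
# Toy games (made up for illustration), in the documented column layout
games <- data.frame(
  Home_Team = c("A", "B", "C", "A", "B", "C"),
  Away_Team = c("B", "C", "A", "C", "A", "B"),
  Home_PTS  = c(104, 98, 110, 101, 95, 107),
  Away_PTS  = c(99, 102, 96, 105, 100, 93)
)

# One row per team per game; `home` flags the home side
long <- rbind(
  data.frame(team = games$Home_Team, opponent = games$Away_Team,
             points = games$Home_PTS, home = 1),
  data.frame(team = games$Away_Team, opponent = games$Home_Team,
             points = games$Away_PTS, home = 0)
)

# Poisson regression of points on home advantage, team and opponent
fit <- glm(points ~ home + team + opponent,
           family = poisson(link = "log"), data = long)

# Expected points for team A at home against B, and for B away at A
home_mean <- predict(fit, data.frame(team = "A", opponent = "B", home = 1),
                     type = "response")
away_mean <- predict(fit, data.frame(team = "B", opponent = "A", home = 0),
                     type = "response")
```

The two fitted means then serve as the Poisson rates for the home and away teams in a given matchup.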
The Poisson probabilities are then computed over a range from a lower bound of 0 to an upper bound of 180 points per game. The probability of a draw, although unlikely and of little significance in basketball, is accounted for by adding it to each team's winning probability as a minor weight.
A vector with the probability of each team's victory over the other.
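The computation can be sketched as below, given two expected scores. The means are hypothetical values, the two scores are assumed independent, and splitting the draw probability equally between the teams is an assumption of this example; the documentation does not state the exact weighting used.

```r
# Hypothetical expected points for the home and away teams
home_mean <- 102.5
away_mean <- 99.0

pts    <- 0:180                          # score range stated in the documentation
p_home <- dpois(pts, lambda = home_mean) # P(home scores k), k = 0..180
p_away <- dpois(pts, lambda = away_mean) # P(away scores k), k = 0..180

# joint[i, j] = P(home scores i - 1 and away scores j - 1), assuming independence
joint <- outer(p_home, p_away)

home_win <- sum(joint[lower.tri(joint)]) # row > column: home outscores away
away_win <- sum(joint[upper.tri(joint)]) # column > row: away outscores home
draw     <- sum(diag(joint))             # equal scores

# Assumption: the draw probability is shared equally between the two teams
probs <- c(home = home_win + draw / 2, away = away_win + draw / 2)
```

Because scores above 180 are practically impossible at these rates, the two probabilities sum to 1 up to a negligible truncation error.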
data(nba2009_2016)
# Keep only the five columns used below
test <- nba2009_2016[, c(1, 4, 5, 7, 8)]
# With a Year column (5 columns)
test1 <- data.frame("Home_Team" = test$home.TEAM_NAME, "Away_Team" = test$away.TEAM_NAME,
                    "Home_PTS" = test$home.PTS, "Away_PTS" = test$away.PTS, "Year" = test$SEASON_ID)
# Without a Year column (4 columns)
test2 <- data.frame("Home_Team" = test$home.TEAM_NAME, "Away_Team" = test$away.TEAM_NAME,
                    "Home_PTS" = test$home.PTS, "Away_PTS" = test$away.PTS)
poissb(df = test1, season = TRUE, year = 22010, team = "Cleveland Cavaliers", opponent = "Los Angeles Lakers")
poissb(df = test2, season = FALSE, team = "Cleveland Cavaliers", opponent = "Los Angeles Lakers")
Erroneous specifications
by Season/Year discrepancy:
poissb(df = test1, season = TRUE, team = "Cleveland Cavaliers", opponent = "Los Angeles Lakers")
poissb(df = test2, season = FALSE, year = 22010, team = "Cleveland Cavaliers", opponent = "Los Angeles Lakers")
by ncol discrepancy:
poissb(df = nba2009_2016, year = 22010, team = "Cleveland Cavaliers", opponent = "Los Angeles Lakers")
poissb(df = nba2009_2016, season = FALSE, team = "Cleveland Cavaliers", opponent = "Los Angeles Lakers")
by colnames discrepancy:
poissb(df = iris, season = TRUE, year = 22010, team = "Cleveland Cavaliers", opponent = "Los Angeles Lakers")
Author: Yuri Shahnazaryan