This package provides functions that make it easy to analyze betting lines for NFL games.
You can install the development version of NFLpredictions from GitHub with:
# install.packages("devtools")
devtools::install_github("przybylee/NFLpredictions")
Suppose we are considering placing a wager on the first-round playoff game between the Cardinals and the Rams.
First we load the library and collect some data: we scrape the scores of all 2021 regular-season games from Pro-Football Reference.
library(NFLpredictions)
G <- scrape_games(ssn = 2021, wk_start = 1, wk_stop = 18)
head(G)
#> Team Score Year Week
#> new Dallas Cowboys 29 2021 1
#> new.1 Tampa Bay Buccaneers 31 2021 1
#> new.2 Philadelphia Eagles 32 2021 1
#> new.3 Atlanta Falcons 6 2021 1
#> new.4 Pittsburgh Steelers 23 2021 1
#> new.5 Buffalo Bills 16 2021 1
tail(G)
#> Team Score Year Week
#> new.538 Seattle Seahawks 38 2021 18
#> new.539 Arizona Cardinals 30 2021 18
#> new.540 San Francisco 49ers 27 2021 18
#> new.541 Los Angeles Rams 24 2021 18
#> new.542 Los Angeles Chargers 32 2021 18
#> new.543 Las Vegas Raiders 35 2021 18
It is important to note that since the 2021 regular season had 18 weeks, we set wk_stop = 18. If this parameter is larger than the number of weeks played, the function may produce an error. A nice feature of Pro-Football Reference is that playoff weeks continue the week numbering of the regular season. For 2021, this means week 19 is the wild card round, week 20 is the divisional round, week 21 is the conference championships, and so on.
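Because playoff weeks continue the regular-season numbering, the week number alone identifies the round. The helper week_to_round() below is hypothetical (it is not part of NFLpredictions) and sketches that mapping under the convention just described, assuming an 18-week regular season as in 2021. Weeks past the conference championships are returned as NA rather than guessed.

```r
# Hypothetical helper (not in NFLpredictions): map a Pro-Football Reference
# week number to a round, assuming playoff weeks continue the regular-season
# numbering and the regular season has n_reg weeks (18 in 2021).
week_to_round <- function(week, n_reg = 18) {
  rounds <- c("Wild Card", "Divisional", "Conference Championship")
  offset <- week - n_reg
  out <- rep(NA_character_, length(week))
  out[week >= 1 & offset <= 0] <- "Regular Season"
  in_playoffs <- offset >= 1 & offset <= length(rounds)
  out[in_playoffs] <- rounds[offset[in_playoffs]]
  out   # NA for weeks this sketch does not label
}

week_to_round(c(1, 18, 19, 20, 21, 22))
```

By the same convention, the 2021 wild card games should correspond to wk_start = 19 and wk_stop = 19 in scrape_games(), provided Pro-Football Reference has posted those results.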
Next, we use linear regression to estimate the relative strength of each team. In a simple model, let $y_j$ be the home team's margin of victory in game $j$. Then

$$y_j = \tau_{h_j} - \tau_{a_j} + \eta + \varepsilon_j,$$

where the indices $h_j$ and $a_j$ identify the home and away teams, $\tau_i$ is the strength of team $i$, $\eta$ is the effect of home-field advantage, and $\varepsilon_j$ is a random error whose realized values are estimated by the residuals of the model. In matrix form this is

$$y = X\beta + \varepsilon,$$

where

$$\beta = (\eta, \tau_1, \ldots, \tau_{32})'.$$
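The design matrix $X$ has one row for each game in the data set. The row for game $j$ has a 1 in the first column (for $\eta$), a 1 in the column for $\tau_{h_j}$, a $-1$ in the column for $\tau_{a_j}$, and zeros elsewhere. Since the team entries of every row sum to zero, adding the same constant to every $\tau_i$ leaves $X\beta$ unchanged, so $X$ does not have full column rank and $\beta$ itself cannot be estimated. We can, however, estimate $\eta$ as well as the difference of any two $\tau$'s using ordinary least squares, and these are exactly the quantities needed to predict the outcome of a game. Given a data frame of games in the same format as G, the function get_design() produces the appropriate X and y.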
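To make the structure of X concrete, here is a minimal base-R sketch of building a design matrix of this form. The function build_design() and the toy games among teams A, B, and C (with made-up scores) are hypothetical illustrations, not the implementation of get_design(). The last line checks the rank deficiency described above: X has one fewer linearly independent column than it has columns.

```r
# Sketch only (not get_design()): build X and the home margin y for toy games.
# Teams A, B, C and all scores are made up for illustration.
games <- data.frame(
  home       = c("A", "B", "C", "A"),
  away       = c("B", "C", "A", "C"),
  home_score = c(30, 17, 24, 20),
  away_score = c(23, 20, 27, 17),
  stringsAsFactors = FALSE
)

build_design <- function(games) {
  teams <- sort(unique(c(games$home, games$away)))
  X <- matrix(0, nrow = nrow(games), ncol = length(teams) + 1,
              dimnames = list(NULL, c("eta", teams)))
  X[, "eta"] <- 1                            # home-field advantage column
  for (j in seq_len(nrow(games))) {
    X[j, games$home[j]] <- 1                 # +1 in the home team's column
    X[j, games$away[j]] <- -1                # -1 in the away team's column
  }
  y <- games$home_score - games$away_score   # home team's margin of victory
  list(X = X, y = y, teams = teams)
}

d <- build_design(games)
d$X
qr(d$X)$rank   # 3, one less than ncol(d$X) = 4
```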
data <- get_design(G)
names(data)
#> [1] "X" "X_sum" "Y_diff" "Y_sum" "Y_binary" "teams" "games"
The object data is a list containing the design matrix X along with response vectors such as Y_diff, as well as teams, the names of the teams appearing in the original data set. Now that the data has been organized, we can analyze the game of interest. For starters, we use point_spread_ols() to estimate the Rams' margin of victory.
point_spread_ols(data, "Rams", "Cardinals", a = 0.05)
#> [1] "Los Angeles Rams beat Arizona Cardinals by 1.8"
#> est std.dev lower upper
#> 1 1.844102 4.444196 -6.910509 10.59871
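Notice that since we set a = 0.05, lower and upper are the limits of a 95% confidence interval for $\tau_{\text{Rams}} - \tau_{\text{Cardinals}} + \eta$, the expected margin of victory for the Rams at home. The interval is based on the $t$ statistic under the assumption that $\varepsilon_j \sim N(0, \sigma^2)$.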
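The sketch below shows one way to get an estimate, standard error, and $t$-based interval for $\tau_h - \tau_a + \eta$ with base R's lm(), using simulated games whose true strengths are made up. It handles the rank deficiency by dropping one team's column, which fixes that team's $\tau$ at zero. The function ols_spread() is hypothetical and only mimics the column names printed by point_spread_ols(). The last lines refit with a different team dropped to show that the estimate does not change, which is what it means for this contrast to be estimable.

```r
# Sketch: OLS estimate and t interval for tau_h - tau_a + eta.
# Simulated data only; the "true" strengths below are made up.
set.seed(1)
teams <- c("A", "B", "C", "D")
tau   <- c(A = 3, B = 0, C = -1, D = -2)   # hypothetical team strengths
eta   <- 2                                  # hypothetical home-field advantage
n     <- 200
home  <- sample(teams, n, replace = TRUE)
away  <- unname(vapply(home, function(h) sample(setdiff(teams, h), 1), character(1)))
y     <- unname(tau[home] - tau[away]) + eta + rnorm(n, sd = 13)

# Design matrix: eta column of 1s, +1 for the home team, -1 for the away team
X <- cbind(eta = 1, sapply(teams, function(tm) (home == tm) - (away == tm)))

# Hypothetical helper: drop the last team column so the fit has full column
# rank (that team's tau is fixed at 0), then estimate the contrast c'beta.
ols_spread <- function(X, y, h, a, alpha = 0.05) {
  keep <- colnames(X)[-ncol(X)]
  Xk   <- X[, keep]
  fit  <- lm(y ~ Xk - 1)
  cvec <- setNames(numeric(length(keep)), keep)
  cvec["eta"] <- 1
  if (h %in% keep) cvec[h] <- cvec[h] + 1
  if (a %in% keep) cvec[a] <- cvec[a] - 1
  est <- sum(cvec * coef(fit))
  se  <- sqrt(drop(t(cvec) %*% vcov(fit) %*% cvec))
  tq  <- qt(1 - alpha / 2, df = fit$df.residual)
  data.frame(est = est, std.dev = se, lower = est - tq * se, upper = est + tq * se)
}

r1 <- ols_spread(X, y, h = "A", a = "C")
r2 <- ols_spread(X[, c("eta", "D", "B", "C", "A")], y, h = "A", a = "C")  # drops A instead
r1
all.equal(r1, r2)   # TRUE: the estimate does not depend on which team is dropped
```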
We can also use this assumption to get win probabilities.
probs <- winprob_ols(data, "Rams", "Cardinals")
#> [1] "The estimated win probability for the Arizona Cardinals at Los Angeles Rams is 0.445"
probs
#> h a h_spread h_prob a_prob method
#> 1 Los Angeles Rams Arizona Cardinals 1.844102 0.5546997 0.4453003 normal
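Under the normal-error assumption, the home team's margin can be modeled as $Y \sim N(\hat\mu, s^2)$, where $\hat\mu$ is the estimated spread, so the home win probability is $\Phi(\hat\mu / s)$. The sketch below assumes this form; which $s$ winprob_ols() actually uses is not shown here (one natural choice is $\sqrt{\hat\sigma^2 + \mathrm{se}^2}$, combining residual and estimation variance). Back-solving from the output above, the reported h_prob corresponds to $s$ of roughly 13.4 points, which is the value plugged in. win_prob_normal() is a hypothetical name.

```r
# Sketch under the normal-error assumption; not necessarily winprob_ols()'s code.
# est: estimated home margin; s: assumed standard deviation of a game's margin.
win_prob_normal <- function(est, s) {
  h_prob <- pnorm(est / s)   # P(Y > 0) for Y ~ N(est, s^2)
  c(h_prob = h_prob, a_prob = 1 - h_prob)
}

# s = 13.4 is back-solved from the winprob_ols() output above, not a package value.
win_prob_normal(est = 1.844102, s = 13.4)
```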
If we want to look at the chances of a team beating the spread, we can use spreadprob_ols().
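The same normal model gives the probability of covering a spread. With hspread the home team's spread (negative for a favorite), the home team covers when $Y + \text{hspread} > 0$, so $P(\text{home covers}) = \Phi((\hat\mu + \text{hspread})/s)$. This is an assumption about the approach, not the source of spreadprob_ols(); cover_prob_normal() is a hypothetical name, and $s = 13.4$ is roughly the value implied by the winprob_ols() output above. With a half-point spread a push is impossible, so the two cover probabilities sum to one.

```r
# Sketch: probability that each side covers, under the normal model.
# hspread is the home team's spread (negative when the home team is favored).
cover_prob_normal <- function(est, s, hspread) {
  h_cover <- pnorm((est + hspread) / s)   # P(Y + hspread > 0), Y ~ N(est, s^2)
  c(h_cover = h_cover, a_cover = 1 - h_cover)
}

# Rams -3.5, with est from point_spread_ols() and s = 13.4 (an assumption)
cover_prob_normal(est = 1.844102, s = 13.4, hspread = -3.5)
```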
It is in the interest of the sportsbook to set the betting lines so that each wager has negative expectation for the bettor, although books also manage their risk based on where customers are placing their bets. We would like to know the expected value of each of our wagers, since in the long run we could benefit by placing only wagers with positive expectation. To see the expected value of wagering on either moneyline, we use eML_ols().
eML_ols(data, "Rams", "Cardinals", wager = 40, hBL = -170, aBL = 150)
#> h a eHome eAway method
#> 1 Los Angeles Rams Arizona Cardinals -4.760257 4.530034 normal
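The expected value of a moneyline bet follows from American odds: a winning bet at a line $L > 0$ profits $\text{wager} \cdot L/100$, at $L < 0$ it profits $\text{wager} \cdot 100/|L|$, and a losing bet forfeits the wager. The minimal sketch below applies this standard formula with the h_prob from winprob_ols(); it reproduces the eHome and eAway values above to within rounding, but it is an illustration, not the source of eML_ols(). The helper names ml_profit() and ml_ev() are hypothetical.

```r
# Profit from a winning bet of `wager` at American odds `line`.
ml_profit <- function(line, wager) {
  ifelse(line > 0, wager * line / 100, wager * 100 / abs(line))
}

# Expected value of a bet that wins with probability p (a loss forfeits the wager).
ml_ev <- function(p, line, wager) {
  p * ml_profit(line, wager) - (1 - p) * wager
}

p_home <- 0.5546997                        # h_prob from winprob_ols() above
ml_ev(p_home,     line = -170, wager = 40) # about -4.76 (eHome)
ml_ev(1 - p_home, line =  150, wager = 40) # about  4.53 (eAway)
```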
We can also consider the expected value of betting on either team to beat the spread.
eSpread_ols(data, "Rams", "Cardinals", hspread = -3.5, wager = 40)
#> h a eHome eAway method
#> 1 Los Angeles Rams Arizona Cardinals -5.571235 1.934871 normal
If aspread, the away team's spread, is left NULL, it is assumed to be the opposite of hspread. The default values of hBL and aBL are -110, since that is the standard betting line for a wager with even odds once the vigorish is applied.
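Spread bets use the same payout rule as moneylines, with cover probabilities in place of win probabilities. The sketch below assumes that form with the -110 defaults; spread_ev() and win_profit() are hypothetical helpers, and the home cover probability 0.45085 is back-calculated from the eHome value above rather than printed by the package. With -110 on both sides, $\text{eHome} + \text{eAway} = 40 \cdot 100/110 - 40 \approx -3.64$ for any probability, which is the vigorish at work: the two sides cannot both have positive expectation.

```r
# Profit from a winning bet of `wager` at American odds `line`.
win_profit <- function(line, wager) {
  if (line > 0) wager * line / 100 else wager * 100 / abs(line)
}

# Sketch: expected value of each side of a spread bet, given P(home covers).
spread_ev <- function(p_home_cover, wager, hBL = -110, aBL = -110) {
  p <- p_home_cover
  c(eHome = p * win_profit(hBL, wager) - (1 - p) * wager,
    eAway = (1 - p) * win_profit(aBL, wager) - p * wager)
}

ev <- spread_ev(p_home_cover = 0.45085, wager = 40)
ev        # close to the eSpread_ols() output above
sum(ev)   # 40 * 100 / 110 - 40, about -3.64, whatever p is
```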
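Comparing these expected values, the Cardinals moneyline offers the best payout relative to our estimated probability of it happening: it has the largest estimated expected value of the four wagers (about 4.53 on a wager of 40). Even so, it is the least likely of the four to pay off, with an estimated win probability of 0.445.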
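One way to see why the Cardinals moneyline stands out is to convert each line into its break-even probability: a bet at $L > 0$ breaks even at $100/(L + 100)$, and at $L < 0$ at $|L|/(|L| + 100)$. A wager has positive expected value exactly when the estimated probability exceeds this threshold. The sketch below (implied_prob() is a hypothetical helper) compares the thresholds with the estimated probabilities: 0.555 and 0.445 from winprob_ols(), and cover probabilities of about 0.451 and 0.549 back-calculated from the eSpread_ols() output.

```r
# Break-even probability for a bet at American odds `line`: the p solving
# p * profit - (1 - p) * wager = 0 (it does not depend on the wager size).
implied_prob <- function(line) {
  ifelse(line > 0, 100 / (line + 100), abs(line) / (abs(line) + 100))
}

odds <- c(Rams_ML = -170, Cardinals_ML = 150, Rams_spread = -110, Cardinals_spread = -110)
est  <- c(Rams_ML = 0.555, Cardinals_ML = 0.445, Rams_spread = 0.451, Cardinals_spread = 0.549)

edge <- est - implied_prob(odds)   # positive edge <=> positive expected value
round(edge, 3)
```

The Cardinals moneyline clears its 0.40 threshold by about 0.045, the largest edge of the four, while the Cardinals +3.5 clears its 0.524 threshold by only about 0.025.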