hockey: NHL hockey data


Description

Every NHL goal from fall 2002 through the 2014 cup finals.

Details

The data comprise information about play configuration and the players on ice (including goalies) for every goal from the 2002-03 to 2013-14 NHL seasons. Collected using A. C. Thomas's nhlscrapr package. See the Chicago hockey analytics project at github.com/mataddy/hockey.

Value

goal

Info about each goal scored, including homegoal – an indicator for the home team scoring.

player

Sparse Matrix with entries for who was on the ice for each goal: +1 for a home team player, -1 for an away team player, zero otherwise.

team

Sparse Matrix with indicators for each team*season interaction: +1 for home team, -1 for away team.

config

Special teams info. For example, S5v4 is a 5 on 4 powerplay, coded +1 if it is for the home team and -1 if it is for the away team.
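
The +1/-1 coding described above can be illustrated with a small sketch. It assumes only the Matrix package; the player names, the `onice` table and the `toy_player` matrix are hypothetical and are not taken from the real data.

```r
library(Matrix)  # sparse matrix classes, as used by the hockey data

# Hypothetical toy data: 3 goals and 4 made-up players.
players <- c("HOME_A", "HOME_B", "AWAY_C", "AWAY_D")

# One row per (goal, player on ice) pair; side is +1 for home, -1 for away.
onice <- data.frame(
  goal   = c(1, 1, 1, 2, 2, 3, 3, 3),
  player = c("HOME_A", "AWAY_C", "AWAY_D", "HOME_B", "AWAY_C",
             "HOME_A", "HOME_B", "AWAY_D"),
  side   = c(1, -1, -1, 1, -1, 1, 1, -1)
)

# Goals in rows, players in columns; entries not listed are zero.
toy_player <- sparseMatrix(
  i = onice$goal,
  j = match(onice$player, players),
  x = onice$side,
  dims = c(3, length(players)),
  dimnames = list(NULL, players)
)

homegoal <- c(1, 0, 1)  # 1 if the home team scored, 0 otherwise

# Traditional plus-minus, mirroring the calculation in the Examples:
# flip the sign of a goal's row when the away team scored.
pm <- colSums(toy_player * c(-1, 1)[homegoal + 1])
print(pm)
```

Under this coding a single response, homegoal, serves every player: a positive coefficient means that the player's presence on the ice raises the odds that a goal is scored by his own team, whether that team is home or away.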

Author(s)

Matt Taddy, mataddy@gmail.com

References

Gramacy, Jensen, and Taddy (2013): "Estimating Player Contribution in Hockey with Regularized Logistic Regression", the Journal of Quantitative Analysis in Sports.

Gramacy, Taddy, and Tian (2015): "Hockey Player Performance via Regularized Logistic Regression", the Handbook of Statistical Methods for Design and Analysis in Sports.

See Also

gamlr

Examples

## design 
library(gamlr) # provides gamlr() and the hockey data; loads Matrix
data(hockey)
x <- cbind(config,team,player)
y <- goal$homegoal

## fit the plus-minus regression model
## (non-player effects are unpenalized)

fit <- gamlr(x, y, 
  lambda.min.ratio=0.05, nlambda=40, ## just so it runs in under 5 sec
  free=1:(ncol(config)+ncol(team)),
  standardize=FALSE, family="binomial")
plot(fit)

## look at estimated player [career] effects
B <- coef(fit)[colnames(player),]
sum(B!=0) # number of measurable effects (AICc selection)
B[order(-B)[1:10]] # 10 biggest

## convert to 2013-2014 season partial plus-minus
now <- goal$season=="20132014"
pm <- colSums(player[now,names(B)]*c(-1,1)[y[now]+1]) # traditional plus minus
ng <- colSums(abs(player[now,names(B)])) # total number of goals
# The individual effect on probability that a
# given goal is for vs against that player's team
p <- 1/(1+exp(-B)) 
# multiply ng*p - ng*(1-p) to get expected plus-minus
ppm <- ng*(2*p-1)

# organize the data together and print top 20
effect <- data.frame(b=round(B,3),ppm=round(ppm,3),pm=pm)
effect <- effect[order(-effect$ppm),]
print(effect[1:20,])

Example output

Loading required package: Matrix
[1] 620
PETER_FORSBERG   ONDREJ_PALAT  TYLER_TOFFOLI ZIGMUND_PALFFY  SIDNEY_CROSBY 
     0.7506064      0.6035498      0.5999503      0.4229641      0.4087186 
  JOE_THORNTON  PAVEL_DATSYUK  LOGAN_COUTURE      ERIC_FEHR   MATT_MOULSON 
     0.3808053      0.3696573      0.3616907      0.3613557      0.3510730 
                        b    ppm pm
ONDREJ_PALAT        0.604 37.496 38
SIDNEY_CROSBY       0.409 31.847 52
HENRIK_LUNDQVIST    0.162 26.746  9
JONATHAN_TOEWS      0.301 24.060 35
ANDREI_MARKOV       0.274 23.707 34
TYLER_TOFFOLI       0.600 21.847 31
JOE_THORNTON        0.381 21.824 34
ANZE_KOPITAR        0.241 21.700 39
RYAN_NUGENT-HOPKINS 0.282 18.768 18
GABRIEL_LANDESKOG   0.260 18.379 36
PAVEL_DATSYUK       0.370 18.092 13
LOGAN_COUTURE       0.362 17.353 29
ALEX_OVECHKIN       0.300 16.389 16
MARIAN_HOSSA        0.261 15.681 21
DAVID_PERRON        0.273 15.186  2
ALEXANDER_SEMIN     0.349 15.040 -1
MATT_MOULSON        0.351 14.595 22
MIKKO_KOIVU         0.262 14.057 12
FRANS_NIELSEN       0.289 14.053  8
JONATHAN_BERNIER    0.128 13.317 22
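
The partial plus-minus computed in the Examples can be written out as a short derivation. The symbols here are notation introduced for illustration: \beta_j is the fitted coefficient for player j (`B` in the code), n_j is the number of goals he was on the ice for in the season (`ng`), and p_j is the probability, from his coefficient taken on its own, that a given goal is for rather than against his team (`p`).

```latex
p_j = \frac{1}{1 + e^{-\beta_j}}, \qquad
\mathrm{ppm}_j = n_j\,p_j - n_j\,(1 - p_j) = n_j\,(2 p_j - 1).
```

A player with \beta_j = 0 has p_j = 1/2 and therefore a partial plus-minus of zero, however many goals he was on the ice for.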

gamlr documentation built on July 1, 2020, 5:18 p.m.
