A win probability model for volleyball matches. The basic framework is as follows:
Attacks are classified by the attacking team and the system: in-system, out-of-system, transition net play (e.g., a dig overpass), or reception overpass.
Transition probabilities between the current attack and the next attack (or winning/losing the point before another attack) are estimated.
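As a rough illustration of this step, transition probabilities can be estimated as row-normalised counts. This is only a sketch: the function name `estimate_transitions`, the state labels, and the observations are made up for the example, and the package may estimate these quantities differently.

```python
from collections import Counter, defaultdict

def estimate_transitions(pairs):
    """Estimate P(next outcome | current attack) as row-normalised counts."""
    counts = defaultdict(Counter)
    for current, nxt in pairs:
        counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for current, row in counts.items()
    }

# Hypothetical observations: (system of the current attack, what happened next).
# Outcome codes: first letter T = same team, O = opponent;
# second letter I/N/O = system of the next attack, W = point won outright.
observed = [
    ("in_system", "TW"), ("in_system", "TW"),
    ("in_system", "OO"), ("in_system", "OW"),
    ("out_of_system", "OI"), ("out_of_system", "TW"),
]
probs = estimate_transitions(observed)
```

Each row of `probs` sums to 1, so it can be read directly as one row of a transition matrix.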
A Markov chain model is used to estimate the "long run" probability of winning a point given the current attack. This probability is hard-coded as the "average" value of an in-system, out-of-system, transition net play, or reception overpass attack. The Markov chain model implicitly assumes that the current attack influences only the next attack, not any attack after it.
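The "long run" value can be sketched as the fixed point of a simple recursion: the attacking team wins outright, or attacks again, or the opponent attacks (in which case the attacking team's chance is one minus the opponent's value). The transition numbers below are invented for illustration, only three systems (I, N, O) are used, and treating the opponent's value as a mirror is an assumption of this sketch, not a statement of how the package does it.

```python
SYSTEMS = ("I", "N", "O")  # in-system, transition net play, out-of-system

# Hypothetical transition probabilities for each current attack system.
# Codes: T = attacking team, O = opponent; W = wins without another attack.
TRANSITIONS = {
    "I": {"TW": 0.45, "OW": 0.15, "TI": 0.03, "TN": 0.02, "TO": 0.05,
          "OI": 0.10, "ON": 0.05, "OO": 0.15},
    "N": {"TW": 0.55, "OW": 0.15, "TI": 0.02, "TN": 0.03, "TO": 0.05,
          "OI": 0.05, "ON": 0.05, "OO": 0.10},
    "O": {"TW": 0.25, "OW": 0.20, "TI": 0.03, "TN": 0.02, "TO": 0.05,
          "OI": 0.20, "ON": 0.05, "OO": 0.20},
}

def long_run_values(transitions, tol=1e-12, max_iter=10_000):
    """Iterate v[s] = P(TW) + sum P(T s') v[s'] + sum P(O s') (1 - v[s'])."""
    values = {s: 0.5 for s in transitions}
    for _ in range(max_iter):
        updated = {}
        for s, row in transitions.items():
            v = row["TW"]  # point won on this attack
            for nxt in SYSTEMS:
                v += row["T" + nxt] * values[nxt]          # same team attacks again
                v += row["O" + nxt] * (1.0 - values[nxt])  # opponent attacks (mirrored)
            updated[s] = v
        change = max(abs(updated[s] - values[s]) for s in values)
        values = updated
        if change < tol:
            break
    return values

average_values = long_run_values(TRANSITIONS)
```

The iteration converges because every row puts positive probability on the point ending immediately; solving the equivalent linear system directly would give the same values.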
A multinomial regression model is built to estimate the probability of each possible next-attack outcome (OI = Opponent In-System; ON = Opponent Transition Net Play; OO = Opponent Out-of-System; OW = Opponent Wins without another attack; TI, TN, TO, TW are the same for the touching team) for each attack, dig, and set. Current predictors in the model:
Overpasses/Net Play Attacks: system (in-system, out-of-system, transition net play, reception overpass), number of blockers (double, none, seam, solo, triple, unknown), attack_start_zone (using the usual 6-zone division of the half-court; e.g., Zone 1 and Zone 9 are combined as "Back Right").
It is important to note here that the only features we include are known immediately prior to the ball being touched; that is, our features are based on a discussion of "What makes this attack/set/dig more or less likely to be successful?".
For touches that are not modeled:
Freeball: We don't think anything other than team/opponent quality affects a team's next attack off a freeball, and we haven't built that model yet. For now, we use a single number estimated from data.
Block: We decided not to model this directly, primarily because we don't believe the blocking skill of a player is adequately captured in the Block touches. First, a very good pin blocker may be placed on an island or set away from (or conversely, a poor blocker may be covered by commit-blocks or targeted with sets). Second, especially in the women's collegiate game, attackers rarely go after a well-formed block, so many easy digs are a direct result of a block that doesn't get any credit. Third, when multiple blockers are present, it's not clear how much credit the players who didn't touch the ball should get (or who those players even are). We include a (very slow) function to estimate the opposing blockers on an attack and thus give the entire front line credit/blame; we are still debating how to properly distribute it.
For each modeled touch, the coefficients of the models are used to estimate the probability of the next attack being one of the six transition possibilities or winning/losing the point.
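To make the step from coefficients to probabilities concrete, here is a minimal sketch assuming the common baseline-category (softmax) form of multinomial regression. The coefficient values, the feature names `intercept` and `in_system`, and the choice of OW as the reference outcome are all hypothetical; the actual fitted models may be parameterised differently.

```python
import math

OUTCOMES = ("OI", "ON", "OO", "OW", "TI", "TN", "TO", "TW")

# Hypothetical coefficients per outcome; "OW" has no entry and acts as
# the reference outcome with a linear predictor of 0.
COEFS = {
    "TW": {"intercept": 0.8, "in_system": 0.5},
    "OI": {"intercept": 0.2, "in_system": -0.3},
    "OO": {"intercept": 0.1, "in_system": 0.2},
    "TI": {"intercept": -1.0},
    "TN": {"intercept": -1.5},
    "TO": {"intercept": -0.8},
    "ON": {"intercept": -0.9},
}

def outcome_probabilities(coefs, features):
    """Softmax over the linear predictors of all eight outcomes."""
    scores = {}
    for outcome in OUTCOMES:
        weights = coefs.get(outcome, {})
        scores[outcome] = sum(
            weights.get(name, 0.0) * x for name, x in features.items()
        )
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {o: math.exp(s - top) for o, s in scores.items()}
    total = sum(exps.values())
    return {o: e / total for o, e in exps.items()}

# One in-system attack, described by hypothetical features.
probs = outcome_probabilities(COEFS, {"intercept": 1.0, "in_system": 1.0})
```

The returned dictionary always sums to 1, so it can serve directly as the weights in the next step.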
The input point-win probability (input_pwp) is calculated as a weighted average of the "average values" for each of the outcome possibilities, where the weights are the probabilities calculated in Step 6. For freeballs and serves, the input_pwp is hard-coded based on Step 5.
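The weighted average described here can be sketched in a few lines. The values in `OUTCOME_VALUES` are placeholders, not the package's hard-coded numbers; the only fixed entries are 1 for winning the point (TW) and 0 for losing it (OW).

```python
# Hypothetical "average values" for each outcome, from the touching
# team's point of view. TW and OW are fixed at 1 and 0.
OUTCOME_VALUES = {
    "TW": 1.0, "OW": 0.0,
    "TI": 0.63, "TN": 0.69, "TO": 0.50,
    "OI": 0.37, "ON": 0.31, "OO": 0.50,
}

def input_pwp(outcome_probs, outcome_values=OUTCOME_VALUES):
    """Probability-weighted average of the outcome values."""
    return sum(p * outcome_values[o] for o, p in outcome_probs.items())

# Hypothetical predicted probabilities for one touch.
example = input_pwp({"TW": 0.4, "OW": 0.1, "TI": 0.1, "OI": 0.4})
```

Outcomes missing from the probability dictionary are treated as having probability 0.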
The output point-win probability is the input_pwp of the next attack, dig, set, or freeball (or 1 or 0 if a team wins/loses the point on that touch).
Right now, we fill in receptions and blocks with the input_pwp and output_pwp mirrors of the previous serve/attack. This gets a little weird with situations where a block is not immediately preceded by an attack (e.g., blocking a freeball).
The point-win probability difference (pwp_diff) is calculated as output_pwp - input_pwp.
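A small sketch of how output_pwp and pwp_diff chain together over one rally. It assumes, for simplicity, that every input_pwp is already expressed from the same team's point of view; the helper name `add_pwp_diff`, the dictionary layout, and the numbers are hypothetical.

```python
def add_pwp_diff(touches, point_won):
    """Attach output_pwp and pwp_diff to each touch of a single rally.

    touches: list of dicts with an 'input_pwp' key, in rally order,
             all from the same team's perspective (assumption).
    point_won: True if that team won the point.
    """
    result = []
    for i, touch in enumerate(touches):
        if i + 1 < len(touches):
            output = touches[i + 1]["input_pwp"]  # next touch's input
        else:
            output = 1.0 if point_won else 0.0    # rally ended on this touch
        result.append({
            **touch,
            "output_pwp": output,
            "pwp_diff": output - touch["input_pwp"],
        })
    return result

rally = [
    {"skill": "dig", "input_pwp": 0.40},
    {"skill": "set", "input_pwp": 0.55},
    {"skill": "attack", "input_pwp": 0.60},
]
scored = add_pwp_diff(rally, point_won=True)
```

Under this assumption the pwp_diff values telescope: their sum over the rally equals the final result (1 or 0) minus the first input_pwp.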
Stuff still to do: