ohenery: The 'ohenery' package.

Description

Modeling of ordinal outcomes via the softmax function under the Harville and Henery models.

Harville and Henery models

The Harville and Henery models describe the probability of ordered outcomes in terms of some parameters. Typically the ordered outcomes are things like place in a race, or winner among a large number of contestants. The Harville model can be described as a softmax probability for the first-place finish, applied recursively to the contestants still in contention for each remaining place. The Henery model generalizes this by adjusting the remaining places with additional parameters.

These are best illustrated with an example. Suppose you observe a race of 20 contestants. Contestant number 11 takes first place, number 6 takes second place, and 17 takes third place, while the fourth through twentieth places are not recorded or not of interest. Under the Harville model, the probability of this outcome can be expressed as

\frac{μ_{11}}{∑_i μ_i} \frac{μ_6}{∑_{i \ne 11} μ_i} \frac{μ_{17}}{∑_{i \ne 11, i \ne 6} μ_i},

where μ_i = \exp(η_i). In a softmax regression under the Harville model, one models the log of μ_i as a linear function of covariates, η_i = x_i^{\top}β, where x_i is the vector of independent variables for contestant i and β is a coefficient vector to be fit by the regression.
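The product of fractions above can be evaluated with a short loop. The following Python sketch is only an illustration of the formula, not the package's implementation; the function name `harville_prob` and the dictionary layout are assumptions made for this example.

```python
import math

def harville_prob(eta, finish_order):
    """Harville probability that the contestants in `finish_order`
    take first, second, third, ... place in exactly that order.

    eta          -- dict mapping contestant id to eta_i (hypothetical layout)
    finish_order -- list of contestant ids for the recorded places
    """
    mu = {i: math.exp(e) for i, e in eta.items()}
    remaining = set(mu)              # contestants still in contention
    prob = 1.0
    for winner in finish_order:
        denom = sum(mu[i] for i in remaining)
        prob *= mu[winner] / denom   # softmax over the remaining field
        remaining.remove(winner)     # the winner leaves the pool
    return prob

# The race from the text: 20 contestants; 11, 6 and 17 take the top three places.
# With all eta_i equal, the result reduces to 1/20 * 1/19 * 1/18.
eta = {i: 0.0 for i in range(1, 21)}
p = harville_prob(eta, [11, 6, 17])
print(p)
```

Recomputing the denominator from the remaining set at each place, rather than subtracting from a running total, keeps the sketch close to the formula and avoids accumulating rounding error.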

Under the Henery model, one adds gammas, γ_2, γ_3, ... such that the probability of the outcome above is

\frac{μ_{11}}{∑_i μ_i} \frac{μ_6^{γ_2}}{∑_{i \ne 11} μ_i^{γ_2}} \frac{μ_{17}^{γ_3}}{∑_{i \ne 11, i \ne 6} μ_i^{γ_3}}.

There is no reason to model γ_1 as anything but one, since it would be redundant with the scale of β. The Henery softmax regression estimates the β as well as the γ_j. To simplify the regression, the higher order gammas are assumed to equal the last fit value. For example, if γ_2 and γ_3 are fit, one models γ_5=γ_4=γ_3.
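To make the role of the gammas concrete, here is a minimal Python sketch of the Henery probability. It is an illustration under stated assumptions rather than the package's code: the name `henery_prob` and the convention of passing the gammas as a list `[γ_2, γ_3, ...]` are hypothetical. It uses μ_i^γ = \exp(γ η_i), fixes γ_1 at one, and reuses the last supplied gamma for all higher places.

```python
import math

def henery_prob(eta, finish_order, gammas):
    """Henery probability of the recorded finishing order.

    eta          -- dict mapping contestant id to eta_i (hypothetical layout)
    finish_order -- list of contestant ids for the recorded places
    gammas       -- list [gamma_2, gamma_3, ...]; gamma_1 is fixed at 1 and
                    the last entry is carried forward to higher places
    """
    remaining = set(eta)
    prob = 1.0
    for place, winner in enumerate(finish_order, start=1):
        if place == 1 or not gammas:
            g = 1.0                                    # gamma_1 = 1
        else:
            g = gammas[min(place - 2, len(gammas) - 1)]
        # mu_i ** g is computed as exp(g * eta_i)
        denom = sum(math.exp(g * eta[i]) for i in remaining)
        prob *= math.exp(g * eta[winner]) / denom
        remaining.remove(winner)
    return prob

# Four contestants with mu = 4, 2, 1, 1 and a single fitted gamma_2 = 0.5,
# which is also used for third place.
eta = {1: math.log(4), 2: math.log(2), 3: 0.0, 4: 0.0}
p = henery_prob(eta, [1, 2, 3], gammas=[0.5])
print(p)
```

Setting every gamma to one recovers the Harville probability, which is a convenient sanity check on any implementation.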

The regression supports weighted estimation as well. The weights are applied to the places, not to the participants. For the example above, the weighted likelihood under the Harville model is

\left(\frac{μ_{11}}{∑_i μ_i}\right)^{w_1} \left(\frac{μ_6}{∑_{i \ne 11} μ_i}\right)^{w_2} \left(\frac{μ_{17}}{∑_{i \ne 11, i \ne 6} μ_i}\right)^{w_3}.

The weighting mechanism is how this package deals with unobserved places. Rather than marking all runners-up as tied for fourth place, in this case one sets w_i = 0 for i > 3. Each zero-weight factor then contributes nothing to the likelihood, so the regression is not asked to make distinctions between the tied runners-up.
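The effect of zero weights is easiest to see on the log scale, where each place contributes w_j times the log of its softmax factor. The Python sketch below is an assumption-laden illustration, not the package's estimator; the function name `weighted_harville_loglik` and its arguments are hypothetical.

```python
import math

def weighted_harville_loglik(eta, finish_order, weights):
    """Weighted Harville log-likelihood of one race.

    eta          -- dict mapping contestant id to eta_i (hypothetical layout)
    finish_order -- list of contestant ids, first place first
    weights      -- list of per-place weights w_1, w_2, ...
    """
    remaining = set(eta)
    loglik = 0.0
    for winner, w in zip(finish_order, weights):
        log_denom = math.log(sum(math.exp(eta[i]) for i in remaining))
        loglik += w * (eta[winner] - log_denom)   # w_j * log(mu_winner / sum of mu)
        remaining.remove(winner)
    return loglik

# Five contestants; only the top three places are of interest, so the
# fourth and fifth places get weight zero.
eta = {1: 0.3, 2: -0.1, 3: 0.0, 4: 0.5, 5: -0.4}
weights = [1.0, 1.0, 1.0, 0.0, 0.0]
ll_a = weighted_harville_loglik(eta, [4, 1, 3, 2, 5], weights)
ll_b = weighted_harville_loglik(eta, [4, 1, 3, 5, 2], weights)
print(ll_a, ll_b)
```

The two calls differ only in how the unobserved fourth and fifth places are ordered, and they return the same value, which is the sense in which the zero weights free the regression from ranking the runners-up.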

Breaking Changes

This package is a work in progress. Expect breaking changes. Please file any bug reports or issues at https://github.com/shabbychef/ohenery/issues.

Legal Mumbo Jumbo

ohenery is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

Note

This package is maintained as a hobby.

Author(s)

Steven E. Pav shabbychef@gmail.com

References

Harville, D. A. "Assigning probabilities to the outcomes of multi-entry competitions." Journal of the American Statistical Association 68, no. 342 (1973): 312-316. http://dx.doi.org/10.1080/01621459.1973.10482425

Henery, R. J. "Permutation probabilities as models for horse races." Journal of the Royal Statistical Society: Series B (Methodological) 43, no. 1 (1981): 86-91. http://dx.doi.org/10.1111/j.2517-6161.1981.tb01153.x


ohenery documentation built on Oct. 30, 2019, 9:53 a.m.