ohenery: The 'ohenery' package.
In shabbychef/ohenery: Modeling of Ordinal Random Variables via Softmax Regression

ohenery

R Documentation


Description

Modeling of ordinal outcomes via the softmax function under the Harville and Henery models.

Harville and Henery models

The Harville and Henery models express the probability of an ordered outcome in terms of one parameter per contestant. Typically the ordered outcomes are things like the finishing places in a race, or the winner among a large number of contestants. The Harville model assigns a softmax probability to the first place finish, then applies the same model recursively to the contestants who remain for each subsequent place. The Henery model generalizes this by introducing an additional parameter for each later place, which adjusts the probabilities for that place.

These are best illustrated with an example. Suppose you observe a race of 20 contestants. Contestant number 11 takes first place, number 6 takes second place, and number 17 takes third place, while the fourth through twentieth places are not recorded or not of interest. Under the Harville model, the probability of this outcome can be expressed as

\frac{\mu_{11}}{\sum_i \mu_i} \frac{\mu_6}{\sum_{i \ne 11} \mu_i} \frac{\mu_{17}}{\sum_{i \ne 11, i \ne 6} \mu_i},

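where \mu_i = \exp(\eta_i), so that \mu_i is proportional to the probability that contestant i takes first place. In a softmax regression under the Harville model, one expresses \eta_i = x_i^{\top}\beta, where x_i is the vector of independent variables for contestant i, and \beta is the coefficient vector to be fit by the regression.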
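The Harville probability above can be computed directly from the \eta_i. The following is a minimal sketch in plain Python, not the package's own implementation; the function name harville_prob and the 0-based contestant indices are assumptions made for illustration.

```python
import math

def harville_prob(eta, order):
    """Probability that the contestants listed in `order` take first,
    second, ... place under the Harville model, where mu_i = exp(eta_i).

    `order` holds 0-based contestant indices (hypothetical convention)."""
    mu = [math.exp(e) for e in eta]
    remaining = sum(mu)           # sum of mu over contestants not yet placed
    prob = 1.0
    for idx in order:
        prob *= mu[idx] / remaining
        remaining -= mu[idx]      # drop the placed contestant from the pool
    return prob

# 20 contestants with equal eta; contestants 11, 6 and 17 are
# indices 10, 5 and 16 when counting from zero.
eta = [0.0] * 20
p = harville_prob(eta, [10, 5, 16])
# With equal eta this reduces to (1/20) * (1/19) * (1/18).
```

Only the observed places need to be passed in order; each factor divides by the total \mu of the contestants who have not yet been placed.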

Under the Henery model, one adds a parameter for each place after the first, \gamma_2, \gamma_3, \ldots, such that the probability of the outcome above is

\frac{\mu_{11}}{\sum_i \mu_i} \frac{\mu_6^{\gamma_2}}{\sum_{i \ne 11} \mu_i^{\gamma_2}} \frac{\mu_{17}^{\gamma_3}}{\sum_{i \ne 11, i \ne 6} \mu_i^{\gamma_3}}.

There is no reason to model \gamma_1 as anything but one: any other value would be redundant, since it could be absorbed into the scale of \beta. The Henery softmax regression estimates \beta as well as the \gamma_j. To simplify the regression, the higher-order gammas are assumed to equal the last fitted value. For example, if \gamma_3 is the last gamma fit, then one models \gamma_5=\gamma_4=\gamma_3.
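The Henery probability can be sketched in the same way, using \mu_i^{\gamma} = \exp(\gamma \eta_i). This is an illustrative sketch only, not the package's implementation; the function name henery_prob, the 0-based indices, and the convention of passing the gammas as a list starting at \gamma_2 are assumptions. The last supplied gamma is reused for all later places, mirroring the simplification described above.

```python
import math

def henery_prob(eta, order, gammas):
    """Probability of the ordered finish `order` under the Henery model.

    `gammas` is [gamma_2, gamma_3, ...] (hypothetical convention);
    gamma_1 is fixed at one, and the last supplied gamma is reused
    for every later place. An empty list falls back to Harville."""
    gammas = list(gammas) or [1.0]
    placed = set()
    prob = 1.0
    for place, idx in enumerate(order):
        if place == 0:
            g = 1.0
        else:
            g = gammas[min(place - 1, len(gammas) - 1)]
        # mu_i ** g == exp(g * eta_i), summed over contestants still in the pool
        denom = sum(math.exp(g * e) for i, e in enumerate(eta) if i not in placed)
        prob *= math.exp(g * eta[idx]) / denom
        placed.add(idx)
    return prob

# Three contestants with mu = 1, 2, 3, finishing in the order 3rd, 2nd, 1st index.
eta = [0.0, math.log(2.0), math.log(3.0)]
p_henery = henery_prob(eta, [2, 1, 0], [0.5])    # gamma_2 = 0.5, reused for gamma_3
p_harville = henery_prob(eta, [2, 1, 0], [1.0])  # all gammas equal to one
```

Setting every gamma to one recovers the Harville probability, which gives a quick sanity check on the sketch.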

The regression also supports weighted estimation. The weights are applied to the places, not to the participants. For the example above, the weighted likelihood under the Harville model is

\left(\frac{\mu_{11}}{\sum_i \mu_i}\right)^{w_1} \left(\frac{\mu_6}{\sum_{i \ne 11} \mu_i}\right)^{w_2} \left(\frac{\mu_{17}}{\sum_{i \ne 11, i \ne 6} \mu_i}\right)^{w_3}.

The weighting mechanism is how this package deals with unobserved places. Rather than marking all runners-up as tied for fourth place, in this case one sets w_i = 0 for i > 3. The regression is then not asked to make distinctions among the tied runners-up.
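The effect of zero weights is easiest to see on the log scale, where the weighted Harville likelihood becomes a weighted sum of log factors. The sketch below is illustrative and not the package's implementation; the function name weighted_harville_loglik and the 0-based indices are assumptions. Places with zero weight contribute nothing, so the arbitrary order given to the unobserved runners-up does not matter.

```python
import math

def weighted_harville_loglik(eta, order, weights):
    """Weighted Harville log-likelihood of one race.

    `order` lists 0-based contestant indices by place and `weights`
    gives one weight per place (both hypothetical conventions)."""
    mu = [math.exp(e) for e in eta]
    remaining = sum(mu)           # sum of mu over contestants not yet placed
    loglik = 0.0
    for idx, w in zip(order, weights):
        if w != 0:
            loglik += w * math.log(mu[idx] / remaining)
        remaining -= mu[idx]
    return loglik

# The example race: contestants 11, 6, 17 (indices 10, 5, 16) take the
# first three places; the other 17 are listed in arbitrary order with weight 0.
eta = [0.0] * 20
others = [i for i in range(20) if i not in (10, 5, 16)]
order = [10, 5, 16] + others
weights = [1.0] * 3 + [0.0] * 17
ll = weighted_harville_loglik(eta, order, weights)
```

With unit weights on the first three places and zero elsewhere, the result is the logarithm of the unweighted Harville probability of the observed top three.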

Breaking Changes

This package is a work in progress. Expect breaking changes. Please file any bug reports or issues at https://github.com/shabbychef/ohenery/issues.

Legal Mumbo Jumbo

ohenery is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

Note

This package is maintained as a hobby.

Author(s)

Steven E. Pav shabbychef@gmail.com

References

Harville, D. A. "Assigning probabilities to the outcomes of multi-entry competitions." Journal of the American Statistical Association 68, no. 342 (1973): 312-316. http://dx.doi.org/10.1080/01621459.1973.10482425

Henery, R. J. "Permutation probabilities as models for horse races." Journal of the Royal Statistical Society: Series B (Methodological) 43, no. 1 (1981): 86-91. http://dx.doi.org/10.1111/j.2517-6161.1981.tb01153.x

shabbychef/ohenery documentation built on Oct. 19, 2023, 12:08 p.m.