ohenery | R Documentation |
Modeling of ordinal outcomes via the softmax function under the Harville and Henery models.
The Harville and Henery models describe the probability of ordered outcomes in terms of some parameters. Typically the ordered outcomes are things like place in a race, or winner among a large number of contestants. The Harville model could be described as a softmax probability for the first place finish, with a recursive model on the remaining places. The Henery model generalizes that to adjust the remaining places with another parameter.
These are best illustrated with an example. Suppose you observe a race of 20 contestants. Contestant number 11 takes first place, number 6 takes second place, and 17 takes third place, while the fourth through twentieth places are not recorded or not of interest. Under the Harville model, the probability of this outcome can be expressed as
\frac{\mu_{11}}{\sum_i \mu_i} \frac{\mu_6}{\sum_{i \ne 11} \mu_i} \frac{\mu_{17}}{\sum_{i \ne 11, i \ne 6} \mu_i},
where \mu_i = \exp(\eta_i).
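The Harville probability above can be checked numerically. The following is a minimal sketch in plain Python, not the package's R interface; the function name `harville_prob` and the eta values are assumptions made up for illustration.

```python
import math

def harville_prob(eta, order):
    """Probability that the contestants in `order` take 1st, 2nd, ... place.

    eta   : list of eta_i, one per contestant (0-based positions)
    order : 0-based indices of the contestants in the recorded places
    """
    mu = [math.exp(e) for e in eta]
    remaining = list(range(len(eta)))   # contestants not yet placed
    prob = 1.0
    for i in order:
        denom = sum(mu[k] for k in remaining)
        prob *= mu[i] / denom
        remaining.remove(i)             # a placed contestant leaves the pool
    return prob

# 20 contestants with made-up eta values (illustrative only)
eta = [0.1 * k for k in range(20)]
# contestants 11, 6 and 17 in the text's 1-based numbering are 10, 5 and 16 here
p = harville_prob(eta, [10, 5, 16])
```

When all eta values are equal, the result reduces to 1/20 times 1/19 times 1/18, as expected for a race among interchangeable contestants.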
In a softmax regression under the Harville model, one expresses the odds as \eta_i = x_i^{\top}\beta, where x_i is the vector of independent variables for contestant i, and \beta is a coefficient vector to be fit by the regression.
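As a small illustration of the linear form, the sketch below computes \eta_i = x_i^{\top}\beta for a single race and converts it to first-place probabilities with a softmax. It is plain Python rather than the package's R code; the name `win_probabilities`, the feature values and the coefficients are all hypothetical.

```python
import math

def win_probabilities(X, beta):
    """Softmax of eta_i = x_i' beta across the contestants of one race."""
    eta = [sum(xj * bj for xj, bj in zip(x, beta)) for x in X]
    top = max(eta)                          # subtract the max for numerical stability
    mu = [math.exp(e - top) for e in eta]   # proportional to exp(eta_i)
    total = sum(mu)
    return [m / total for m in mu]

# three contestants, two features each (made-up numbers)
X = [[1.0, 0.2], [0.5, 1.5], [0.0, 0.0]]
beta = [0.8, -0.3]                          # hypothetical coefficients
probs = win_probabilities(X, beta)
```

Subtracting the maximum of eta before exponentiating does not change the probabilities, since the common factor cancels in the ratio.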
Under the Henery model, one adds gammas, \gamma_2, \gamma_3, \ldots, such that the probability of the outcome above is
\frac{\mu_{11}}{\sum_i \mu_i} \frac{\mu_6^{\gamma_2}}{\sum_{i \ne 11} \mu_i^{\gamma_2}} \frac{\mu_{17}^{\gamma_3}}{\sum_{i \ne 11, i \ne 6} \mu_i^{\gamma_3}}.
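The Henery probability can be sketched the same way, using \mu_i^{\gamma} = \exp(\gamma \eta_i). This is an illustrative plain-Python sketch, not the package's implementation; `henery_prob` and its argument layout are assumptions.

```python
import math

def henery_prob(eta, order, gammas):
    """Probability of the recorded finishing order under the Henery model.

    eta    : list of eta_i, one per contestant (0-based positions)
    order  : 0-based indices of the contestants in 1st, 2nd, ... place
    gammas : exponent for each recorded place; gammas[0] is gamma_1 = 1
    """
    remaining = list(range(len(eta)))
    prob = 1.0
    for place, i in enumerate(order):
        g = gammas[place]
        # mu_k ** g computed as exp(g * eta_k)
        denom = sum(math.exp(g * eta[k]) for k in remaining)
        prob *= math.exp(g * eta[i]) / denom
        remaining.remove(i)
    return prob

eta = [0.1 * k for k in range(20)]            # made-up values
p_henery = henery_prob(eta, [10, 5, 16], [1.0, 0.9, 0.8])
```

Setting every gamma to one recovers the Harville probability for the same outcome.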
There is no reason to model a \gamma_1 as anything but one, since it would be redundant: any other value could be absorbed by rescaling the \eta_i and the remaining gammas.
The Henery softmax regression estimates the \beta as well as the \gamma_j.
To simplify the regression, the higher order gammas are assumed to equal the last fit value. For example, if \gamma_3 is the last gamma fit, we model \gamma_5=\gamma_4=\gamma_3.
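The carry-forward rule for higher order gammas amounts to a simple lookup. The helper below is a hypothetical plain-Python sketch, not a function from the package; it assumes the fitted values are stored in the order \gamma_2, \gamma_3, and so on.

```python
def gamma_for_place(place, fitted_gammas):
    """Exponent used for a 1-based finishing place.

    fitted_gammas holds gamma_2, gamma_3, ... in order. gamma_1 is fixed
    at one, and places beyond the last fitted gamma reuse its value.
    """
    if place == 1 or not fitted_gammas:
        return 1.0                       # no gammas fit is the Harville case
    idx = min(place - 2, len(fitted_gammas) - 1)
    return fitted_gammas[idx]

fitted = [0.9, 0.8]                      # made-up gamma_2 and gamma_3
exponents = [gamma_for_place(j, fitted) for j in range(1, 6)]
```

With these made-up values, the fourth and fifth places share the exponent fitted for third place.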
The regression supports weighted estimation as well. The weights are applied to the places, not to the participants. For the example above, the weighted likelihood under the Harville model is
\left(\frac{\mu_{11}}{\sum_i \mu_i}\right)^{w_1} \left(\frac{\mu_6}{\sum_{i \ne 11} \mu_i}\right)^{w_2} \left(\frac{\mu_{17}}{\sum_{i \ne 11, i \ne 6} \mu_i}\right)^{w_3}.
The weighting mechanism is how this package deals with unobserved places. Rather than marking all runners-up as tied for fourth place, in this case one sets w_i = 0 for i > 3. The regression is then not asked to make distinctions between the tied runners-up.
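To see why zero weights remove the unobserved places from the fit, the sketch below computes the weighted Harville log-likelihood, in which each place contributes its weight times the log of its factor. It is plain Python for illustration only; `weighted_harville_loglik`, the eta values and the arbitrary ordering of the runners-up are assumptions.

```python
import math

def weighted_harville_loglik(eta, order, weights):
    """Weighted log-likelihood of one race under the Harville model.

    order   : 0-based contestant indices in finishing order
    weights : w_j for each listed place; zero drops that place's factor
    """
    mu = [math.exp(e) for e in eta]
    remaining = list(range(len(eta)))
    loglik = 0.0
    for i, w in zip(order, weights):
        denom = sum(mu[k] for k in remaining)
        loglik += w * math.log(mu[i] / denom)
        remaining.remove(i)
    return loglik

eta = [0.4, -0.2, 1.1, 0.0, 0.6]                 # five contestants, made-up values
# the last two places are unobserved, so their order here is arbitrary
full = weighted_harville_loglik(eta, [2, 0, 4, 1, 3], [1, 1, 1, 0, 0])
swapped = weighted_harville_loglik(eta, [2, 0, 4, 3, 1], [1, 1, 1, 0, 0])
top3 = weighted_harville_loglik(eta, [2, 0, 4], [1, 1, 1])
```

All three values agree, so the arbitrary ordering assigned to the zero-weight places has no effect on the likelihood.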
This package is a work in progress. Expect breaking changes. Please file any bug reports or issues at https://github.com/shabbychef/ohenery/issues.
ohenery is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
This package is maintained as a hobby.
Steven E. Pav shabbychef@gmail.com
Maintainer: Steven E. Pav shabbychef@gmail.com
Harville, D. A. "Assigning probabilities to the outcomes of multi-entry competitions." Journal of the American Statistical Association 68, no. 342 (1973): 312-316. doi:10.1080/01621459.1973.10482425
Henery, R. J. "Permutation probabilities as models for horse races." Journal of the Royal Statistical Society: Series B (Methodological) 43, no. 1 (1981): 86-91. doi:10.1111/j.2517-6161.1981.tb01153.x