hensm: Friendly interface to softmax regression under Henery model.

Description Usage Arguments Details Value Note Author(s) See Also Examples

View source: R/hensm.r

Description

A user friendly interface to the softmax regression under the Henery model.

Usage

1
2
3
4
5
6
7
8
hensm(formula, data, group = NULL, weights = NULL, ngamma = 4,
  na.action = na.omit)

## S3 method for class 'hensm'
vcov(object, ...)

## S3 method for class 'hensm'
print(x, ...)

Arguments

formula

an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under ‘Details’.

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which lm is called.

group

the string name of the group variable in the data, or a bare character with the group name. The group indices need not be integers, but that is more efficient. They need not be sorted.

weights

an optional vector of weights, or the string or bare name of the weights in the data for use in the fitting process. The weights are attached to the outcomes, not the participant. Set to NULL for none.

ngamma

The number of gammas to fit. Should be at least 2.

na.action

How to deal with missing values in y, g, X, wt, eta0.

object

an object of class hensm.

...

additional arguments to be passed to the low level regression fitting functions (see below).

x

logicals. If TRUE the corresponding components of the fit (the model frame, the model matrix, the response, the QR decomposition) are returned.

Details

Performs a softmax regression by groups, via Maximum Likelihood Estimation. It is assumed that successive sub-races maintain the proportional probability of the softmax, up to some gamma coefficients, γ_2, γ_3, ..., γ_n, which we fit. This model nests the Harville model fit by harsm, by fixing all the gammas equal to 1.

Value

An object of class hensm, but also of maxLik with the fit.

Note

This regression may give odd results when the outcomes are tied, imposing an arbitrary order on the tied outcomes. Moreover, no warning may be issued in this case. In future releases, ties may be dealt with differently, perhaps in analogy to how ties are treated in the Cox Proportional Hazards regression, using the methods of Breslow or Efron.

To avoid incorrect inference when only the top performers are recorded, and all others are effectively tied, one should use weighting. Set the weights to zero for participants who are tied non-winners, and one for the rest So for example, if you observe the Gold, Silver, and Bronze medal winners of an Olympic event that had a starting field of 12 participants, set weights to 1 for the medal winners, and 0 for the others. Note that the weights do not attach to the participants, they attach to the place they took.

Since version 0.1.0 of this package, the normalization of weights used in this function have changed under the hood. This is to give correct inference in the case where zero weights are used to signify finishing places were not observed. If in doubt, please confirm inference by simulations, taking as example the simulations in the README.

Author(s)

Steven E. Pav shabbychef@gmail.com

See Also

harsm, smlik.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
nfeat <- 5
set.seed(1234)
g <- ceiling(seq(0.1,1000,by=0.1))
X <- matrix(rnorm(length(g) * nfeat),ncol=nfeat)
beta <- rnorm(nfeat)
eta <- X %*% beta
# 2FIX: do rhenery
y <- rsm(eta,g)
# now the pretty frontend
data <- cbind(data.frame(outcome=y,race=g),as.data.frame(X))

fmla <- outcome ~ V1 + V2 + V3 + V4 + V5
fitm <- hensm(fmla,data,group=race)

# with offset
eta0 <- rowMeans(X)
data <- cbind(data.frame(outcome=y,race=g,eta0=eta0),as.data.frame(X))
fmla <- outcome ~ offset(eta0) + V1 + V2 + V3 + V4 + V5
fitm <- hensm(fmla,data,group=race)

# on horse race data
library(dplyr)
data(race_data)
df <- race_data %>%
	group_by(EventId) %>%
		mutate(eta0=log(WN_pool / sum(WN_pool))) %>%
	ungroup() %>%
	mutate(weights=ifelse(!is.na(Finish),1,0)) %>%
	mutate(fac_age=cut(Age,c(0,3,5,7,Inf),include.lowest=TRUE)) 

# Henery Model with market efficiency
hensm(Finish ~ eta0,data=df,group=EventId,weights=weights,ngamma=3)

# look for age effect not captured by consensus odds.
hensm(Finish ~ offset(eta0) + fac_age,data=df,group=EventId,weights=weights,ngamma=2)

ohenery documentation built on Oct. 30, 2019, 9:53 a.m.