hensm | R Documentation |
A user friendly interface to the softmax regression under the Henery model.
hensm(
formula,
data,
group = NULL,
weights = NULL,
ngamma = 4,
fit0 = NULL,
reg_wt = NULL,
reg_zero = NULL,
reg_power = NULL,
reg_coef_idx = NULL,
reg_standardize = FALSE,
na.action = na.omit
)
## S3 method for class 'hensm'
vcov(object, ...)
## S3 method for class 'hensm'
print(x, ...)
formula |
an object of class |
data |
an optional data frame, list or environment (or object
coercible by |
group |
the string name of the group variable in the data, or a bare character with the group name. The group indices need not be integers, but that is more efficient. They need not be sorted. |
weights |
an optional vector of weights, or the string or bare name of the
weights in the |
ngamma |
The number of gammas to fit. Should be at least 2. |
fit0 |
An optional object of class |
reg_wt |
the multiplicative weight(s) of the regularization terms.
must be non-negative.
May be a scalar or vector, but will be recycled to the length of the |
reg_zero |
the ‘zero’ of the regularization terms. This would
usually be zero if you want to shrink to zero, but in some cases you may
with to shrink to 1 for example.
May be a scalar or vector, but will be recycled to the length of the |
reg_power |
the power of the regularization terms, 2 for ridge regression, 1 for lasso. |
reg_coef_idx |
the index of the coefficient which the corresponding
regularization term is applied to. For the Harville model, the indices only refer to
the |
reg_standardize |
if true, the |
na.action |
How to deal with missing values in |
object |
an object of class |
... |
For |
x |
an object used to select a method. |
Performs a softmax regression by groups, via Maximum Likelihood Estimation.
It is assumed that successive sub-races maintain the proportional
probability of the softmax, up to some gamma coefficients,
\gamma_2, \gamma_3, ..., \gamma_n
, which we fit. This model
nests the Harville model fit by harsm
, by fixing all
the gammas equal to 1.
An object of class hensm
, but also of maxLik
with the
fit.
The regularization term is of the form
\sum_i w_i |\nu_{c_i} - z_i|^{p_i},
where w_i
are the reg_wt
weights,
z_i
are the reg_zero
zeroes,
p_i
are the reg_power
powers,
and c_i
are the reg_coef_idx
coefficient indices.
Note that the coefficient indices can be repeated so that the
regularization term can include multiple contributions from one
coefficient element. This allows ‘elasticnet’ regularization.
The \nu
here refer to the regression coefficients \beta
concatenated with the \gamma
coefficients in the Henery model.
The fit functions return an object of type maxLik
even when regularization penalties are applied; the statistical inference
functions are not valid when regularization is used. The user is warned.
This regression may give odd results when the outcomes are tied, imposing an arbitrary order on the tied outcomes. Moreover, no warning may be issued in this case. In future releases, ties may be dealt with differently, perhaps in analogy to how ties are treated in the Cox Proportional Hazards regression, using the methods of Breslow or Efron.
To avoid incorrect inference when only the top performers are recorded, and all others are effectively tied, one should use weighting. Set the weights to zero for participants who are tied non-winners, and one for the rest So for example, if you observe the Gold, Silver, and Bronze medal winners of an Olympic event that had a starting field of 12 participants, set weights to 1 for the medal winners, and 0 for the others. Note that the weights do not attach to the participants, they attach to the place they took.
Since version 0.1.0 of this package, the normalization of weights used in this function have changed under the hood. This is to give correct inference in the case where zero weights are used to signify finishing places were not observed. If in doubt, please confirm inference by simulations, taking as example the simulations in the README.
When regularization is used, the first gamma coefficient is not shrunk, as it always equals one in the Henery model, and is not estimated from the data.
Steven E. Pav shabbychef@gmail.com
harsm
, smlik
.
nfeat <- 5
set.seed(1234)
g <- ceiling(seq(0.1,1000,by=0.1))
X <- matrix(rnorm(length(g) * nfeat),ncol=nfeat)
beta <- rnorm(nfeat)
eta <- X %*% beta
# 2FIX: do rhenery
y <- rsm(eta,g)
# now the pretty frontend
data <- cbind(data.frame(outcome=y,race=g),as.data.frame(X))
fmla <- outcome ~ V1 + V2 + V3 + V4 + V5
fitm <- hensm(fmla,data,group=race)
# with offset
eta0 <- rowMeans(X)
data <- cbind(data.frame(outcome=y,race=g,eta0=eta0),as.data.frame(X))
fmla <- outcome ~ offset(eta0) + V1 + V2 + V3 + V4 + V5
fitm <- hensm(fmla,data,group=race)
# on horse race data
library(dplyr)
data(race_data)
df <- race_data %>%
group_by(EventId) %>%
mutate(eta0=log(WN_pool / sum(WN_pool))) %>%
ungroup() %>%
mutate(weights=ifelse(!is.na(Finish),1,0)) %>%
mutate(fac_age=cut(Age,c(0,3,5,7,Inf),include.lowest=TRUE))
# Henery Model with market efficiency
hensm(Finish ~ eta0,data=df,group=EventId,weights=weights,ngamma=3)
# look for age effect not captured by consensus odds.
fmla <- Finish ~ offset(eta0) + fac_age
fit0 <- hensm(fmla,data=df,group=EventId,weights=weights,ngamma=2)
# allow warm start.
fit1 <- hensm(fmla,data=df,group=EventId,weights=weights,fit0=fit0,ngamma=2)
# allow warm start with more gammas.
fit2 <- hensm(fmla,data=df,group=EventId,weights=weights,fit0=fit0,ngamma=3)
# or a different formula
fit3 <- hensm(update(fmla,~ . + PostPosition),data=df,group=EventId,weights=weights,fit0=fit0)
# warm start from harsm object
fit0_har <- harsm(fmla,data=df,group=EventId,weights=weights)
fit4 <- hensm(fmla,data=df,group=EventId,fit0=fit0_har,weights=weights)
# regularization examples
fmla <- Finish ~ offset(eta0) + fac_age + PostPosition
# no regularization
fitr0 <- hensm(fmla,data=df,group=EventId,weights=weights,ngamma=3)
# ridge regression on the betas (there are two)
fitr2 <- hensm(fmla,data=df,group=EventId,weights=weights,ngamma=3,
reg_wt=rep(1,2), reg_power=2, reg_zero=0, reg_coef_idx=c(1,2))
# l1 regression on the betas (there are two)
fitr1 <- hensm(fmla,data=df,group=EventId,weights=weights,ngamma=3,
reg_wt=rep(1,2), reg_power=1, reg_zero=0, reg_coef_idx=c(1,2))
# elasticnet regression on the betas (there are two)
fitr12 <- hensm(fmla,data=df,group=EventId,weights=weights,ngamma=3,
reg_wt=rep(1,4), reg_power=c(1,1,2,2), reg_zero=0, reg_coef_idx=c(1,2,1,2))
# l1 regression on the gammas, shrinking to one (there are two estimated)
fitrg1 <- hensm(fmla,data=df,group=EventId,weights=weights,ngamma=3,
reg_wt=rep(1,2), reg_power=1, reg_zero=1, reg_coef_idx=c(3,4))
# l2 regression on the betas and gammas, shrinking beta to 0, gamma to 1.
fitrg2 <- hensm(fmla,data=df,group=EventId,weights=weights,ngamma=3,
reg_wt=rep(1,4), reg_power=2, reg_zero=c(0,0,1,1), reg_coef_idx=1:4)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.