harsm: Friendly interface to softmax regression under Harville...
In ohenery: Modeling of Ordinal Random Variables via Softmax Regression

harsm

R Documentation

Friendly interface to softmax regression under Harville model.

Description

A user friendly interface to the softmax regression under the Harville model.

Usage

harsm(
  formula,
  data,
  group = NULL,
  weights = NULL,
  fit0 = NULL,
  reg_wt = NULL,
  reg_zero = 0,
  reg_power = NULL,
  reg_coef_idx = NULL,
  reg_standardize = FALSE,
  na.action = na.omit
)

## S3 method for class 'harsm'
vcov(object, ...)

## S3 method for class 'harsm'
print(x, ...)

Arguments

`formula`	an object of class `"formula"` (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under ‘Details’.
`data`	an optional data frame, list or environment (or object coercible by `as.data.frame` to a data frame) containing the variables in the model. If not found in `data`, the variables are taken from `environment(formula)`, typically the environment from which `lm` is called.
`group`	the string name of the group variable in the data, or a bare character with the group name. The group indices need not be integers, but that is more efficient. They need not be sorted.
`weights`	an optional vector of weights, or the string or bare name of the weights in the `data` for use in the fitting process. The weights are attached to the outcomes, not the participant. Set to `NULL` for none.
`fit0`	An optional object of class `harsm` or of `hensm` with the initial fit estimates. These will be used for ‘warm start’ of the estimation procedure. A warm start should only speed up estimation, not change the ultimate results. When there is mismatch between the coefficients in `fit0` and the model being fit here, the missing coefficients are initialized as zero.
`reg_wt`	the multiplicative weight(s) of the regularization terms. must be non-negative. May be a scalar or vector, but will be recycled to the length of the `reg_coef_idx`.
`reg_zero`	the ‘zero’ of the regularization terms. This would usually be zero if you want to shrink to zero, but in some cases you may with to shrink to 1 for example. May be a scalar or vector, but will be recycled to the length of the `reg_coef_idx`. If `NULL` is given, defaults to zeroes for beta terms, and ones for gamma terms in a Henery model fit, all zeroes for Harville model fits.
`reg_power`	the power of the regularization terms, 2 for ridge regression, 1 for lasso.
`reg_coef_idx`	the index of the coefficient which the corresponding regularization term is applied to. For the Harville model, the indices only refer to the `\beta` coefficient vector. For the Henery model, the indices refer to the `beta` coefficient vector and `\gamma` coefficient vector concatenated together. The other regularization parameters may be recycled as needed, but not the `reg_coef_idx` values.
`reg_standardize`	if true, the `reg_wt` are normalized, or ‘standardized’ with respect to the standard deviation of the corresponding columns of the design matrix. That is, the weight used is the given weight times the standard deviation of the corresponding independent variable to the corresponding `reg_power`. Only terms associated with the betas are so normalized.
`na.action`	How to deal with missing values in the outcomes, groups, weights, etc.
`object`	an object of class `harsm`.
`...`	For `lm()`: additional arguments to be passed to the low level regression fitting functions (see below).
`x`	an object used to select a method.

Details

Performs a softmax regression by groups, via Maximum Likelihood Estimation, under the Harville model. We fit \beta where odds are \eta = x^{\top}\beta for independent variables x. The probability of taking first place is then \mu=c\exp{\eta}, where the c is chosen so the \mu sum to one. Under the Harville model, conditional on the first place finisher having been observed, the probability model for second (and successive) places with the probabilities of the remaining participants renormalized.

The print method of the harsm object includes a display of the R-squared. This measures the improvement in squared errors of the expected rank from the model over the null model which posits that all odds are equal. When the formula includes an offset, a ‘delta R-squared’ is also output. This is the improvement in predicted ranks over the model based on the offset term. Note that the expected ranks are only easy to produce under the Harville model; under the Henery model, the summary R-squared statistics are not produced. Note that this computation uses weighted sums of squares, as the weights contribute to the likelihood term. However, the square sum computation does not take into account the standard error of the rank, and so unlike in linear regression, the softmax regression does not always give positive R-squareds, and the statistic is otherwise hard to interpret.

Value

An object of class harsm, but also of maxLik with the fit.

Regularization

The regularization term is of the form

\sum_i w_i |\nu_{c_i} - z_i|^{p_i},

where w_i are the reg_wt weights, z_i are the reg_zero zeroes, p_i are the reg_power powers, and c_i are the reg_coef_idx coefficient indices. Note that the coefficient indices can be repeated so that the regularization term can include multiple contributions from one coefficient element. This allows ‘elasticnet’ regularization. The \nu here refer to the regression coefficients \beta concatenated with the \gamma coefficients in the Henery model.

Note

The fit functions return an object of type maxLik even when regularization penalties are applied; the statistical inference functions are not valid when regularization is used. The user is warned.

This regression may give odd results when the outcomes are tied, imposing an arbitrary order on the tied outcomes. Moreover, no warning may be issued in this case. In future releases, ties may be dealt with differently, perhaps in analogy to how ties are treated in the Cox Proportional Hazards regression, using the methods of Breslow or Efron.

To avoid incorrect inference when only the top performers are recorded, and all others are effectively tied, one should use weighting. Set the weights to zero for participants who are tied non-winners, and one for the rest So for example, if you observe the Gold, Silver, and Bronze medal winners of an Olympic event that had a starting field of 12 participants, set weights to 1 for the medal winners, and 0 for the others. Note that the weights do not attach to the participants, they attach to the place they took.

Since version 0.1.0 of this package, the normalization of weights used in this function have changed under the hood. This is to give correct inference in the case where zero weights are used to signify finishing places were not observed. If in doubt, please confirm inference by simulations, taking as example the simulations in the README.

Author(s)

Steven E. Pav shabbychef@gmail.com

Examples


nfeat <- 5
set.seed(1234)
g <- ceiling(seq(0.1,1000,by=0.1))
X <- matrix(rnorm(length(g) * nfeat),ncol=nfeat)
beta <- rnorm(nfeat)
eta <- X %*% beta
y <- rsm(eta,g)
# now the pretty frontend
data <- cbind(data.frame(outcome=y,race=g),as.data.frame(X))

fmla <- outcome ~ V1 + V2 + V3 + V4 + V5
fitm <- harsm(fmla,data,group=race)

eta0 <- rowMeans(X)
data <- cbind(data.frame(outcome=y,race=g,eta0=eta0),as.data.frame(X))
fmla <- outcome ~ offset(eta0) + V1 + V2 + V3 + V4 + V5
fitm <- harsm(fmla,data,group=race)

# with weights
data <- cbind(data.frame(outcome=y,race=g,eta0=eta0),as.data.frame(X))
data$wts <- runif(nrow(data),min=1,max=2)
fmla <- outcome ~ offset(eta0) + V1 + V2 + V3 + V4 + V5
fitm <- harsm(fmla,data,group=race,weights=wts)

# softmax on the Best Picture data
data(best_picture)
df <- best_picture
df$place <- ifelse(df$winner,1,2)
df$weight <- ifelse(df$winner,1,0)

fmla <- place ~ nominated_for_BestDirector + nominated_for_BestActor + Drama
fit0 <- harsm(fmla,data=df,group=year,weights=weight)

# warm start is a thing:
sub_fmla <- place ~ nominated_for_BestDirector + nominated_for_BestActor
fit1 <- harsm(sub_fmla,data=df,group=year,weights=weight,fit0=fit0)

# ridge regression.
fitr2 <- harsm(sub_fmla,data=df,group=year,weights=weight,
reg_wt=rep(1,2), reg_power=2, reg_zero=0, reg_coef_idx=c(1,2))

# l1 regularization regression.
fitr1 <- harsm(sub_fmla,data=df,group=year,weights=weight,
reg_wt=rep(1,2), reg_power=1, reg_zero=0, reg_coef_idx=c(1,2))

# elasticnet regularization regression.
fitr12 <- harsm(sub_fmla,data=df,group=year,weights=weight,
reg_wt=rep(1,4), reg_power=c(1,1,2,2), reg_zero=0, reg_coef_idx=c(1,2,1,2))


# test against logistic regression
if (require(dplyr)) {
nevent <- 10000
set.seed(1234)
adf <- data_frame(eventnum=floor(seq(1,nevent + 0.7,by=0.5))) %>%
  mutate(x=rnorm(n()),
         program_num=rep(c(1,2),nevent),
         intercept=as.numeric(program_num==1),
         eta=1.5 * x + 0.3 * intercept,
         place=ohenery::rsm(eta,g=eventnum))

# Harville model
modh <- harsm(place ~ intercept + x,data=adf,group=eventnum)

# the collapsed data.frame for glm
ddf <- adf %>%
  arrange(eventnum,program_num) %>%
  group_by(eventnum) %>%
    summarize(resu=as.numeric(first(place)==1),
              delx=first(x) - last(x),
              deli=first(intercept) - last(intercept)) %>%
  ungroup()

# glm logistic fit
modg <- glm(resu ~ delx + 1,data=ddf,family=binomial(link='logit'))

all.equal(as.numeric(coef(modh)),as.numeric(coef(modg)),tolerance=1e-4)
all.equal(as.numeric(vcov(modh)),as.numeric(vcov(modg)),tolerance=1e-4)

}

ohenery documentation built on Sept. 9, 2025, 5:56 p.m.