MRSP: Multinomial Response Models with Structured Penalties
In MRSP: Multinomial Response Models with Structured Penalties

Description Usage Arguments Details Value Author(s) References Examples

Fit models with multinomial response (including ordinal regression) with structured penalties.

MRSP(formula, data, class.names = NULL, model = multinomlogit(), constr = NULL,
     offset = NULL, weights = NULL, penweights = NULL, standardize = TRUE,
     nrlambda = 50, lambdamin = 0.01, lambdamax = NULL, control = NULL,
     penalty = TRUE, group.classes = TRUE, group.dummies = TRUE,
     sparse.groups = FALSE, adaptive = FALSE, threshold = FALSE, refit = FALSE,
     lambda, lambdaR = 0, lambdaF = 0, gamma = 1, psi = 1, fusion = FALSE,
     nonneg = FALSE, y = NULL, X = NULL, Z = NULL, penindex = NULL,
     grpindex = NULL, mlfit = NULL, perform.fit = TRUE, ...)

`formula`	A symbolic description of the model to be fitted. The left-hand side specifies the response, which must be a categorical variable with `K` categories. The right-hand-side specifies the covariate structure. See details below.
`data`	A data frame containing the variables in `formula`. It must be in long format. This means that `K` rows are required for each individual observation. For details and a possibility to convert to this format, see `mlogit.data` from the mlogit package. The response variable must be either a 0-1-vector or a logical vector, with 1 or TRUE indicating the observed class/category/alternative.
`class.names`	An optional character vector of length `K` specifying the names of the response classes/categories.
`model`	An object of class `MRSP.model` that specifies the model to be used. Currently, `model = multinomlogit()` and `model = sequentiallogit()` are available, yielding a multinomial or sequential logit model, respectively. Cumulative logit models will be included in future versions of `MRSP`.
`constr`	The identifiability constraint to be used. The coefficients of predictors which do not vary over categories (i.e. global/individual-specific predictors) are not identifiable in (unpenalized) multinomial logit models. Either an integer in `[1,K]` or `"symmetric"` or `"none"`. See details below.
`offset`	An optional vector of offsets. Must be of appropriate length.
`weights`	An optional vector of observations weights. Must be of appropriate length.
`penweights`	An object containing weights that modify the penalty terms on different parameters. See `MRSP.fit` for details.
`standardize`	If `TRUE`, predictors are mean-centered and standardized to unit variance. The reported coefficients, by default, correspond to the original covariates.
`nrlambda, lambdamin, lambdamax`	If a sequence of `lambda` values is not specified explicitly via argument `lambda`, `MRSP` computes a suitable grid of length `nrlambda`, ranging from `lambdamin` to `lambdamax`. If `lambdamax` is missing, `MRSP` tries to find a suitable value for it.
`control`	An object of class `MRSP.control` that stores control information. It's slots `max.iter` and `rel.tol` specify the max number of iterations and the relative change in penalized log-likelihood values that indicates convergence. The other slots should not be changed unless by experienced users.
`penalty`	Specifies the general type of penalty to be applied. FALSE means no penalty is used. `TRUE` means that a lasso-type penalty is applied. `"Ridge"` applies a ridge penalty. Arguments `group.classes`, `group.dummies` and `sparse.groups` have no effect unless `penalty = TRUE`.
`group.classes`	If `TRUE`, lasso-type penalties will be grouped across all coefficients belonging to the same covariate. This corresponds to the ‘CATS Lasso’ penalty proposed in Tutz, Poessnecker and Uhlmann (2015).
`group.dummies`	If entries on the rhs of `formula` refer to factor variables with more than 2 levels, several dummy variables will enter the model that are related to the same actual covariate. Setting this argument to `TRUE` will yield penalty terms that treat all corresponding coefficients as one parameter group. This corresponds to the original ‘Group Lasso’ of Yuan and Lin (2006).
`sparse.groups, gamma`	If `TRUE`, parameter groups will also be penalized by an L1 penalty on top of the unsquared L2 penalty. If the L2 penalty uses a tuning parameter of value lambda, the L1 penalty will use tuning parameter lambda*gamma.
`adaptive`	Should adaptive weights be used? Use `adaptive="ML"` to obtain the traditional adaptive weights proposed in the literature. Using `adaptive = TRUE` computes the penalized estimator with whatever penalty is specified and no adaptive weights and computes adaptive weights from the output of this penalized model. The final output is the computed with those adaptive weights. It is strongly recommended to prefer `adaptive="ML"` over `adaptive=TRUE`.
`threshold`	If `TRUE`, the coefficients will be thresholded with an appropriate threshold value. You can also specify an explicit nonnegative value to be used as the threshold.
`refit`	Should refitting be performed? If `TRUE`, the model is first fit traditionally, and then refitted on the active set found by this first fit. This can improve variable selection, but tends to be rather slow and memory-consuming.
`lambda`	Tuning parameter(s) to be used. Can be a nonnegative scalar or vector. If missing, `MRSP` computes a suitable grid of lambda values, see also `nrlambda`.
`lambdaR`	Lambda value(s) for ridge penalties. Same structure as `lambda`.
`lambdaF`	Lambda values(s) for fusion penalties. Same structure as `lambda`. Not yet supported for end-users of `MRSP`, but included for compatibility with future releases of `MRSP`.
`psi`	A numeric that balances the weighting of penalties on global and class-/category-/alternative-specific predictors (if present) in multinomial logit models. Penalties on global predictors are weighted with `psi`, penalties on class-specific predictors are weighted with `(2-psi)`.
`fusion`	If fusion penalties are used, this specifies the type of fusion. Not yet supported for end-users of `MRSP`, but included for compatibility with future releases of `MRSP`.
`nonneg`	If `TRUE`, all coefficients are restricted to be nonnegative.
`y, X, Z, penindex, grpindex, mlfit`	These optional arguments allow to supply the corresponding objects directly to `MRSP.fit`. It is highly recommended not to use those arguments and to use `formula` instead for the model specification.
`perform.fit`	If TRUE, the model is fitted. If FALSE, function `MRSP` prepares the call to `MRSP.fit` and returns a list from which this call can be accessed.
`...`	Further arguments to be passed.

For model = multinomlogit(), a formula of the form “Y ~ x | z1 | z2” yields linear predictors

eta_r = beta_0r + x ' beta_r + z1_r ' gamma + z2_r ' delta_r

for r=1,…,K, which are connected to probabilities by the multinomial logit link, also known as softmax function:

P(Y = r | x, z1, z2) = exp(eta_r) / sum[ exp(eta_s)]_{s=1,...,K}.

This means that the x-variables have global values that are class-/category-/alternative-unspecific. The coefficients belonging to those global variables are not identifiable in the generic form of the multinomial logit model as given above. Therefore, an identifiability constraint can be specified with argument constr: By setting constr = r (with r in [1,K]), category r is chosen as reference category. Technically, this means setting beta_r = 0. Alternatively, constr="symmetric" specifies a so-called symmetric side constraint, which technically means imposing that

sum[ beta_sj ]_{s=1,...,K} = 0,

for all j=1,…,p. If constr = "none", no constraint is used for penalized parameter groups and identifiability is ensured by the penalty term (see Friedman, Hastie and Tibshirani, 2010). Coefficients of an unpenalized x-variable are subject to a symmetric side constraint in this case.
The z1- and z2-variables are class-/category-/alternative-specific. The z1-variables have a global effect while the category-specific z2-variables are equipped with coefficients that are also category-specific. Note that no identifiability constraints are required for category-specific variables.

For model = sequentiallogit(), a formula of the form “Y ~ x1 | x2” yields linear predictors of the form

eta_r = beta_0r + x1 ' alpha + x2 ' beta_r

for r=1,…,K-1, which are connected to conditional probabilities via the logit link in the following way:

P(Y = r | Y >= r, x1, x2) = exp(eta_r) / (1 + exp(eta_r))

for r=1,…,K-1.
Note that in the sequential case, slot "mu" of an MRSP-class object contains the unconditional class probabilities P(Y = r). If you want to get the “discrete” hazard rates P(Y = r | Y >= r), use methods fitted or predict with argument convert2hazard=TRUE.

Depending on nrlambda, either an object of class MRSP or of class MRSP.list, which are lists of length nrlambda whose elements are MRSP objects. Additionally, the attributes “topcall”, “call” and “dat” store, respectively, the call to MRSP, the call to MRSP.fit which was prepared by MRSP and the data object to be supplied to MRSP.fit. Note that dat contains the standardized covariates if standardize = TRUE.

Wolfgang Poessnecker

Tutz, G., Poessnecker, W. and Uhlmann, L. (2015): Variable Selection in General Multinomial Logit Models
Computational Statistics and Data Analysis, Vol. 82, 207-222.
http://www.sciencedirect.com/science/article/pii/S0167947314002709

Friedman, J., Hastie, T. and Tibshirani, R. (2010): Regularization Paths for Generalized Linear Models via Coordinate Descent, http://www.stanford.edu/~hastie/Papers/glmnet.pdf
Journal of Statistical Software, Vol. 33(1), 1-22.
http://www.jstatsoft.org/v33/i01/

Yuan, M. and Lin, Y. (2006): Model selection and estimation in regression with grouped variables
Journal of the Royal Statistical Society Series B, Vol. 68(1), 49-67.

## load data
data(TravelMode, package="MRSP")
## bring the response variable to the form required by MRSP
TravelMode$choice <- ifelse(TravelMode$choice=="yes",1,0)

## construct a list of fitted models for different lambda values. 
## income is a global predictor, wait is a class-specific predictor with global
## coefficients, vcost and travel are specified as class-specific predictors 
## with class-specific coefficients. The fourth category ("car") is chosen as 
## reference. 
fit <- MRSP(choice~income|wait|vcost+travel, data=TravelMode, constr=4,
            class.names=levels(TravelMode$mode), lambdamax=150, nrlambda=10,
            group.classes=TRUE, sparse.groups=FALSE, adaptive="ML")
            
fit

## slots can be extracted from all elements via function 'extract':
BICs <- extract(fit, "BIC")
lambdagrid <- extract(fit, "lambda")


## to select a concrete lambda/model, one can use function 'select'. Here, we
## chose the best model according to its AIC value
bestfit <- select(fit, "AIC")
bestfit

## some methods:
summary(bestfit)
BIC(bestfit)
fitted(bestfit)[1:6,]
bestfit@coef
## get the coefficients belonging to standardized predictors:
coef(bestfit, type="stand")
residuals(bestfit)
predict(bestfit, newdata = TravelMode[1:40,c(4,5,6,8)])

## plot some coefficient paths:
par(mfrow=c(1,2))
## you can either specify the number of the variable...
plot(fit, 2, legendpars=list(x="bottomright"))
## ... or its name as a character string. lcex is the cex parameter for legends.
## set it to zero to disable legend plotting. 
plot(fit,"travel", lambda = bestfit@lambda, lcex=0)