gmnl: Estimate Multinomial Logit Models with Observed and...
In gmnl: Multinomial Logit Models with Random Parameters

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/gmnl.R

Estimate different types of multinomial logit models with observed and unobserved individual heterogneity, such as MIXL, S-MNL, G-MNL, LC and MM-MNL models. These models are estimated using Maximum Simulated Likelihood. It supports both cross-sectional and panel data.

gmnl(
  formula,
  data,
  subset,
  weights,
  na.action,
  model = c("mnl", "mixl", "smnl", "gmnl", "lc", "mm"),
  start = NULL,
  ranp = NULL,
  R = 40,
  Q = 2,
  haltons = NA,
  mvar = NULL,
  seed = 12345,
  correlation = FALSE,
  bound.err = 2,
  panel = FALSE,
  hgamma = c("direct", "indirect"),
  reflevel = NULL,
  init.tau = 0.1,
  init.gamma = 0.1,
  notscale = NULL,
  print.init = FALSE,
  gradient = TRUE,
  typeR = TRUE,
  ...
)

## S3 method for class 'gmnl'
print(
  x,
  digits = max(3, getOption("digits") - 3),
  width = getOption("width"),
  ...
)

## S3 method for class 'gmnl'
summary(object, ...)

## S3 method for class 'summary.gmnl'
print(
  x,
  digits = max(3, getOption("digits") - 2),
  width = getOption("width"),
  ...
)

## S3 method for class 'gmnl'
update(object, new, ...)

## S3 method for class 'gmnl'
coef(object, ...)

## S3 method for class 'gmnl'
model.matrix(object, ...)

## S3 method for class 'gmnl'
residuals(object, outcome = TRUE, ...)

## S3 method for class 'gmnl'
df.residual(object, ...)

## S3 method for class 'gmnl'
fitted(object, outcome = TRUE, ...)

## S3 method for class 'gmnl'
logLik(object, ...)

## S3 method for class 'gmnl'
nObs(x, ...)

`formula`	a symbolic description of the model to be estimated. The formula is divided in five parts, each of them separated by the symbol `\|`. The first part is reserved for alternative-specific variables with a generic coefficient. The second part corresponds to individual-specific variables with an alternative specific coefficients. The third part corresponds to alternative-specific variables with an alternative-specific coefficident. The fourth part is reserved for time-invariant variables that modify the mean of the random parameters. Finally, the fifth part is reserved for time-invariant variables that enter in the scale coefficient or in the probability assignment in models with latent classes.
`data`	the data of class `mlogit.data`.
`subset`	an optional vector specifying a subset of observations.
`weights`	an optional vector of weights. Default to 1.
`na.action`	a function wich indicated what should happen when the data contains `NA`'s.
`model`	a string indicating which model is estimated. The options are "`mnl`" for the Multinomial Logit Model, "`mixl`" for the Mixed Logit Model, "`smnl`" for the Scaled Multinomial Logit Model, "`gmnl`" for the Generalized Multinomial Logit Model, "`lc`" for the Latent Class Multinomial Logit Model, and "`mm`" for the Mixed-Mixed Multinomial Logit Model.
`start`	a vector of starting values.
`ranp`	a named vector whose names are the random parameters and values the distribution: "`n`" for normal, "`ln`" for log-normal, "`cn`" for truncated normal, "`u`" for uniform, "`t`" for triangular, "`sb`" for Sb Johnson.
`R`	the number of draws of pseudo-random numbers if `ranp` is not `NULL`.
`Q`	number of classes for LC or MM-MNL models.
`haltons`	only relevant if `ranp` is not `NULL`. If `haltons = NULL`, pseudo-random numbers are used instead of Halton sequences. If `haltons=NA`, the first K primes are used to generates the Halton draws, where K is the number of random parameters, and 15 of the initial sequence of elements are dropped. Otherwise, `haltons` should be a list with elements `prime` and `drop`.
`mvar`	only valid if `ranp` is not `NULL`. This is a named list, where the names correspond to the variables with random parameters, and the values correspond to the variables that enter in the mean of each random parameters.
`seed`	seed for the random number generator. Default is `seed = 12345`.
`correlation`	only relevant if `ranp` is not `NULL`. If true, the correlation across random parameters is taken into account.
`bound.err`	only relevenat if model is `smnl` or `gmnl`. It indicates at which values the draws for the scale parameter are truncated. By default `bound.err = 2`, therefore a truncated normal distribution with truncation at -2 and +2 is used.
`panel`	if `TRUE` a panel data model is estimated.
`hgamma`	a string indicated how to estimate the parameter gamma. If "`direct`", then γ is estimated directly, if "`indirect`" then γ ^* is estimated, where γ = \exp(γ^)/(1 + \exp(γ^)).
`reflevel`	the base alternative.
`init.tau`	initial value for the τ parameter.
`init.gamma`	initial value for γ.
`notscale`	only relevant if model is `smnl` or `gmnl`. It is a vector indicating which variables should not be scaled.
`print.init`	if `TRUE`, the initial values for the optimization procedure are printed.
`gradient`	if `TRUE`, analytical gradients are used for the optimization procedure.
`typeR`	if `TRUE`, truncated normal draws are used for the scale parameter, if `FALSE` the procedure suggested by Greene (2010) is used.
`...`	additional arguments to be passed to `maxLik`, which depend in the maximization routine.
`x, object`	and object of class `gmnl`.
`digits`	the number of digits.
`width`	width.
`new`	an updated formula for the `update` method.
`outcome`	if `TRUE`, then the `fitted` and `residuals` methods return a vector that corresponds to the chosen alternative, otherwise it returns a matrix where each column corresponds to each alternative.

Let the utility to person i from choosing alternative j on choice occasion t be:

U_{ijt} = β_{i}x_{ijt} + ε_{ijt}

where ε_{ijt} is i.i.d extreme value, and β_i vary across individuals. Each model estimated by gmnl depends on how β_i is specified. The options are the following:

S-MNL if β_i=σ_iβ, where the scale σ_i varies across individuals.
MIXL if β_i=β + sη_i, where η_i is a draw from some distribution. For example, if β_i\sim N(β, s^2), then η_i\sim N(0, 1).
GMNL if β_i=σ_iβ + γ sη_i + σ_i(1-γ)sη_i, where σ_i is the scale parameter, and γ is a parameter that controls how the variance of residual taste heterogeneity varies with scale.
LC if β_i=β_q with probability w_{iq} for q = 1,...,Q, where Q is the total number of classes.
MM-MIXL if β_i\sim f(β_q, Σ_q) with probability w_{iq} for q = 1,...,Q, where Q is the total number of classes.

Observed heterogeneity can be also accommodated in the random parameters when the MIXL is estimated by including individual-specific covariates. Specifically, the vector of random coefficients is

β_i=β +Π z_i + Lη_i

where z_i is a set of characteristics of individual i that influence the mean of the taste parameters; and Π is matrix of parameters. To estimate this model, the fourth part of the formula should be specified along with the mvar argument.

One can also allow the mean of the scale to differ across individuals by including individual-specific characteristics. Thus, the scale parameters can be written as

\exp(\bar{σ} + δ h_i + τ υ_i)

where h_i is a vector of attributes of individual i. To estimate this model, the fifth part of the formula should include the variables that enter h_i.

For models with latent classes, the class assignment is modeled as a semi-parametric multinomial logit format

w_{iq}= \frac{\exp(γ_q)}{∑_{q=1}^Q\exp(γ_q)}

for q = 1,...,Q, γ_1 = 0. Latent class models (LC and MM-MIXL) requires at least that a constant should be specified in the fifth part of the formula. If the class assignment, w_{iq}, is also determined by socio-economic characteristics, these variables can be also included in the fifth part.

Models that involve random parameters are estimated using Maximum Simulated Likelihood using the maxLik function of maxLik package.

An object of class “gmnl” with the following elements

`coefficients`	the named vector of coefficients,
`logLik`	a set of values of the maximum likelihood procedure,
`mf`	the model framed used,
`formula`	the formula (a `gFormula` object),
`time`	`proc.time()` minus the start time,
`freq`	frequency of dependent variable,
`draws`	type of draws used,
`model`	the fitted model,
`R`	number of draws used,
`ranp`	vector indicating the variables with random parameters and their distribution,
`residuals`	the residuals,
`correlation`	whether the model is fitted assuming that the random parameters are correlated,
`bi`	matrix of conditional expectation of random parameters,
`Q`	number of classes,
`call`	the matched call.

Mauricio Sarrias msarrias86@gmail.com

Keane, M., & Wasi, N. (2013). Comparing alternative models of heterogeneity in consumer choice behavior. Journal of Applied Econometrics, 28(6), 1018-1045.
Fiebig, D. G., Keane, M. P., Louviere, J., & Wasi, N. (2010). The generalized multinomial logit model: accounting for scale and coefficient heterogeneity. Marketing Science, 29(3), 393-421.
Greene, W. H., & Hensher, D. A. (2010). Does scale heterogeneity across individuals matter? An empirical assessment of alternative logit models. Transportation, 37(3), 413-428.
Train, K. (2009). Discrete choice methods with simulation. Cambridge University Press.

mlogit, mlogit.data, maxLik, Rchoice

## Examples using the Fishing data set from the AER package
data("TravelMode", package = "AER")
library(mlogit)
TM <- mlogit.data(TravelMode, choice = "choice", shape = "long", 
                 alt.levels = c("air", "train", "bus", "car"), chid.var = "individual")
## Not run: 
## S-MNL model, ASCs not scaled
smnl <- gmnl(choice ~ wait + vcost + travel + gcost| 1, data = TM, 
             model = "smnl", R = 100, 
             notscale = c(1, 1, 1, rep(0, 4)))
summary(smnl)

## MIXL model with observed heterogeneity
mixl.hier <- gmnl(choice ~ vcost + gcost + travel + wait | 1 | 0 | income + size - 1,
                 data = TM,
                 model = "mixl",
                 ranp = c(travel = "t", wait = "n"),
                 mvar = list(travel = c("income","size"), wait = c("income")),
                 R = 30,
                 haltons = list("primes"= c(2, 17), "drop" = rep(19, 2)))
summary(mixl.hier)

## Examples using the Electricity data set from the mlogit package
data("Electricity", package = "mlogit")
Electr <- mlogit.data(Electricity, id.var = "id", choice = "choice",
                     varying = 3:26, shape = "wide", sep = "")
                     
## Estimate a MIXL model with correlated random parameters
Elec.cor <- gmnl(choice ~ pf + cl + loc + wk + tod + seas| 0, data = Electr,
                 subset = 1:3000,
                 model = 'mixl',
                 R = 10,
                 panel = TRUE,
                 ranp = c(cl = "n", loc = "n", wk = "n", tod = "n", seas = "n"),
                 correlation = TRUE)
summary(Elec.cor)
cov.gmnl(Elec.cor)
se.cov.gmnl(Elec.cor)
se.cov.gmnl(Elec.cor, sd = TRUE)
cor.gmnl(Elec.cor)

## Estimate a G-MNL model, where ASCs are also random
Electr$asc2 <- as.numeric(Electr$alt == 2)
Electr$asc3 <- as.numeric(Electr$alt == 3)
Electr$asc4 <- as.numeric(Electr$alt == 4)

Elec.gmnl <- gmnl(choice ~ pf + cl + loc + wk + tod + seas + asc2 + asc3 + asc4 | 0,
                 data = Electr,
                 subset = 1:3000,
                 model = 'gmnl',
                 R = 30,
                 panel = TRUE,
                 notscale = c(rep(0, 6), 1, 1, 1),
                 ranp = c(cl = "n", loc = "n", wk = "n", tod = "n", seas = "n",
                 asc2 = "n", asc3 = "n", asc4 = "n"))
summary(Elec.gmnl)

## Estimate a LC model with 2 classes
Elec.lc <- gmnl(choice ~ pf + cl + loc + wk + tod + seas| 0 | 0 | 0 | 1,
               data = Electr,
               subset = 1:3000,
               model = 'lc',
               panel = TRUE,
               Q = 2)
summary(Elec.lc)

## Estimate a MM-MIXL model
Elec.mm <- gmnl(choice ~ pf + cl + loc + wk + tod + seas| 0 | 0 | 0 | 1,
                 data = Electr,
                 subset = 1:3000,
                 model = 'mm',
                 R = 30,
                 panel = TRUE,
                 ranp = c(pf = "n", cl = "n", loc = "n", wk = "n", tod = "n",
                 seas = "n"),
                 Q = 2,
                 iterlim = 500)
summary(Elec.mm)

## End(Not run)