logitr: The main function for estimating logit models

View source: R/logitr.R

logitrR Documentation

The main function for estimating logit models

Description

Use this function to estimate multinomial (MNL) and mixed logit (MXL) models with "Preference" space or "Willingness-to-pay" (WTP) space utility parameterizations. The function includes an option to run a multistart optimization loop with random starting points in each iteration, which is useful for non-convex problems like MXL models or models with WTP space utility parameterizations. The main optimization loop uses the nloptr() function to minimize the negative log-likelihood function.

Usage

logitr(
  data,
  outcome,
  obsID,
  pars,
  scalePar = NULL,
  randPars = NULL,
  randScale = NULL,
  modelSpace = NULL,
  weights = NULL,
  panelID = NULL,
  clusterID = NULL,
  robust = FALSE,
  correlation = FALSE,
  startValBounds = c(-1, 1),
  startVals = NULL,
  numMultiStarts = 1,
  useAnalyticGrad = TRUE,
  scaleInputs = TRUE,
  standardDraws = NULL,
  drawType = "halton",
  numDraws = 50,
  numCores = NULL,
  vcov = FALSE,
  predict = TRUE,
  options = list(print_level = 0, xtol_rel = 1e-06, xtol_abs = 1e-06, ftol_rel = 1e-06,
    ftol_abs = 1e-06, maxeval = 1000, algorithm = "NLOPT_LD_LBFGS"),
  price,
  randPrice,
  choice,
  parNames,
  choiceName,
  obsIDName,
  priceName,
  weightsName,
  clusterName,
  cluster
)

Arguments

data

The data, formatted as a data.frame object.

outcome

The name of the column that identifies the outcome variable, which should be coded with a 1 for TRUE and 0 for FALSE.

obsID

The name of the column that identifies each observation.

pars

The names of the parameters to be estimated in the model. Must be the same as the column names in the data argument. For WTP space models, do not include the scalePar variable in pars.

scalePar

The name of the column that identifies the scale variable, which is typically "price" for WTP space models, but could be any continuous variable, such as "time". Defaults to NULL.

randPars

A named vector whose names are the random parameters and values the distribution: 'n' for normal, 'ln' for log-normal, or 'cn' for zero-censored normal. Defaults to NULL.

randScale

The random distribution for the scale parameter: 'n' for normal, 'ln' for log-normal, or 'cn' for zero-censored normal. Only used for WTP space MXL models. Defaults to NULL.

modelSpace

This argument is no longer needed as of v0.7.0. The model space is now determined based on the scalePar argument: if NULL (the default), the model will be in the preference space, otherwise it will be in the WTP space. Defaults to NULL.

weights

The name of the column that identifies the weights to be used in model estimation. Defaults to NULL.

panelID

The name of the column that identifies the individual (for panel data where multiple observations are recorded for each individual). Defaults to NULL.

clusterID

The name of the column that identifies the cluster groups to be used in model estimation. Defaults to NULL.

robust

Determines whether or not a robust covariance matrix is estimated. Defaults to FALSE. Specification of a clusterID or weights will override the user setting and set this to ‘TRUE’ (a warning will be displayed in this case). Replicates the functionality of Stata's cmcmmixlogit.

correlation

Set to TRUE to account for correlation across random parameters (correlated heterogeneity). Defaults to FALSE.

startValBounds

sets the lower and upper bounds for the starting parameter values for each optimization run, which are generated by runif(n, lower, upper). Defaults to c(-1, 1).

startVals

is vector of values to be used as starting values for the optimization. Only used for the first run if numMultiStarts > 1. Defaults to NULL.

numMultiStarts

is the number of times to run the optimization loop, each time starting from a different random starting point for each parameter between startValBounds. Recommended for non-convex models, such as WTP space models and mixed logit models. Defaults to 1.

useAnalyticGrad

Set to FALSE to use numerically approximated gradients instead of analytic gradients during estimation. For now, using the analytic gradient is faster for MNL models but slower for MXL models. Defaults to TRUE.

scaleInputs

By default each variable in data is scaled to be between 0 and 1 before running the optimization routine because it usually helps with stability, especially if some of the variables have very large or very small values (e.g. ⁠> 10^3⁠ or ⁠< 10^-3⁠). Set to FALSE to turn this feature off. Defaults to TRUE.

standardDraws

By default, a new set of standard normal draws are generated during each call to logitr (the same draws are used during each multistart iteration). The user can override those draws by providing a matrix of standard normal draws if desired. Defaults to NULL.

drawType

Specify the draw type as a character: "halton" (the default) or "sobol" (recommended for models with more than 5 random parameters).

numDraws

The number of Halton draws to use for MXL models for the maximum simulated likelihood. Defaults to 50.

numCores

The number of cores to use for parallel processing of the multistart. Set to 1 to serially run the multistart. Defaults to NULL, in which case the number of cores is set to parallel::detectCores() - 1. Max cores allowed is capped at parallel::detectCores().

vcov

Set to TRUE to evaluate and include the variance-covariance matrix and coefficient standard errors in the returned object. Defaults to FALSE.

predict

If FALSE, predicted probabilities, fitted values, and residuals are not included in the returned object. Defaults to TRUE.

options

A list of options for controlling the nloptr() optimization. Run nloptr::nloptr.print.options() for details.

price

No longer used as of v0.7.0 - if provided, this is passed to the scalePar argument and a warning is displayed.

randPrice

No longer used as of v0.7.0 - if provided, this is passed to the randScale argument and a warning is displayed.

choice

No longer used as of v0.4.0 - if provided, this is passed to the outcome argument and a warning is displayed.

parNames

No longer used as of v0.2.3 - if provided, this is passed to the pars argument and a warning is displayed.

choiceName

No longer used as of v0.2.3 - if provided, this is passed to the outcome argument and a warning is displayed.

obsIDName

No longer used as of v0.2.3 - if provided, this is passed to the obsID argument and a warning is displayed.

priceName

No longer used as of v0.2.3 - if provided, this is passed to the scalePar argument and a warning is displayed.

weightsName

No longer used as of v0.2.3 - if provided, this is passed to the weights argument and a warning is displayed.

clusterName

No longer used as of v0.2.3 - if provided, this is passed to the clusterID argument and a warning is displayed.

cluster

No longer used as of v0.2.3 - if provided, this is passed to the clusterID argument and a warning is displayed.

Details

The the options argument is used to control the detailed behavior of the optimization and must be passed as a list, e.g. options = list(...). Below are a list of the default options, but other options can be included. Run nloptr::nloptr.print.options() for more details.

Argument Description Default
xtol_rel The relative x tolerance for the nloptr optimization loop. 1.0e-6
xtol_abs The absolute x tolerance for the nloptr optimization loop. 1.0e-6
ftol_rel The relative f tolerance for the nloptr optimization loop. 1.0e-6
ftol_abs The absolute f tolerance for the nloptr optimization loop. 1.0e-6
maxeval The maximum number of function evaluations for the nloptr optimization loop. 1000
algorithm The optimization algorithm that nloptr uses. "NLOPT_LD_LBFGS"
print_level The print level of the nloptr optimization loop. 0

Value

The function returns a list object containing the following objects.

Value Description
coefficients The model coefficients at convergence.
logLik The log-likelihood value at convergence.
nullLogLik The null log-likelihood value (if all coefficients are 0).
gradient The gradient of the log-likelihood at convergence.
hessian The hessian of the log-likelihood at convergence.
probabilities Predicted probabilities. Not returned if predict = FALSE.
fitted.values Fitted values. Not returned if predict = FALSE.
residuals Residuals. Not returned if predict = FALSE.
startVals The starting values used.
multistartNumber The multistart run number for this model.
multistartSummary A summary of the log-likelihood values for each multistart run (if more than one multistart was used).
time The user, system, and elapsed time to run the optimization.
iterations The number of iterations until convergence.
message A more informative message with the status of the optimization result.
status An integer value with the status of the optimization (positive values are successes). Use statusCodes() for a detailed description.
call The matched call to logitr().
inputs A list of the original inputs to logitr().
data A list of the original data provided to logitr() broken up into components used during model estimation.
numObs The number of observations.
numParams The number of model parameters.
freq The frequency counts of each alternative.
modelType The model type, 'mnl' for multinomial logit or 'mxl' for mixed logit.
weightsUsed TRUE or FALSE for whether weights were used in the model.
numClusters The number of clusters.
parSetup A summary of the distributional assumptions on each model parameter ("f"="fixed", "n"="normal distribution", "ln"="log-normal distribution").
parIDs A list identifying the indices of each parameter in coefficients by a variety of types.
scaleFactors A vector of the scaling factors used to scale each coefficient during estimation.
standardDraws The draws used during maximum simulated likelihood (for MXL models).
options A list of options for controlling the nloptr() optimization. Run nloptr::nloptr.print.options() for details.

Examples

# For more detailed examples, visit
# https://jhelvy.github.io/logitr/articles/

library(logitr)

# Estimate a MNL model in the Preference space
mnl_pref <- logitr(
  data    = yogurt,
  outcome = "choice",
  obsID   = "obsID",
  pars    = c("price", "feat", "brand")
)

# Estimate a MNL model in the WTP space, using a 5-run multistart
mnl_wtp <- logitr(
  data           = yogurt,
  outcome        = "choice",
  obsID          = "obsID",
  pars           = c("feat", "brand"),
  scalePar       = "price",
  numMultiStarts = 5
)

# Estimate a MXL model in the Preference space with "feat"
# following a normal distribution
# Panel structure is accounted for in this example using "panelID"
mxl_pref <- logitr(
  data     = yogurt,
  outcome  = "choice",
  obsID    = "obsID",
  panelID  = "id",
  pars     = c("price", "feat", "brand"),
  randPars = c(feat = "n")
)

logitr documentation built on Sept. 11, 2024, 6:40 p.m.