The IWLS algorithm used to fit conditional logit models
In mclogit: Multinomial Logit Models for Categorical Responses and Discrete Choices

The package "mclogit" fits conditional logit models using a maximum likelihood estimator. It does this by maximizing the log-likelihood function using an iterative weighted least-squares (IWLS) algorithm, which follows the algorithm used by the glm.fit() function from the "stats" package of R [@nelder.wedderburn:glm;@mccullagh.nelder:glm.2ed;@Rcore].

If $\pi_{ij}$ is the probability that individual $i$ chooses alternative $j$ from his/her choice set $\mathcal{S}_i$, where

$$ \pi_{ij}=\frac{\exp(\eta_{ij})}{\sum_k{\in\mathcal{S}i}\exp(\eta{ik})} $$

and if $y_{ij}$ is the dummy variable with equals 1 if individual $i$ chooses alternative $j$ and equals 0 otherwise, the log-likelihood function (given that the choices are identically independent distributed given $\pi_{ij}$) can be written as

$$ \ell=\sum_{i,j}y_{ij}\ln\pi_{ij} =\sum_{i,j}y_{ij}\eta_{ij}-\sum_i\ln\left(\sum_j\exp(\eta_{ij})\right) $$

If the data are aggregated in the terms of counts such that $n_{ij}$ is the number of individuals with the same choice set and the same choice probabilities $\pi_{ij}$ that have chosen alternative $j$, the log-likelihood is (given that the choices are identically independent distributed given $\pi_{ij}$)

$$ \ell=\sum_{i,j}n_{ij}\ln\pi_{ij} =\sum_{i,j}n_{ij}\eta_{ij}-\sum_in_{i+}\ln\left(\sum_j\exp(\eta_{ij})\right) $$

where $n_{i+}=\sum_{j\in\mathcal{S}i}n{ij}$.

$$ \eta_{ij} = \alpha_1x_{1ij}+\cdots+\alpha_rx_{rij}=\boldsymbol{x}_{ij}'\boldsymbol{\alpha} $$

then the gradient of the log-likelihood with respect to the coefficient vector $\boldsymbol{\alpha}$ is

$$ \frac{\partial\ell}{\partial\boldsymbol{\alpha}} = \sum_{i,j} \frac{\partial\eta_{ij}}{\partial\boldsymbol{\alpha}} \frac{\partial\ell}{\partial\eta_{ij}} = \sum_{i,j} \boldsymbol{x}{ij} (n{ij}-n_{i+}\pi_{ij}) = \sum_{i,j} \boldsymbol{x}{ij} n{i+} (y_{ij}-\pi_{ij}) = \boldsymbol{X}'\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi}) $$

and the Hessian is

$$ \frac{\partial^2\ell}{\partial\boldsymbol{\alpha}\partial\boldsymbol{\alpha}'} = \sum_{i,j} \frac{\partial\eta_{ij}}{\partial\boldsymbol{\alpha}} \frac{\partial\eta_{ij}}{\partial\boldsymbol{\alpha}'} \frac{\partial\ell^2}{\partial\eta_{ij}^2} = - \sum_{i,j,k} \boldsymbol{x}{ij} n{i+} (\delta_{jk}-\pi_{ij}\pi_{ik}) \boldsymbol{x}_{ij}' = - \boldsymbol{X}'\boldsymbol{W}\boldsymbol{X} $$

Here $y_{ij}=n_{ij}/n_{i+}$, while $\boldsymbol{N}$ is a diagonal matrix with diagonal elements $n_{i+}$.

Newton-Raphson iterations then take the form

$$ \boldsymbol{\alpha}^{(s+1)} = \boldsymbol{\alpha}^{(s)} - \left( \frac{\partial^2\ell}{\partial\boldsymbol{\alpha}\partial\boldsymbol{\alpha}'} \right)^{-1} \frac{\partial\ell}{\partial\boldsymbol{\alpha}} = \boldsymbol{\alpha}^{(s)} + \left( \boldsymbol{X}'\boldsymbol{W}\boldsymbol{X} \right)^{-1} \boldsymbol{X}'\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi}) $$

where $\boldsymbol{\pi}$ and $\boldsymbol{W}$ are evaluated at $\boldsymbol{\alpha}=\boldsymbol{\alpha}^{(s)}$.

Multiplying by $\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}$ gives

$$ \boldsymbol{X}'\boldsymbol{W}\boldsymbol{X} \boldsymbol{\alpha}^{(s+1)} = \boldsymbol{X}'\boldsymbol{W}\boldsymbol{X} \boldsymbol{\alpha}^{(s)} + \boldsymbol{X}'\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi}) = \boldsymbol{X}'\boldsymbol{W} \left(\boldsymbol{X}\boldsymbol{\alpha}^{(s)}+\boldsymbol{W}^-\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi})\right) = \boldsymbol{X}'\boldsymbol{W}\boldsymbol{y}^* $$

where $\boldsymbol{W}^-$ is a generalized inverse of $\boldsymbol{W}$ and $\boldsymbol{y}^*$ is a "working response vector" with elements

$$ y_{ij}^*=\boldsymbol{x}{ij}'\boldsymbol{\alpha}^{(s)}+\frac{y{ij}-\pi_{ij}}{\pi_{ij}} $$

The IWLS algorithm thus involves the following steps:

Create some suitable starting values for $\boldsymbol{\pi}$, $\boldsymbol{W}$, and $\boldsymbol{y}^*$
Construct the "working dependent variable" $\boldsymbol{y}^*$
Solve the equation

$$ \boldsymbol{X}'\boldsymbol{W}\boldsymbol{X} \boldsymbol{\alpha} = \boldsymbol{X}'\boldsymbol{W}\boldsymbol{y}^* $$

for $\boldsymbol{\alpha}$.
Compute updated $\boldsymbol{\eta}$, $\boldsymbol{\pi}$, $\boldsymbol{W}$, and $\boldsymbol{y}^*$.
Compute the updated value for the log-likelihood or the deviance

$$ d=2\sum_{i,j}n_{ij}\ln\frac{y_{ij}}{\pi_{ij}} $$
If the decrease of the deviance (or the increase of the log-likelihood) is smaller than a given tolerance criterian (typically $\Delta d \leq 10^{-7}$) stop the algorighm and declare it as converged. Otherwise go back to step 2 with the updated value of $\boldsymbol{\alpha}$.

The starting values for the algorithm used by the mclogit package are constructe as follows:

Set

$$ \eta_{ij}^{(0)} = \ln (n_{ij}+\tfrac12) - \frac1{q_i}\sum_{k\in\mathcal{S}i}\ln (n{ij}+\tfrac12) $$

(where $q_i$ is the size of the choice set $\mathcal{S}_i$)
Compute the starting values of the choice probabilities $\pi_{ij}^{(0)}$ according to the equation at the beginning of the page
Compute intial values of the working dependent variable according to

$$ y_{ij}^{*(0)} = \eta_{ij}^{(0)}+\frac{y_{ij}-\pi_{ij}^{(0)}}{\pi_{ij}^{(0)}} $$