6. Multinomial probit

knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
options(width = 65)

The model

The multinomial probit is obtained with the same framework that we used when presenting the random utility model. The utility of an alternative is still the sum of two components : $U_j = V_j + \epsilon_j$, but the joint distribution of the error terms is now a multivariate normal with mean 0 and a covariance matrix denoted $\Omega$^[see [@HAUS:WISE:78] and [@DAGA:79]].

Alternative $l$ is chosen if :

$$
\left\{
\begin{array}{rcl}
U_1-U_l&=&(V_1-V_l)+(\epsilon_1-\epsilon_l)<0\\
U_2-U_l&=&(V_2-V_l)+(\epsilon_2-\epsilon_l)<0\\
& \vdots & \\
U_J-U_l&=&(V_J-V_l)+(\epsilon_J-\epsilon_l)<0
\end{array}
\right.
$$

which implies, denoting $V^l_j=V_j-V_l$ :

$$
\left\{
\begin{array}{rclrcl}
\epsilon^l_1 &=& (\epsilon_1-\epsilon_l) &<& - V^l_1\\
\epsilon^l_2 &=& (\epsilon_2-\epsilon_l) &<& - V^l_2\\
&\vdots & & \vdots & \\
\epsilon^l_J &=& (\epsilon_J-\epsilon_l) &<& - V^l_J
\end{array}
\right.
$$

The initial vector of errors $\epsilon$ is transformed using the following transformation :

$$\epsilon^l = M^l \epsilon$$

where the transformation matrix $M^l$ is a $(J-1) \times J$ matrix obtained by inserting into an identity matrix an $l^{\mbox{th}}$ column of $-1$. For example, if $J = 4$ and $l = 3$ :

$$
M^3 = \left(
\begin{array}{cccc}
1 & 0 & -1 & 0 \\
0 & 1 & -1 & 0 \\
0 & 0 & -1 & 1
\end{array}
\right)
$$

The covariance matrix of the error differences is then obtained by applying this transformation matrix :

$$ \mbox{V}\left(\epsilon^l\right)=\mbox{V}\left(M^l\epsilon\right) = M^l\mbox{V}\left(\epsilon\right){M^l}^{\top} = M^l\Omega{M^l}^{\top} $$
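To make the transformation concrete, here is a minimal R sketch (the make_M helper is illustrative, not part of mlogit) : it builds $M^l$ for $J = 4$ and $l = 3$ and applies it to an arbitrary covariance matrix.

make_M <- function(J, l) {
  M <- diag(J)[-l, , drop = FALSE]          # identity matrix without its l-th row
  M[, l] <- -1                              # insert the l-th column of -1
  M
}
M3 <- make_M(4, 3)                          # same matrix as M^3 above
Omega <- crossprod(matrix(rnorm(16), 4))    # an arbitrary covariance matrix
Omega3 <- M3 %*% Omega %*% t(M3)            # V(eps^3) = M^3 Omega t(M^3)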

The probability of choosing $l$ is then :

\begin{equation} P_l =\mbox{P}(\epsilon^l_1<-V_1^l \;\&\; \epsilon^l_2<-V_2^l \;\&\; ... \;\&\; \epsilon^l_J<-V_J^l) \end{equation}

Under the distributional hypothesis, this probability can be written as :

\begin{equation} P_l = \int_{-\infty}^{-V_1^l}\int_{-\infty}^{-V_2^l}...\int_{-\infty}^{-V_J^l}\phi(\epsilon^l)\, d\epsilon^l_1 d\epsilon^l_2... d\epsilon^l_J \end{equation}

with :

\begin{equation} \phi\left(\epsilon^l\right)=\frac{1}{(2\pi)^{(J-1)/2}\mid\Omega^l\mid^{1/2}} e^{-\frac{1}{2}{\epsilon^l}^{\top}{\Omega^l}^{-1}\epsilon^l} \end{equation}
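As a small hedged check, this density can be computed directly in R (dens_mvn is an illustrative name) ; it mirrors the formula above for a vector of error differences of length $J-1$.

dens_mvn <- function(e, Omega) {
  k <- length(e)                            # here k = J - 1
  exp(-0.5 * drop(t(e) %*% solve(Omega) %*% e)) /
    ((2 * pi)^(k / 2) * sqrt(det(Omega)))
}
dens_mvn(c(0, 0), diag(2))                  # equals 1 / (2 * pi)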

Two problems arise with this model : the identification of the parameters of the covariance matrix, and the computation of the choice probabilities, which have no closed form and must therefore be approximated by simulation. We examine them in turn.

Identification

The meaningful parameters are those of the covariance matrix of the errors $\Omega$. For example, with $J = 3$ :

$$
\Omega = \left(
\begin{array}{ccc}
\sigma_{11} & \sigma_{12} & \sigma_{13} \\
\sigma_{21} & \sigma_{22} & \sigma_{23} \\
\sigma_{31} & \sigma_{32} & \sigma_{33}
\end{array}
\right)
$$

$$
\Omega^1 = M^1 \Omega {M^1}^{\top}= \left(
\begin{array}{cc}
\sigma_{11}+\sigma_{22}-2\sigma_{12} & \sigma_{11} + \sigma_{23} - \sigma_{12} -\sigma_{13} \\
\sigma_{11}+\sigma_{23}- \sigma_{12} - \sigma_{13} & \sigma_{11} + \sigma_{33} - 2 \sigma_{13}
\end{array}
\right)
$$

The overall scale of utility being unidentified, one has to impose the value of one of the variances ; for example, the first one is fixed to 1. We then have :

$$
\Omega^1 = \left(
\begin{array}{cc}
1 & \frac{\sigma_{11}+ \sigma_{23} - \sigma_{12} -\sigma_{13}}{\sigma_{11}+\sigma_{22}-2\sigma_{12}} \\
\frac{\sigma_{11}+\sigma_{23}- \sigma_{12} - \sigma_{13}}{\sigma_{11}+\sigma_{22}-2\sigma_{12}} & \frac{\sigma_{11} + \sigma_{33} - 2 \sigma_{13}}{\sigma_{11}+\sigma_{22}-2\sigma_{12}}
\end{array}
\right)
$$

Therefore, out of the 6 structural parameters of the covariance matrix, only 3 can be identified. Moreover, these identified parameters are almost impossible to interpret.

More generally, with $J$ alternatives, the number of parameters of the covariance matrix is $(J+1)\times J/2$ and the number of identified parameters is $J\times(J-1)/2-1$.
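The identification problem can be illustrated numerically with a small R sketch (the values are arbitrary, not from the text) : scaling $\Omega$ by any positive constant leaves the normalized $\Omega^1$ unchanged, so the scale cannot be recovered from the data.

M1 <- rbind(c(-1, 1, 0),
            c(-1, 0, 1))                    # M^1 for J = 3
norm_Omega1 <- function(Omega) {
  O1 <- M1 %*% Omega %*% t(M1)              # covariance of the error differences
  O1 / O1[1, 1]                             # fix the first variance to 1
}
Omega <- matrix(c(1.0, 0.5, 0.2,
                  0.5, 2.0, 0.3,
                  0.2, 0.3, 1.5), 3, 3)
all.equal(norm_Omega1(Omega), norm_Omega1(4 * Omega))  # TRUE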

Simulations

Let $L^l$ be the Choleski decomposition of the covariance matrix of the error differences :

$$ \Omega^l=L^l {L^l}^{\top} $$

This matrix is a lower triangular matrix of dimension $(J-1)\times(J-1)$ :

$$
L^l= \left(
\begin{array}{ccccc}
l_{11} & 0 & 0 &... & 0 \\
l_{21} & l_{22} & 0 & ... & 0 \\
l_{31} & l_{32} & l_{33} & ... & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
l_{(J-1)1} & l_{(J-1)2} & l_{(J-1)3} & ... & l_{(J-1)(J-1)}
\end{array}
\right)
$$

Let $\eta$ be a vector of standard normal deviates :

$$ \eta \sim N(0, I) $$

Therefore, we have :

$$ \mbox{V}\left(L^l\eta\right)=L^lV(\eta){L^l}^{\top}=L^lI{L^l}^{\top}=\Omega^l $$

Therefore, if we draw a vector of standard normal deviates $\eta$ and apply this transformation to it, we get a realization of $\epsilon^l$.
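Here is a quick hedged check in R (the numbers are arbitrary) : since chol() returns the upper triangular factor, the lower triangular $L^l$ is its transpose, and applying it to standard normal draws reproduces $\Omega^l$ empirically.

Omega_l <- matrix(c(2.0, 0.5,
                    0.5, 1.0), 2, 2)        # some Omega^l with J = 3
L <- t(chol(Omega_l))                       # lower triangular Choleski factor
eta <- matrix(rnorm(2 * 1e5), nrow = 2)     # standard normal deviates
eps <- L %*% eta                            # draws of eps^l
round(cov(t(eps)), 2)                       # close to Omega_l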

This joint probability can be written as a product of conditional and marginal probabilities :

$$
\begin{array}{rcl}
P_l &=& \mbox{P}(\epsilon^l_1<- V_1^l \;\&\; \epsilon^l_2<-V_2^l \;\&\; ... \;\&\; \epsilon^l_J<-V_J^l)\\
&=& \mbox{P}(\epsilon^l_1<- V_1^l)\\
&\times&\mbox{P}(\epsilon^l_2<-V_2^l \mid \epsilon^l_1<-V_1^l) \\
&\times&\mbox{P}(\epsilon^l_3<-V_3^l \mid \epsilon^l_1<-V_1^l \;\&\; \epsilon^l_2<-V_2^l) \\
& \vdots & \\
&\times&\mbox{P}(\epsilon^l_J<-V_J^l \mid \epsilon^l_1<-V_1^l \;\&\; ... \;\&\; \epsilon^l_{J-1}<-V_{J-1}^l)
\end{array}
$$

The vector of error difference deviates is :

$$
\left(
\begin{array}{c}
\epsilon^l_1 \\ \epsilon^l_2 \\ \epsilon^l_3 \\ \vdots \\ \epsilon^l_J
\end{array}
\right) = \left(
\begin{array}{ccccc}
l_{11} & 0 & 0 &... & 0 \\
l_{21} & l_{22} & 0 & ... & 0 \\
l_{31} & l_{32} & l_{33} & ... & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
l_{(J-1)1} & l_{(J-1)2} & l_{(J-1)3} & ... & l_{(J-1)(J-1)}
\end{array}
\right) \times \left(
\begin{array}{c}
\eta_1 \\ \eta_2 \\ \eta_3 \\ \vdots \\ \eta_{J-1}
\end{array}
\right)
$$

$$
\left(
\begin{array}{c}
\epsilon^l_1 \\ \epsilon^l_2 \\ \epsilon^l_3 \\ \vdots \\ \epsilon^l_J
\end{array}
\right) = \left(
\begin{array}{l}
l_{11}\eta_1 \\
l_{21}\eta_1+l_{22}\eta_2 \\
l_{31}\eta_1+l_{32}\eta_2 + l_{33}\eta_3\\
\vdots \\
l_{(J-1)1}\eta_1+l_{(J-1)2}\eta_2+...+l_{(J-1)(J-1)}\eta_{J-1}
\end{array}
\right)
$$

Let's now investigate the marginal and conditional probabilities. The first one is the probability that $\epsilon^l_1 = l_{11}\eta_1$ is lower than $-V^l_1$, which is simply $\Phi\left(-\frac{V^l_1}{l_{11}}\right)$. The second one is, for a given draw of $\eta_1$, the probability that $\epsilon^l_2 = l_{21}\eta_1 + l_{22}\eta_2$ is lower than $-V^l_2$, which is $\Phi\left(-\frac{V^l_2+l_{21}\eta_1}{l_{22}}\right)$, and so on. These probabilities can easily be simulated by drawing numbers from a truncated normal distribution, as in the sketch below.
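A draw from a standard normal distribution upper-truncated at a point $a$ can be obtained by inverting the cdf. A minimal R sketch (rtnorm_upper is an illustrative name) :

rtnorm_upper <- function(n, a) qnorm(runif(n) * pnorm(a))  # draws below a
max(rtnorm_upper(1e4, -1))                  # never exceeds -1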

This so-called GHK algorithm^[see for example [@GEWE:KEAN:RUNK:94].] (for Geweke, Hajivassiliou and Keane, who developed this algorithm) can be described as follows :

- compute $\Phi\left(-\frac{V_1^l}{l_{11}}\right)$ ;
- draw a number called $\eta_1^r$ from a standard normal distribution upper-truncated at $-\frac{V_1^l}{l_{11}}$ and compute $\Phi\left(-\frac{V^l_2+l_{21}\eta_1^r}{l_{22}}\right)$ ;
- draw a number called $\eta_2^r$ from a standard normal distribution upper-truncated at $-\frac{V^l_2+l_{21}\eta_1^r}{l_{22}}$ and compute $\Phi\left(-\frac{V^l_3+l_{31}\eta_1^r+l_{32}\eta_2^r}{l_{33}}\right)$ ;
- $...$ draw a number called $\eta_{J-1}^r$ from a standard normal distribution upper-truncated at $-\frac{V^l_{J-1}+l_{(J-1)1}\eta_1^r+...+l_{(J-1)(J-2)}\eta_{J-2}^r}{l_{(J-1)(J-1)}}$ ;
- multiply all these probabilities to get a realization of the probability called $P^r_l$ ;
- repeat all these steps many times and average these probabilities ; this average is an estimation of the probability : $\bar{P}_l = \sum_{r=1}^R P^r_l/R$.

Several points should be noted concerning this algorithm : each simulated probability $P^r_l$ lies strictly between 0 and 1, the simulator is unbiased, and $\bar{P}_l$ is a smooth function of the parameters, which makes it well suited for maximum likelihood estimation. A compact implementation is sketched below.
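Here is a hedged R implementation of the simulator (ghk_prob is an illustrative name, not the routine used internally by mlogit). It takes the differenced systematic utilities $V^l$ and the covariance matrix $\Omega^l$ of the error differences and returns $\bar{P}_l$.

ghk_prob <- function(V, Omega, R = 1000) {
  # V: differenced systematic utilities V^l (length J - 1)
  # Omega: covariance matrix Omega^l of the error differences
  J1 <- length(V)
  L <- t(chol(Omega))                       # lower triangular Choleski factor
  probs <- numeric(R)
  for (r in seq_len(R)) {
    eta <- numeric(J1)
    p <- 1
    for (j in seq_len(J1)) {
      # truncation point of the j-th conditional probability
      b <- -(V[j] + sum(L[j, seq_len(j - 1)] * eta[seq_len(j - 1)])) / L[j, j]
      p <- p * pnorm(b)
      # draw eta_j^r from a standard normal upper-truncated at b
      eta[j] <- qnorm(runif(1) * pnorm(b))
    }
    probs[r] <- p                           # one realization P_l^r
  }
  mean(probs)                               # P-bar_l = sum_r P_l^r / R
}
ghk_prob(V = c(0.5, -0.2), Omega = matrix(c(1, 0.5, 0.5, 1), 2, 2))

For small $J$, the result can be cross-checked against a deterministic evaluation of the normal integral, for instance mvtnorm::pmvnorm(upper = -V, sigma = Omega).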

Applications

We use again the Fishing data frame, keeping only a subset of three alternatives. The multinomial probit model is estimated using mlogit with the probit argument set to TRUE.

library("mlogit")
data("Fishing", package = "mlogit")
Fish <- dfidx(Fishing, varying = 2:9, choice = "mode", idnames = c("chid", "alt"))
Fish.mprobit <- mlogit(mode ~ price | income | catch, Fish, probit = TRUE, alt.subset = c("beach", "boat", "pier"))
summary(Fish.mprobit)

Bibliography

Daganzo, C. (1979). Multinomial Probit: The Theory and Its Application to Demand Forecasting. Academic Press, New York.

Geweke, J., Keane, M. and Runkle, D. (1994). "Alternative Computational Approaches to Inference in the Multinomial Probit Model". The Review of Economics and Statistics, 76(4), 609-632.

Hausman, J.A. and Wise, D.A. (1978). "A Conditional Probit Model for Qualitative Choice: Discrete Decisions Recognizing Interdependence and Heterogeneous Preferences". Econometrica, 46(2), 403-426.
