Introduction to RprobitB

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(RprobitB)

RprobitB is an R package that can be used to fit latent class mixed multinomial probit (LCMMNP) models to simulated or empirical data. RprobitB is licensed under the GNU General Public License v3.0.

Did you find a bug or do you want to request a feature? Please tell us!

The package name RprobitB is a portmanteau, combining R (the programming language), probit (the model name), and B (for Bayes, the estimation method).

RprobitB is able to

At this point, you may ask yourself:

  1. How to install RprobitB?
  2. How to use RprobitB?
  3. How is the LCMMNP model defined?
  4. How does the latent class updating scheme work?
  5. How to specify an LCMMNP model in RprobitB?

Installing RprobitB

To install the latest version of RprobitB, run install.packages("RprobitB") in your R console.

Using RprobitB

To use RprobitB, follow these steps:

  1. Specify the model, see below for details.
  2. Run RprobitB::rpb(<list of model specifications>).
  3. You get on-screen information and model results in an output folder.

LCMMNP model

Assume that we observe the choices of $N$ decision makers who decide between $J$ alternatives at each of $T$ choice occasions.[^1] Specific to each decision maker, alternative and choice occasion, we furthermore observe $P_f+P_r$ choice attributes that we use to explain the choices. The first $P_f$ attributes are connected to fixed coefficients, the other $P_r$ attributes to random coefficients following a joint distribution mixed across decision makers. Person $n$'s utility $\tilde{U}_{ntj}$ for alternative $j$ at choice occasion $t$ is modelled as \begin{equation} \tilde{U}_{ntj} = \tilde{W}_{ntj}'\alpha + \tilde{X}_{ntj}'\beta_n + \tilde{\epsilon}_{ntj} \end{equation} for $n=1,\dots,N$, $t=1,\dots,T$ and $j=1,\dots,J$, where

As is well known, any utility model needs to be normalized with respect to level and scale in order to be identified. Therefore, we consider the transformed model \begin{equation} U_{ntj} = W_{ntj}'\alpha + X_{ntj}'\beta_n + \epsilon_{ntj}, \end{equation} $n=1,\dots,N$, $t=1,\dots,T$ and $j=1,\dots,J-1$, where (choosing $J$ as the reference alternative) $U_{ntj}=\tilde{U}_{ntj} - \tilde{U}_{ntJ}$, $W_{ntj}=\tilde{W}_{ntj}-\tilde{W}_{ntJ}$, $X_{ntj}=\tilde{X}_{ntj}-\tilde{X}_{ntJ}$ and $\epsilon_{ntj}=\tilde{\epsilon}_{ntj}-\tilde{\epsilon}_{ntJ}$, where $(\epsilon_{nt:}) = (\epsilon_{nt1},...,\epsilon_{nt(J-1)})' \sim \text{MVN}_{J-1} (0,\Sigma)$ and $\Sigma$ denotes a covariance matrix with the top-left element restricted to one. While taking utility differences in order to normalize the model with respect to level is a standard procedure, alternatives to fixing an error term variance in order to normalize with respect to scale exist, for example fixing an element of $\alpha$.
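The differencing step can be illustrated in base R; the utility values below are made-up numbers for illustration, not RprobitB output:

```r
### illustrative only: difference utilities against reference alternative J
J <- 3
U_tilde <- c(1.2, 0.4, 0.9)        # utilities for alternatives 1, ..., J
U <- U_tilde[-J] - U_tilde[J]      # differenced utilities, length J - 1
U
```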

Let $y_{nt}=j$ denote the event that decision maker $n$ chooses alternative $j$ at choice occasion $t$. Assuming utility maximizing behaviour of the decision makers, the decisions are linked to the utilities via \begin{equation} y_{nt} = \sum_{j=1}^{J-1} j\cdot 1 \left (U_{ntj}=\max_i U_{nti}>0 \right) + J \cdot 1\left (U_{ntj}<0 ~\text{for all}~j\right), \end{equation} where $1(A)$ equals $1$ if condition $A$ is true and $0$ else.
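The decision rule above can be sketched as a small helper function (a plain-R illustration of the formula, not part of the RprobitB API):

```r
### decode a choice from differenced utilities of length J - 1:
### the argmax if the maximum is positive, else the reference alternative J
choice <- function(U) {
  J <- length(U) + 1
  if (max(U) > 0) which.max(U) else J
}
choice(c(0.3, -0.5))    # alternative 1
choice(c(-0.2, -0.5))   # alternative 3 (the reference)
```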

We approximate the mixing distribution $g_{P_r}$ for the random coefficients\footnote{Here and below we use the abbreviation $(\beta_n)_n$ as a shortcut for $(\beta_n)_{n=1,...,N}$, the collection of vectors $\beta_n$, $n=1,...,N$.} $\beta=(\beta_n)_n$ by a mixture of $P_r$-variate normal densities $\phi_{P_r}$ with mean vectors $b=(b_c)_c$ and covariance matrices $\Omega=(\Omega_c)_c$ using $C$ components, i.e. \begin{equation} \beta_n\mid b,\Omega \sim \sum_{c=1}^{C} s_c \phi_{P_r} (\cdot \mid b_c,\Omega_c), \end{equation} where $(s_c)_c$ are weights satisfying $0 < s_c\leq 1$ for $c=1,\dots,C$ and $\sum_c s_c=1$. One interpretation of the latent class model is obtained by introducing variables $z=(z_n)_n$ allocating each decision maker $n$ to class $c$ with probability $s_c$, i.e. \begin{equation} \text{Prob}(z_n=c)=s_c \quad \text{and} \quad \beta_n \mid z,b,\Omega \sim \phi_{P_r}(\cdot \mid b_{z_n},\Omega_{z_n}). \end{equation} We call this model the latent class mixed multinomial probit (LCMMNP) model.[^2]
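The class-allocation interpretation can be sketched in base R; all weights, means and covariances below are made-up values for illustration:

```r
### illustrative draw of beta_n from a C = 2 component normal mixture (P_r = 2)
set.seed(1)
s     <- c(0.6, 0.4)                      # class weights
b     <- list(c(-1, 0), c(2, 1))          # class means
Omega <- list(diag(2), 0.5 * diag(2))     # class covariances
z_n    <- sample(length(s), 1, prob = s)  # allocate decision maker n to a class
L      <- chol(Omega[[z_n]])              # Omega = L'L (upper-triangular chol)
beta_n <- b[[z_n]] + drop(t(L) %*% rnorm(2))  # beta_n | z_n, b, Omega
```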

[^1]: For notational simplicity, the number of choice occasions $T$ is assumed to be the same for each decision maker here. However, RprobitB allows for a different number of choice occasions for each decision maker.

[^2]: We note that the model collapses to the (normally) mixed multinomial probit model if $P_r>0$ and $C=1$, and to the basic multinomial probit model if $P_r=0$.

Latent class updating scheme

Updating the number $C$ of latent classes is done within the algorithm by executing the following weight-based updating scheme.
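As a rough, simplified sketch of the idea behind such a scheme (the function and the thresholds eps_min and eps_max below are hypothetical illustrations, not RprobitB's actual lcus parameters or rules):

```r
### simplified weight-based update: drop near-empty classes, split a
### dominant class, and renormalise the weights (illustrative only)
update_classes <- function(s, eps_min = 0.01, eps_max = 0.9) {
  s <- s[s >= eps_min]              # remove classes with negligible weight
  if (any(s > eps_max)) {           # split a dominant class into two halves
    i <- which.max(s)
    s <- c(s[-i], s[i] / 2, s[i] / 2)
  }
  s / sum(s)                        # renormalise so the weights sum to one
}
update_classes(c(0.005, 0.95, 0.045))
```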

Specifying an LCMMNP model in RprobitB

RprobitB specifications are grouped in the named lists

You can either specify none, all, or selected parameters. Unspecified parameters are set to default values.

model

data

If data = NULL, data is simulated from the model defined by model and parm.

To model empirical data, specify

[^3]: The wide data format presents each different covariate in a separate column. See the vignette of the mlogit package for an example.

parm

lcus

init

prior

A priori, parm[["alpha"]] ~ Normal(eta,Psi) with

A priori, parm[["s"]] ~ Dirichlet(delta) with

A priori, parm[["b"]][,c] ~ Normal(xi,D) with

A priori, matrix(parm[["Omega"]][,c],nrow=model[["P_r"]],ncol=model[["P_r"]]) ~ Inverse_Wishart(nu,Theta) with

A priori, parm[["Sigma"]] ~ Inverse_Wishart(kappa,E) with
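Draws from these prior families can be sketched in base R; the hyperparameter values below are made up for illustration and are not RprobitB's defaults:

```r
### illustrative prior draws (hyperparameter values are made up)
set.seed(1)
P_f <- 1; P_r <- 2

eta <- rep(0, P_f); Psi <- diag(P_f)
alpha <- eta + drop(chol(Psi) %*% rnorm(P_f))   # alpha ~ Normal(eta, Psi)

delta <- rep(1, 2)                              # s ~ Dirichlet(delta),
g <- rgamma(length(delta), shape = delta)       # via normalised gamma draws
s <- g / sum(g)

nu <- P_r + 2; Theta <- diag(P_r)
### if W ~ Wishart(nu, solve(Theta)), then solve(W) ~ Inverse-Wishart(nu, Theta)
Omega_c <- solve(rWishart(1, df = nu, Sigma = solve(Theta))[, , 1])
```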

mcmc

norm

RprobitB automatically normalizes with respect to level by computing utility differences, where model[["J"]] is the base alternative. The normalization with respect to scale can be specified:

out

Default specifications of RprobitB

model

data

NULL

parm

Per default, parameters are randomly drawn.

lcus

init

prior

mcmc

norm

out

Example: Simulated data

The code below fits a mixed multinomial probit model with

to simulated data with

The number of latent classes is updated because do_lcus = TRUE is set. The Gibbs sampler draws R = 20000 samples. By default, the model is named id = "test" and results are saved in rdir = tempdir() (the path of the per-session temporary directory).

set.seed(1)

### model specification
model = list("N" = 100, "T" = sample(10:20,100,replace=TRUE), "J" = 3, "P_f" = 1, "P_r" = 2, "C" = 2)
lcus  = list("do_lcus" = TRUE)
mcmc  = list("R" = 20000)

### start estimation (about 3 minutes computation time)
RprobitB::rpb("model" = model, "lcus" = lcus, "mcmc" = mcmc)

Example: Empirical data

The code below fits a mixed multinomial probit model with P_f = 2 fixed and P_r = 2 random coefficients to the "Train" dataset of the mlogit package with

For normalization, the price coefficient ("parameter" = "a", "index" = 1) is fixed to "value" = -1.

### load Train data
data("Train", package = "mlogit")

### model specification
model = list("P_f" = 2, "P_r" = 2)
data  = list("data_raw" = Train, "cov_col" = 4:11, "cov_ord" = c("price","comfort","time","change"), "cov_zst" = TRUE)
lcus  = list("do_lcus" = TRUE)
mcmc  = list("R" = 1e5)
norm  = list("parameter" = "a", "index" = 1, "value" = -1)
out   = list("id" = "train", "pp" = TRUE)

### start estimation (about 20 minutes computation time)
RprobitB::rpb("model" = model, "data" = data, "lcus" = lcus, "mcmc" = mcmc, "norm" = norm, "out" = out)

On-screen information

During estimation, you get the following on-screen information:

Model results

In the output folder out[["rdir"]]/out[["id"]], you can find the files





RprobitB documentation built on May 25, 2021, 9:07 a.m.