Model fitting"

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "",
  fig.align = "center",
  fig.dim = c(8, 6), 
  out.width = "75%"
)
library("RprobitB")
options("RprobitB_progress" = FALSE)

This vignette^[This vignette is built using R `r paste(R.Version()$major, R.Version()$minor, sep = ".")` with the {RprobitB} `r utils::packageVersion("RprobitB")` package.] documents the estimation procedure fit_model() in {RprobitB}.

Bayes estimation of the probit model

Bayes estimation of the probit model builds upon the work of @McCulloch:1994, @Nobile:1998, @Allenby:1998, and @Imai:2005. A key ingredient is the concept of data augmentation, see @Albert:1993: The idea is to treat the latent utilities $U$ in the model equation $U = X\beta + \epsilon$ as additional parameters. Then, conditional on $U$, the probit model constitutes a standard Bayesian linear regression set-up. Its posterior distribution can be approximated by iteratively drawing and updating each model parameter conditional on the other parameters (the so-called Gibbs sampling approach).
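To make the data augmentation idea concrete, here is a self-contained sketch (independent of the {RprobitB} internals) for a binary probit model with a flat coefficient prior: it alternates between drawing the latent utilities from truncated normal distributions and drawing the coefficient vector from its Gaussian full conditional.

set.seed(1)
n <- 500
X <- cbind(1, rnorm(n))                    # design matrix with intercept
beta_true <- c(-0.5, 1)
y <- as.numeric(X %*% beta_true + rnorm(n) > 0)

R <- 2000
beta_draws <- matrix(NA_real_, R, ncol(X))
beta <- rep(0, ncol(X))
XtX_inv <- solve(crossprod(X))             # posterior covariance under a flat prior

for (r in seq_len(R)) {
  # 1. draw latent utilities U | beta, y from truncated normals
  mu <- as.vector(X %*% beta)
  lower <- ifelse(y == 1, pnorm(0, mu), 0) # truncate to U > 0 if y = 1
  upper <- ifelse(y == 1, 1, pnorm(0, mu)) # truncate to U <= 0 if y = 0
  U <- qnorm(runif(n, lower, upper), mean = mu)
  # 2. draw beta | U from its Gaussian full conditional
  b <- XtX_inv %*% crossprod(X, U)
  beta <- as.vector(b + t(chol(XtX_inv)) %*% rnorm(ncol(X)))
  beta_draws[r, ] <- beta
}
colMeans(beta_draws[-(1:500), ])           # posterior means after burn-in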

A priori, we assume the following (conjugate) parameter distributions:
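For orientation, the conjugate choices from the cited literature take the following form (the hyperparameter symbols here are generic placeholders and may differ from the exact specification in {RprobitB}):

$$\alpha \sim N(\psi, \Psi), \quad s \sim \text{Dirichlet}(\delta), \quad b_c \sim N(\xi, \Xi), \quad \Omega_c \sim W^{-1}(\nu, \Theta), \quad \Sigma \sim W^{-1}(\kappa, E),$$

where $c$ indexes the classes of the mixing distribution and $W^{-1}$ denotes the inverse Wishart distribution.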

These prior distributions imply the following conditional posterior distributions:
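For instance, the data augmentation step yields truncated normal updates for the latent utilities, while the updates for the coefficients and the error covariance are the familiar Bayesian linear regression conditionals; schematically (suppressing the exact moments):

$$U \mid \alpha, \Sigma, y \sim \text{TN}(X\alpha, \Sigma), \qquad \alpha \mid U, \Sigma \sim N(\cdot, \cdot), \qquad \Sigma \mid U, \alpha \sim W^{-1}(\cdot, \cdot),$$

where $\text{TN}$ denotes the normal distribution truncated to the region that is consistent with the observed choices $y$.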

Parameter normalization

Samples obtained from the updating scheme described above lack identification (except for the $s$ and $z$ draws); see the vignette on the model definition. Therefore, subsequent to the sampling, the following normalizations are required for the updates in each iteration $i$:

where either $\omega^{(i)} = \sqrt{\text{const} / (\Sigma^{(i)})_{jj}}$ with $(\Sigma^{(i)})_{jj}$ the $j$th diagonal element of $\Sigma^{(i)}$, $1\leq j \leq J-1$, or alternatively $\omega^{(i)} = \text{const} / \alpha^{(i)}_p$ for some coordinate $1\leq p \leq P_f$ of the $i$th draw of the coefficient vector $\alpha$. Here, $\text{const}$ is any positive constant (typically 1). The preferences are flipped if $\omega^{(i)} < 0$, which is only the case if $\alpha^{(i)}_p < 0$.
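As an illustration of this scaling (not the package's internal code): fixing the $j$th error variance to $\text{const}$ scales the coefficient draws by $\omega^{(i)}$ and the covariance draws by $(\omega^{(i)})^2$.

# illustrative sketch, not RprobitB internals
normalize_draw <- function(alpha_i, Sigma_i, j = 1, const = 1) {
  omega <- sqrt(const / Sigma_i[j, j])  # fixes the j-th diagonal of Sigma to const
  list(alpha = omega * alpha_i,         # coefficients scale with omega
       Sigma = omega^2 * Sigma_i)       # covariance scales with omega^2
}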

Burn-in and thinning

The theory behind Gibbs sampling establishes that the sequence of samples produced by the updating scheme is a Markov chain whose stationary distribution equals the desired joint posterior distribution. It takes a certain number of iterations for that stationary distribution to be approximated reasonably well. Therefore, it is common practice to discard the first $B$ of the $R$ samples (the so-called burn-in period). Furthermore, correlation between nearby samples is to be expected. To obtain approximately independent samples, we consider only every $Q$th sample when computing Gibbs sample statistics like the expectation and the standard deviation. The independence of the samples can be verified by computing the serial correlation, and the convergence of the Gibbs sampler can be checked by considering trace plots, see below.
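In terms of draw indices, burn-in and thinning simply select a subsequence of the $R$ draws, for example:

# illustrative values for the iterations R, burn-in B, and thinning factor Q
R <- 10000; B <- 5000; Q <- 10
keep <- seq(B + Q, R, by = Q)  # indices of the draws that enter the statistics
length(keep)                   # here, 500 draws remain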

The fit_model() function

The Gibbs sampling scheme described above can be executed by applying the function

fit_model(data = data)

where data must be an RprobitB_data object (see the vignette about choice data). The function has several optional arguments, among them the number of iterations R, the length B of the burn-in period, the thinning factor Q, and the normalization argument scale, all of which appear below.
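For example, the sampler settings reported later in this vignette correspond to a call of the following form (the values shown are illustrative):

model <- fit_model(
  data  = data,
  R     = 10000,          # number of Gibbs iterations
  B     = 5000,           # length of the burn-in period
  Q     = 1,              # thinning factor
  scale = "price := -1"   # utility scale normalization
)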

Example

In the previous vignette on choice data, we introduced the Train data set from the {mlogit} package [@Croissant:2020] that contains 2929 choices between two fictional train route alternatives. First, we transform the travel time from minutes to hours and the travel price from guilders to euros:

data("Train", package = "mlogit")
Train$price_A <- Train$price_A / 100 * 2.20371
Train$price_B <- Train$price_B / 100 * 2.20371
Train$time_A <- Train$time_A / 60
Train$time_B <- Train$time_B / 60

The following lines fit a probit model that explains the chosen trip alternatives (choice) by their price, time, number of changes, and level of comfort (the lower this value, the higher the comfort). For normalization, the first linear coefficient, the price, is fixed to -1, which allows the other coefficients to be interpreted as monetary values:

set.seed(1)
form <- choice ~ price + time + change + comfort | 0
data <- prepare_data(form = form, choice_data = Train)
model_train <- fit_model(
  data = data,
  scale = "price := -1"
)

The estimated coefficients (using the mean of the Gibbs samples as a point estimate) can be printed via

coef(model_train)

and visualized via

plot(coef(model_train), sd = 3)

The results indicate that the deciders value one hour of travel time at about 25€, an additional change at 5€, and a more comfortable class at 14€.^[These results are consistent with the ones presented in a vignette of the mlogit package on the same data set but using the logit model.]

Checking the Gibbs samples

The Gibbs samples are saved in list form in the RprobitB_fit object at the entry "gibbs_samples", i.e.

str(model_train$gibbs_samples, max.level = 2, give.attr = FALSE)

This object contains 2 elements: gibbs_samples_raw, the raw samples as returned by the Gibbs sampler, and gibbs_samples_nbt, the samples after normalization, burn-in, and thinning.

Calling the summary function on the estimated RprobitB_fit object yields additional information about the Gibbs samples gibbs_samples_nbt. Via the argument FUN, you can specify a named list or vector of functions that compute any point estimates of the Gibbs samples^[Use the function point_estimates() to access the Gibbs sample statistics as an RprobitB_parameter object.], for example

summary(model_train, 
        FUN = c("mean"        = mean, 
                "sd"          = stats::sd, 
                "R^"          = R_hat,
                "custom_stat" = function(x) abs(mean(x) - median(x))
                )
       )

Calling the plot method with the additional argument type = "trace" plots the trace of the Gibbs samples gibbs_samples_nbt:

par(mfrow = c(2,1))
plot(model_train, type = "trace")

Additionally, we can visualize the serial correlation of the Gibbs samples via the argument type = "acf". The boxes in the top-right corner state the total sample size TSS (here R - B = 10000 - 5000 = 5000), the effective sample size ESS, and the factor by which TSS is larger than ESS.

par(mfrow = c(2,3))
plot(model_train, type = "acf")

Here, the effective sample size is the value $\text{TSS} / (1 + \sum_{k\geq 1} \rho_k)$, where $\rho_k$ is the autocorrelation of the chain at lag $k$. The autocorrelations are estimated via the stats::acf() function.
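This computation can be mirrored by hand; the following is a minimal sketch of the stated formula, where draws is any numeric vector of Gibbs samples:

ess <- function(draws) {
  rho <- stats::acf(draws, plot = FALSE)$acf[-1]  # autocorrelations at lags >= 1
  length(draws) / (1 + sum(rho))
}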

Model transformation after estimation

The transform method can be used to transform an RprobitB_fit object in three ways:

  1. change the length B of the burn-in period, for example
     model_train <- transform(model_train, B = 1)
  2. change the thinning factor Q of the Gibbs samples, for example
     model_train <- transform(model_train, Q = 100)
  3. or change the model normalization scale, for example
     model_train <- transform(model_train, scale = "Sigma_1 := 1")

References


