knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "img/", fig.align = "center", fig.dim = c(8, 6), out.width = "75%" ) library("RprobitB") options("RprobitB_progress" = FALSE)
The task of model selection targets the question: If there are several competing models, how do I choose the most appropriate one? This vignette^[This vignette is built using R r paste(R.Version()$major, R.Version()$minor, sep = ".")
with the {RprobitB}
r utils::packageVersion("RprobitB")
package.] outlines the model selection tools implemented in {RprobitB}
.
For illustration, we revisit the probit model of travelers deciding between two fictional train route alternatives from the vignette on model fitting:
form <- choice ~ price + time + change + comfort | 0 data <- prepare_data(form = form, choice_data = train_choice, id = "deciderID", idc = "occasionID") model_train <- fit_model( data = data, scale = "price := -1", R = 1000, B = 500, Q = 10 )
As a competing model, we consider explaining the choices only by the alternative's price, i.e. the probit model with the formula choice ~ price | 0
:
model_train_sparse <- update(model_train, form = choice ~ price | 0)
The update()
function helps to estimate a new version of model_train
with new specifications. Here, only form
has been changed.
model_selection()
function{RprobitB}
provides the convenience function model_selection()
, which takes an arbitrary number of RprobitB_fit
objects and returns a matrix of model selection criteria:
model_selection(model_train, model_train_sparse)
Specifying criteria
is optional. Per default, criteria = c("npar", "LL", "AIC", "BIC")
.^[To show the model formulas in the output of model_selection()
, add the argument add_form = TRUE
.] The available model selection criteria are explained in the following.
npar
"npar"
yields the number of model parameters, which is computed by the npar()
method:
npar(model_train, model_train_sparse)
Here, model_train
has 4 parameters (a coefficient for price, time, change, and comfort, respectively), while model_train_sparse
has only a single price coefficient.
LL
If "LL"
is included in criteria
, model_selection()
returns the model's log-likelihood values. They can also be directly accessed via the logLik()
method:^[The log-likelihood values are per default computed at the point estimates derived from the Gibbs sample means. logLik()
has the argument par_set
, where alternative statistics for the point estimate can be specified.]
logLik(model_train) logLik(model_train_sparse)
AIC
Including "AIC"
yields the Akaike's Information Criterion [@Akaike1974], which is computed as $$-2 \cdot \text{LL} + k \cdot \text{npar},$$
where $\text{LL}$ is the model's log-likelihood value, $k$ is the penalty per parameter with $k = 2$ per default for the classical AIC, and $\text{npar}$ is the number of parameters in the fitted model.
Alternatively, the AIC()
method also returns the AIC values:
AIC(model_train, model_train_sparse, k = 2)
The AIC quantifies the trade-off between over- and under-fitting, where smaller values are preferred. Here, the increase in goodness of fit justifies the additional 3 parameters of model_train
.
BIC
Similar to the AIC, "BIC"
yields the Bayesian Information Criterion [@Schwarz1978], which is defined as
$$-2 \cdot \text{LL} + \log{(\text{nobs})} \cdot \text{npar},$$
where $\text{LL}$ is the model's log-likelihood value, $\text{nobs}$ is the number of data points (which can be accessed via the nobs()
method), and $\text{npar}$ is the number of parameters in the fitted model. The interpretation is analogue to AIC.
{RprobitB}
also provided a method for the BIC
value:
BIC(model_train, model_train_sparse)
WAIC
(with se(WAIC)
and pWAIC
)WAIC is short for Widely Applicable (or Watanabe-Akaike) Information Criterion [@Watanabe2010]. As for AIC and BIC, the smaller the WAIC value the better the model. Including "WAIC"
in criteria
yields the WAIC value, its standard error se(WAIC)
, and the effective number of parameters pWAIC
, see below.
The WAIC is defined as
$$-2 \cdot \text{lppd} + 2\cdot p_\text{WAIC},$$
where $\text{lppd}$ stands for log pointwise predictive density and $p_\text{WAIC}$ is a penalty term proportional to the variance in the posterior distribution that is sometimes called effective number of parameters, see @McElreath2020 p. 220 for a reference.
The $\text{lppd}$ is approximated as follows. Let $$p_{si} = \Pr(y_i\mid \theta_s)$$ be the probability of observation $y_i$ (here the single choices) given the $s$-th set $\theta_s$ of parameter samples from the posterior. Then
$$\text{lppd} = \sum_i \log \left( S^{-1} \sum_s p_{si} \right).$$ The penalty term is computed as the sum over the variances in log-probability for each observation: $$p_\text{WAIC} = \sum_i \mathbb{V}{\theta} \log (p{si}) . $$ The $\text{WAIC}$ has a standard error of $$\sqrt{n \cdot \mathbb{V}i \left[-2 \left(\text{lppd} - \mathbb{V}{\theta} \log (p_{si}) \right)\right]},$$ where $n$ is the number of choices.
Before computing the WAIC of an \code{RprobitB_fit} object, the probabilities $p_{si}$ must be computed via the compute_p_si()
function: ^[This computation has been outsourced because it is very time consuming. For example, the computation for model_train
took about 5 minutes. To decrease the computation time, the function offers parallelization via specifying the number ncores
of parallel CPU cores.]
model_train <- compute_p_si(model_train)
Afterwards, the WAIC can be accessed as follows:
WAIC(model_train) WAIC(model_train_sparse)
MMLL
"MMLL"
in criteria
stands for marginal model log-likelihood. The model's marginal likelihood $\Pr(y\mid M)$ for a model $M$ and data $y$ is required for the computation of Bayes factors, see below. In general, the term has no closed form and must be approximated numerically.
{RprobitB}
uses the posterior Gibbs samples derived from the mcmc()
function to approximate the model's marginal likelihood via the posterior harmonic mean estimator [@Newton1994]: Let $S$ denote the number of available posterior samples $\theta_1,\dots,\theta_S$. Then,
$$\Pr(y\mid M) = \left(\mathbb{E}_\text{posterior} 1/\Pr(y\mid \theta,M) \right)^{-1} \approx \left( \frac{1}{S} \sum_s 1/\Pr(y\mid \theta_s,M) \right) ^{-1} = \tilde{\Pr}(y\mid M).$$
By the law of large numbers, $\tilde{\Pr}(y\mid M) \to \Pr(y\mid M)$ almost surely as $S \to \infty$.
As for the WAIC, computing the MMLL relies on the probabilities $p_{si} = \Pr(y_i\mid \theta_s)$, which must first be computed via the compute_p_si()
function. Afterwards, the mml()
function can be called with an RprobitB_fit
object as input. The function returns the RprobitB_fit
object, where the marginal likelihood value is saved as the entry "mml"
and the marginal log-likelihood value as the attribute "mmll"
:^[Note that the marginal likelihood value is very small. The given representation is required so that the value is not rounded to 0 by the computer.]
model_train <- mml(model_train) model_train$mml attr(model_train$mml, "mmll")
There are two options for improving the approximation: As for the WAIC, you can use more posterior samples. Alternatively, you can combine the posterior harmonic mean estimate with the prior arithmetic mean estimator [@Hammersley1964]: For this approach, $S$ samples $\theta_1,\dots,\theta_S$ are drawn from the model's prior distribution. Then,
$$\Pr(y\mid M) = \mathbb{E}_\text{prior} \Pr(y\mid \theta,M) \approx \frac{1}{S} \sum_s \Pr(y\mid \theta_s,M) = \tilde{\Pr}(y\mid M).$$
Again, it hols by the law of large numbers, that $\tilde{\Pr}(y\mid M) \to \Pr(y\mid M)$ almost surely as $S \to \infty$. The final approximation of the model's marginal likelihood than is a weighted sum of the posterior harmonic mean estimate and the prior arithmetic mean estimate, where the weights are determined by the sample sizes.
To use the prior arithmetic mean estimator, call the mml()
function with a specification of the number of prior draws S
and set recompute = TRUE
:^[The mml()
function offers parallelization via specifying the number ncores
of parallel CPU cores.]
model_train <- mml(model_train, S = 1000, recompute = TRUE)
Note that the prior arithmetic mean estimator works well if the prior and posterior distribution have a similar shape and strong overlap, as @Gronau2017 points out. Otherwise, most of the sampled prior values result in a likelihood value close to zero, thereby contributing only marginally to the approximation. In this case, a very large number S
of prior samples is required.
BF
The Bayes factor is an index of relative posterior model plausibility of one model over another [@Marin2014]. Given data $\texttt{y}$ and two models $\texttt{mod0}$ and $\texttt{mod1}$, it is defined as
$$ BF(\texttt{mod0},\texttt{mod1}) = \frac{\Pr(\texttt{mod0} \mid \texttt{y})}{\Pr(\texttt{mod1} \mid \texttt{y})} = \frac{\Pr(\texttt{y} \mid \texttt{mod0} )}{\Pr(\texttt{y} \mid \texttt{mod1})} / \frac{\Pr(\texttt{mod0})}{\Pr(\texttt{mod1})}. $$
The ratio $\Pr(\texttt{mod0}) / \Pr(\texttt{mod1})$ expresses the factor by which $\texttt{mod0}$ a priori is assumed to be the correct model. Per default, {RprobitB}
sets $\Pr(\texttt{mod0}) = \Pr(\texttt{mod1}) = 0.5$. The front part $\Pr(\texttt{y} \mid \texttt{mod0} ) / \Pr(\texttt{y} \mid \texttt{mod1})$ is the ratio of the marginal model likelihoods. A value of $BF(\texttt{mod0},\texttt{mod1}) > 1$ means that the model $\texttt{mod0}$ is more strongly supported by the data under consideration than $\texttt{mod1}$.
Adding "BF"
to the criteria
argument of model_selection
yields the Bayes factors. We find a decisive evidence [@Jeffreys1998] in favor of model_train
.
model_selection(model_train, model_train_sparse, criteria = c("BF"))
pred_acc
Finally, adding "pred_acc"
to the criteria
argument for the model_selection()
function returns the share of correctly predicted choices. From the output of model_selection()
above (or alternatively the one in the following) we deduce that model_train
correctly predicts about 6% of the choices more than model_train_sparse
:^[See the vignette on choice prediction for more nuanced performance comparison in terms of sensitivity and specificity.]
pred_acc(model_train, model_train_sparse)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.