# bbreg: bbreg In bbreg: Bessel and Beta Regressions via Expectation-Maximization Algorithm for Continuous Bounded Data

## Description

Function to fit, via Expectation-Maximization (EM) algorithm, the bessel or the beta regression to a given data set with a bounded continuous response variable.

## Usage

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13``` ```bbreg( formula, data, link.mean = c("logit", "probit", "cauchit", "cloglog"), link.precision = c("identity", "log", "sqrt", "inverse"), model = NULL, residual = NULL, envelope = 0, prob = 0.95, predict = 0, ptest = 0.25, epsilon = 10^(-5) ) ```

## Arguments

 `formula` symbolic description of the model (examples: `z ~ x1 + x2` and `z ~ x1 + x2 | v1 + v2`); see details below. `data` elements expressed in formula. This is usually a data frame composed by: (i) the bounded continuous observations in `z` (0 < z_i < 1), (ii) covariates for the mean submodel (columns `x1` and `x2`) and (iii) covariates for the precision submodel (columns `v1` and `v2`). `link.mean` optionally, a string containing the link function for the mean. If omitted, the 'logit' link function will be used. The possible link functions for the mean are "logit","probit", "cauchit", "cloglog". `link.precision` optionally, a string containing the link function the precision parameter. If omitted and the only precision covariate is the intercept, the identity link function will be used, if omitted and there is a precision covariate other than the intercept, the 'log' link function will be used. The possible link functions for the precision parameter are "identity", "log", "sqrt", "inverse". `model` character ("bessel" or "beta") indicating the type of model to be fitted. The default is NULL, meaning that a discrimination test must be applied to select the model. `residual` character indicating the type of residual to be evaluated ("pearson", "score" or "quantile"). The default is "pearson". `envelope` number of simulations (synthetic data sets) to build envelopes for residuals (with `100*prob%` confidence level). The default `envelope = 0` dismisses the envelope analysis. `prob` probability indicating the confidence level for the envelopes (default: `prob` = 0.95). If `envelope` = 0, `prob` is ignored. `predict` number of partitions (training set to fit the model and a test set to calculate residuals) to be evaluated in a predictive accuracy study related to the `RSS_pred` (residual sum of squares for the partition "test set"). The default `predict` = 0 dismisses the `RSS_pred` analysis. The partitions are randomly defined when predict is set as a positive integer. `ptest` proportion of the sample size to be considered in the test set for the `RSS_pred` analysis (default = 0.25 = 25\ If predict = 0, ptest is ignored. `epsilon` tolerance value to control the convergence criterion in the EM-algorithm (default = 10^(-5)).

## Details

The bessel regression originates from a class of normalized inverse-Gaussian (N-IG) process introduced in Lijoi et al. (2005) as an alternative to the widely used Dirichlet process in the Bayesian context. These authors consider a ratio of inverse-Gaussian random variables to define the new process. In the particular univariate case, the N-IG is obtained from the representation "Z = Y1/(Y1+Y2)", with "Y1" and "Y2" being independent inverse-Gaussian random variables having scale = 1 and shape parameters "a1 > 0" and "a2 > 0", respectively. Denote "Y1 ~ IG(a1)" and "Y2 ~ IG(a2)". The density of "Z" has support in the interval (0,1) and it depends on the modified Bessel function of third kind with order 1, named here as "K1(-)". The presence of "K1(-)" in the structure of the p.d.f. establishes the name of the new distribution; consider Z ~ Bessel(a1,a2). Note that the name "beta distribution" is also an analogy to the presence of a function (the beta function) in its p.d.f. structure. The bessel regression model is defined by assuming "Z_1,...,Z_n" as a random sample of continuous bounded responses with "Z_i ~ Bessel(mu_i,phi_i)" for "i = 1,...,n". Using this parameterization, one can write: "E(Z_i) = mu_i" and "Var(Z_i) = mu_i(1-mu_i) g(phi_i)", where "g(-)" is a function depending on the exponential integral of "phi_i". The following link functions are assumed "logit(mu_i) = x_i^T kappa" and "log(phi_i) = v_i^T lambda", where "kappa' = (kappa_1,...,kappa_p)" and "lambda' = (lambda_1,...,lambdaq)" are real valued vectors. The terms "x_i^T" and "v_i^T" represent, respectively, the i-th row of the matrices "x" (nxp) and "v" (nxq) containing covariates in their columns ("x_i,1" and "v_i,1" may be 1 to handle intercepts). As it can be seen, this regression model has two levels with covariates explaining the mean "mu_i" and the parameter "phi_i". For more details about the bessel regression see Barreto-Souza, Mayrink and Simas (2020).

This package implements an Expectation Maximization (EM) algorithm to fit the bessel regression. The full EM approach proposed in Barreto-Souza and Simas (2017) for the beta regression is also available here. Fitting the beta regression via EM-algorithm is a major difference between the present package bbreg and the well known `betareg` created by Alexandre B. Simas and currently maintained by Achim Zeileis. The estimation procedure on the betareg packages is given by maximizing the beta model likelihood via `optim`. In terms of initial values, bbreg uses quasi-likelihood estimates as the starting points for the EM-algorithms. The formulation of the target model also has the same structure as in the standard functions `lm`, `glm` and betareg, with also the same structure as the latter when precision covariates are being used. The user is supposed to write a formula object describing elements of the regression (response, covariates for the mean submodel, covariates for the precision submodel, presence of intercepts, and interactions). As an example, the description "z ~ x" indicates: "response = z" (continuous and bounded by 0 and 1), "covariates = columns of x" (mean submodel) and precision submodel having only an intercept. On the other hand, the configuration "z ~ x | v" establishes that the covariates given in the columns of "v" must be used in the precision submodel. Intercepts may be removed by setting "z ~ 0 + x | 0 + v" or "z ~ x - 1|v - 1". Absence of intercept and covariates is not allowed in any submodel. The type of model to be fitted ("bessel" or "beta") can be specified through the argument "model" of bbreg. If the user does not specify the model, the package will automatically apply a discrimination test (DBB - Discrimination between Bessel and Beta), developed in Barreto-Souza, Mayrink and Simas (2020), to select the most appropriate model for the given data set. In this case, some quantities related to the DBB are included in the final output; they are: "sum(Z2/n)" = mean of z_i^2, "sum(quasi_mu)" = sum (for i = 1,...,n) of muq_i + muq_i(1-muq_i)/2, with muq_i being the quasi-likelihood estimator of mu_i and, finally, the quantities "|D_bessel|" and "|D_beta|" depending on muq_i and the EM-estimates of phi_i under bessel or beta.

In the current version, three types of residuals are available for analysis ("Pearson", "Score" and "Quantile"). The user may choose one of them via the argument "residual". The score residual is computed empirically, based on 100 artificial data sets generated from the fitted model. The sample size is the same of the original data and the simulations are used to estimate the mean and s.d. required in the score residual formulation. The user may also choose to build envelopes for the residuals with confidence level in "prob". This feature also requires simulations of synthetic data ("envelope" is the number of replications). Residuals are obtained for each data set and confronted against the quantiles of the N(0,1). Predictive accuracy of the fitted model is also explored by setting "predict" as a positive integer (this value represents the number of random partitions to be evaluated). In this case, the full data set is separated in a training (partition to fit the model) and a test set (to evaluate residuals) for which the RSS (Residual Sum of Squares) is computed. The default partition is 75\ proportion `ptest` for the test set (large `ptest` is not recommended).

## Value

Object of class bbreg containing the outputs from the model fit (bessel or beta regression).

## References

arXiv:2003.05157 (Barreto-Souza, Mayrink and Simas; 2020)

DOI:10.1080/00949655.2017.1350679 (Barreto-Souza and Simas; 2017)

DOI:10.18637/jss.v034.i02 (Cribari-Neto and Zeileis; 2010)

DOI:10.1198/016214505000000132 (Lijoi et al.; 2005)

`summary.bbreg`, `plot.bbreg`, `simdata_bes`, `dbessel`, `dbbtest`, `simdata_bet`, `Formula`
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25``` ```# Example with artificial data. n = 100; x = cbind(rbinom(n, 1, 0.5), runif(n, -1, 1)); v = runif(n, -1, 1); z = simdata_bes(kap = c(1, -1, 0.5), lam = c(0.5, -0.5), x, v, repetition = 1, link.mean = "logit", link.precision = "log") z = unlist(z) fit1 = bbreg(z ~ x | v) summary(fit1) plot(fit1) # Examples using the Weather Task (WT) data available in bbreg. fit2 = bbreg(agreement ~ priming + eliciting, data = WT) summary(fit2) fit3 = bbreg(agreement ~ priming + eliciting, envelope = 30, predict = 10, data = WT) summary(fit3) # Example with precision covariates fit4 = bbreg(agreement ~ priming + eliciting|eliciting, data = WT) summary(fit4) # Example with different link functions: fit5 = bbreg(agreement ~ priming + eliciting|eliciting, data = WT, link.mean = "cloglog", link.precision = "sqrt") summary(fit5) ```