bbreg: Bessel and Beta Regression Models

Description Usage Arguments Details Value References See Also Examples

View source: R/1_bbreg.R

Description

Function to fit, via Expectation-Maximization (EM) algorithm, the bessel or the beta regression to a given data set with a bounded continuous response variable.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
bbreg(
  formula,
  data,
  link.mean = c("logit", "probit", "cauchit", "cloglog"),
  link.precision = c("identity", "log", "sqrt"),
  model = NULL,
  residual = NULL,
  envelope = 0,
  prob = 0.95,
  predict = 0,
  ptest = 0.25,
  em_controls = list(maxit = 5000, em_tol = 10^(-5)),
  optim_method = "L-BFGS-B",
  optim_controls = list()
)

bbreg.fit(
  x,
  z,
  v = NULL,
  link.mean = c("logit", "probit", "cauchit", "cloglog"),
  link.precision = c("identity", "log", "sqrt"),
  model = NULL,
  residual = NULL,
  envelope = 0,
  prob = 0.95,
  predict = 0,
  ptest = 0.25,
  em_controls = list(maxit = 5000, em_tol = 10^(-5)),
  optim_method = "L-BFGS-B",
  optim_controls = list()
)

Arguments

formula

symbolic description of the model (examples: z ~ x1 + x2 and z ~ x1 + x2 | v1 + v2); see details below.

data

elements expressed in formula. This is usually a data frame composed by: (i) the bounded continuous observations in z (0 < z_i < 1), (ii) covariates for the mean submodel (columns x1 and x2) and (iii) covariates for the precision submodel (columns v1 and v2).

link.mean

optionally, a string containing the link function for the mean. If omitted, the 'logit' link function will be used. The possible link functions for the mean are "logit","probit", "cauchit", "cloglog".

link.precision

optionally, a string containing the link function the precision parameter. If omitted and the only precision covariate is the intercept, the 'identity' link function will be used, if omitted and there is a precision covariate other than the intercept, the 'log' link function will be used. The possible link functions for the precision parameter are "identity", "log", "sqrt".

model

character ("bessel" or "beta") indicating the type of model to be fitted. The default is NULL, meaning that a discrimination test must be applied to select the model.

residual

character indicating the type of residual to be evaluated ("pearson", "score" or "quantile"). The default is "pearson".

envelope

number of simulations (synthetic data sets) to build envelopes for residuals (with 100*prob% confidence level). The default envelope = 0 dismisses the envelope analysis.

prob

probability indicating the confidence level for the envelopes (default: prob = 0.95). If envelope = 0, prob is ignored.

predict

number of partitions (training set to fit the model and a test set to calculate residuals) to be evaluated in a predictive accuracy study related to the RSS_pred (residual sum of squares for the partition "test set"). The default predict = 0 dismisses the RSS_pred analysis. The partitions are randomly defined when predict is set as a positive integer.

ptest

proportion of the sample size to be considered in the test set for the RSS_pred analysis (default = 0.25 = 25\ If predict = 0, ptest is ignored.

em_controls

a list containing two elements: maxit that contains the maximum number of iterations of the EM algorithm, the default is set to 5000; em_tol that defines the tolerance value to control the convergence criterion in the EM-algorithm, the default is set to 10^(-5).

optim_method

main optimization algorithm to be used. The available methods are the same as those of optim function. The default is set to "L-BFGS-B".

optim_controls

a list of control arguments to be passed to the optim function in the optimization of the model. For the control options, see the 'Details' in the help of optim for the possible arguments.

x

matrix of covariates with respect to the mean with dimension (n,nkap).

z

vector of response variables with length n. Each coordinate must belong to the standard unit interval (0,1).

v

matrix of covariates with respect to the precision parameter. The default is NULL. If not NULL must be of dimension (n,nlam).

Details

The bessel regression originates from a class of normalized inverse-Gaussian (N-IG) process introduced in Lijoi et al. (2005) as an alternative to the widely used Dirichlet process in the Bayesian context. These authors consider a ratio of inverse-Gaussian random variables to define the new process. In the particular univariate case, the N-IG is obtained from the representation "Z = Y1/(Y1+Y2)", with "Y1" and "Y2" being independent inverse-Gaussian random variables having scale = 1 and shape parameters "a1 > 0" and "a2 > 0", respectively. Denote "Y1 ~ IG(a1)" and "Y2 ~ IG(a2)". The density of "Z" has support in the interval (0,1) and it depends on the modified Bessel function of third kind with order 1, named here as "K1(-)". The presence of "K1(-)" in the structure of the p.d.f. establishes the name of the new distribution; consider Z ~ Bessel(a1,a2). Note that the name "beta distribution" is also an analogy to the presence of a function (the beta function) in its p.d.f. structure. The bessel regression model is defined by assuming "Z_1,...,Z_n" as a random sample of continuous bounded responses with "Z_i ~ Bessel(mu_i,phi_i)" for "i = 1,...,n". Using this parameterization, one can write: "E(Z_i) = mu_i" and "Var(Z_i) = mu_i(1-mu_i) g(phi_i)", where "g(-)" is a function depending on the exponential integral of "phi_i". The following link functions are assumed "logit(mu_i) = x_i^T kappa" and "log(phi_i) = v_i^T lambda", where "kappa' = (kappa_1,...,kappa_p)" and "lambda' = (lambda_1,...,lambda_q)" are real valued vectors. The terms "x_i^T" and "v_i^T" represent, respectively, the i-th row of the matrices "x" (nxp) and "v" (nxq) containing covariates in their columns ("x_i,1" and "v_i,1" may be 1 to handle intercepts). As it can be seen, this regression model has two levels with covariates explaining the mean "mu_i" and the parameter "phi_i". For more details about the bessel regression see Barreto-Souza, Mayrink and Simas (2020).

This package implements an Expectation Maximization (EM) algorithm to fit the bessel regression. The full EM approach proposed in Barreto-Souza and Simas (2017) for the beta regression is also available here. Fitting the beta regression via EM-algorithm is a major difference between the present package bbreg and the well known betareg created by Alexandre B. Simas and currently maintained by Achim Zeileis. The estimation procedure on the betareg packages is given by maximizing the beta model likelihood via optim. In terms of initial values, bbreg uses quasi-likelihood estimates as the starting points for the EM-algorithms. The formulation of the target model also has the same structure as in the standard functions lm, glm and betareg, with also the same structure as the latter when precision covariates are being used. The user is supposed to write a formula object describing elements of the regression (response, covariates for the mean submodel, covariates for the precision submodel, presence of intercepts, and interactions). As an example, the description "z ~ x" indicates: "response = z" (continuous and bounded by 0 and 1), "covariates = columns of x" (mean submodel) and precision submodel having only an intercept. On the other hand, the configuration "z ~ x | v" establishes that the covariates given in the columns of "v" must be used in the precision submodel. Intercepts may be removed by setting "z ~ 0 + x | 0 + v" or "z ~ x - 1|v - 1". Absence of intercept and covariates is not allowed in any submodel. The type of model to be fitted ("bessel" or "beta") can be specified through the argument "model" of bbreg. If the user does not specify the model, the package will automatically apply a discrimination test (DBB - Discrimination between Bessel and Beta), developed in Barreto-Souza, Mayrink and Simas (2020), to select the most appropriate model for the given data set. In this case, some quantities related to the DBB are included in the final output; they are: "sum(Z2/n)" = mean of z_i^2, "sum(quasi_mu)" = sum (for i = 1,...,n) of muq_i + muq_i(1-muq_i)/2, with muq_i being the quasi-likelihood estimator of mu_i and, finally, the quantities "|D_bessel|" and "|D_beta|" depending on muq_i and the EM-estimates of phi_i under bessel or beta.

In the current version, three types of residuals are available for analysis ("Pearson", "Score" and "Quantile"). The user may choose one of them via the argument "residual". The score residual is computed empirically, based on 100 artificial data sets generated from the fitted model. The sample size is the same of the original data and the simulations are used to estimate the mean and s.d. required in the score residual formulation. The user may also choose to build envelopes for the residuals with confidence level in "prob". This feature also requires simulations of synthetic data ("envelope" is the number of replications). Residuals are obtained for each data set and confronted against the quantiles of the N(0,1). Predictive accuracy of the fitted model is also explored by setting "predict" as a positive integer (this value represents the number of random partitions to be evaluated). In this case, the full data set is separated in a training (partition to fit the model) and a test set (to evaluate residuals) for which the RSS (Residual Sum of Squares) is computed. The default partition is 75\ proportion ptest for the test set (large ptest is not recommended).

Value

bbreg returns an object of class "bbreg". The function bbreg.fit returns an object of class "bbreg_fit". The objects of classes bbreg and bbreg_fit return lists containing:

References

arXiv:2003.05157 (Barreto-Souza, Mayrink and Simas; 2020)

DOI:10.1080/00949655.2017.1350679 (Barreto-Souza and Simas; 2017)

DOI:10.18637/jss.v034.i02 (Cribari-Neto and Zeileis; 2010)

DOI:10.1198/016214505000000132 (Lijoi et al.; 2005)

See Also

summary.bbreg, plot.bbreg, simdata_bes, dbessel, dbbtest, simdata_bet, Formula

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
# Example with artificial data.
n <- 100
x <- cbind(rbinom(n, 1, 0.5), runif(n, -1, 1))
v <- runif(n, -1, 1)
z <- simdata_bes(
  kap = c(1, -1, 0.5), lam = c(0.5, -0.5), x, v,
  repetition = 1, link.mean = "logit", link.precision = "log"
)
z <- unlist(z)
fit1 <- bbreg(z ~ x | v)
summary(fit1)
plot(fit1)

# Examples using the Weather Task (WT) data available in bbreg.

fit2 <- bbreg(agreement ~ priming + eliciting, data = WT)
summary(fit2)


fit3 <- bbreg(agreement ~ priming + eliciting, envelope = 30, predict = 10, data = WT)
summary(fit3)

# Example with precision covariates

fit4 <- bbreg(agreement ~ priming + eliciting | eliciting, data = WT)
summary(fit4)

# Example with different link functions:

fit5 <- bbreg(agreement ~ priming + eliciting | eliciting,
  data = WT,
  link.mean = "cloglog", link.precision = "sqrt"
)
summary(fit5)

vpnsctl/bbreg documentation built on March 14, 2021, 12:11 a.m.