Description Usage Arguments Details Value References See Also Examples
Function to fit, via Expectation-Maximization (EM) algorithm, the bessel or the beta regression to a given data set with a bounded continuous response variable.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 | bbreg(
formula,
data,
link.mean = c("logit", "probit", "cauchit", "cloglog"),
link.precision = c("identity", "log", "sqrt"),
model = NULL,
residual = NULL,
envelope = 0,
prob = 0.95,
predict = 0,
ptest = 0.25,
em_controls = list(maxit = 5000, em_tol = 10^(-5)),
optim_method = "L-BFGS-B",
optim_controls = list()
)
bbreg.fit(
x,
z,
v = NULL,
link.mean = c("logit", "probit", "cauchit", "cloglog"),
link.precision = c("identity", "log", "sqrt"),
model = NULL,
residual = NULL,
envelope = 0,
prob = 0.95,
predict = 0,
ptest = 0.25,
em_controls = list(maxit = 5000, em_tol = 10^(-5)),
optim_method = "L-BFGS-B",
optim_controls = list()
)
|
formula |
symbolic description of the model (examples: |
data |
elements expressed in formula. This is usually a data frame composed by:
(i) the bounded continuous observations in |
link.mean |
optionally, a string containing the link function for the mean. If omitted, the 'logit' link function will be used. The possible link functions for the mean are "logit","probit", "cauchit", "cloglog". |
link.precision |
optionally, a string containing the link function the precision parameter. If omitted and the only precision covariate is the intercept, the 'identity' link function will be used, if omitted and there is a precision covariate other than the intercept, the 'log' link function will be used. The possible link functions for the precision parameter are "identity", "log", "sqrt". |
model |
character ("bessel" or "beta") indicating the type of model to be fitted. The default is NULL, meaning that a discrimination test must be applied to select the model. |
residual |
character indicating the type of residual to be evaluated ("pearson", "score" or "quantile"). The default is "pearson". |
envelope |
number of simulations (synthetic data sets) to build envelopes for residuals (with |
prob |
probability indicating the confidence level for the envelopes (default: |
predict |
number of partitions (training set to fit the model and a test set to calculate residuals) to be evaluated in a predictive accuracy
study related to the |
ptest |
proportion of the sample size to be considered in the test set for the |
em_controls |
a list containing two elements: |
optim_method |
main optimization algorithm to be used. The available methods are the same as those of |
optim_controls |
a list of control arguments to be passed to the |
x |
matrix of covariates with respect to the mean with dimension |
z |
vector of response variables with length |
v |
matrix of covariates with respect to the precision parameter. The default is |
The bessel regression originates from a class of normalized inverse-Gaussian (N-IG) process introduced in Lijoi et al. (2005) as an alternative to the widely used Dirichlet process in the Bayesian context. These authors consider a ratio of inverse-Gaussian random variables to define the new process. In the particular univariate case, the N-IG is obtained from the representation "Z = Y1/(Y1+Y2)", with "Y1" and "Y2" being independent inverse-Gaussian random variables having scale = 1 and shape parameters "a1 > 0" and "a2 > 0", respectively. Denote "Y1 ~ IG(a1)" and "Y2 ~ IG(a2)". The density of "Z" has support in the interval (0,1) and it depends on the modified Bessel function of third kind with order 1, named here as "K1(-)". The presence of "K1(-)" in the structure of the p.d.f. establishes the name of the new distribution; consider Z ~ Bessel(a1,a2). Note that the name "beta distribution" is also an analogy to the presence of a function (the beta function) in its p.d.f. structure. The bessel regression model is defined by assuming "Z_1,...,Z_n" as a random sample of continuous bounded responses with "Z_i ~ Bessel(mu_i,phi_i)" for "i = 1,...,n". Using this parameterization, one can write: "E(Z_i) = mu_i" and "Var(Z_i) = mu_i(1-mu_i) g(phi_i)", where "g(-)" is a function depending on the exponential integral of "phi_i". The following link functions are assumed "logit(mu_i) = x_i^T kappa" and "log(phi_i) = v_i^T lambda", where "kappa' = (kappa_1,...,kappa_p)" and "lambda' = (lambda_1,...,lambda_q)" are real valued vectors. The terms "x_i^T" and "v_i^T" represent, respectively, the i-th row of the matrices "x" (nxp) and "v" (nxq) containing covariates in their columns ("x_i,1" and "v_i,1" may be 1 to handle intercepts). As it can be seen, this regression model has two levels with covariates explaining the mean "mu_i" and the parameter "phi_i". For more details about the bessel regression see Barreto-Souza, Mayrink and Simas (2020).
This package implements an Expectation Maximization (EM) algorithm to fit the bessel regression. The full EM approach proposed in Barreto-Souza and Simas (2017) for the beta
regression is also available here. Fitting the beta regression via EM-algorithm is a major difference between the present package bbreg and the
well known betareg
created by Alexandre B. Simas and currently maintained by Achim Zeileis. The estimation procedure on the betareg
packages
is given by maximizing the beta model likelihood via optim
.
In terms of initial values, bbreg uses quasi-likelihood estimates as the starting points for
the EM-algorithms. The formulation of the target model also has the same structure as in the standard functions lm
, glm
and betareg
,
with also the same structure as the latter when precision covariates are being used. The user is supposed to
write a formula object describing elements of the regression (response, covariates for the mean submodel,
covariates for the precision submodel, presence of intercepts, and interactions). As an example, the description
"z ~ x" indicates: "response = z" (continuous and bounded by 0 and 1), "covariates = columns of x" (mean submodel) and
precision submodel having only an intercept. On the other hand, the configuration "z ~ x | v" establishes that the covariates given
in the columns of "v" must be used in the precision submodel. Intercepts may be removed by setting
"z ~ 0 + x | 0 + v" or "z ~ x - 1|v - 1". Absence of intercept and covariates is not allowed in any submodel.
The type of model to be fitted ("bessel" or "beta") can be specified through the argument "model" of
bbreg. If the user does not specify the model, the package will automatically apply a discrimination
test (DBB - Discrimination between Bessel and Beta),
developed in Barreto-Souza, Mayrink and Simas (2020), to select the most appropriate model for the given
data set. In this case, some quantities related to the DBB are included in the final output; they are:
"sum(Z2/n)" = mean of z_i^2, "sum(quasi_mu)" = sum (for i = 1,...,n) of muq_i + muq_i(1-muq_i)/2,
with muq_i being the quasi-likelihood estimator of mu_i and, finally, the quantities "|D_bessel|" and
"|D_beta|" depending on muq_i and the EM-estimates of phi_i under bessel or beta.
In the current version, three types of residuals are available for analysis ("Pearson", "Score" and "Quantile").
The user may choose one of them via the argument "residual". The score residual is computed empirically, based
on 100 artificial data sets generated from the fitted model. The sample size
is the same of the original data and the simulations are used to estimate the mean and s.d. required in the score
residual formulation. The user
may also choose to build envelopes for the residuals with confidence level in "prob". This feature also requires simulations of synthetic data
("envelope" is the number of replications). Residuals are obtained for each data set and confronted against the quantiles of the N(0,1). Predictive
accuracy of the fitted model is also explored by setting "predict" as a positive integer (this value represents the number of random partitions to be evaluated).
In this case, the full data set is separated in a training (partition to fit the model) and a test set (to evaluate residuals) for which the
RSS (Residual Sum of Squares) is computed. The default partition is 75\
proportion ptest
for the test set (large ptest
is not recommended).
bbreg
returns an object of class "bbreg". The function bbreg.fit
returns an object of class "bbreg_fit".
The objects of classes bbreg
and bbreg_fit
return lists containing:
coefficients
- a list with elements "mean" and "precision" containing the estimated coefficients of the model;
call
- the formula used by the model. If using bbreg.fit
, this returns NULL
.
modelname
- the fitted model, bessel or beta;
message
- message to be displayed, regarding the usage or not of the discrimination criterion;
residualname
- the name of the chosen residual in the call;
niter
- number of iterations of the EM algorithm;
start
- the initial guesses of the parameters
intercept
- vector indicating if the intercept is present in the mean and/or in the precision regressions;
link.mean
- link function of the mean;
link.precision
- link function of the precision parameter;
kappa
- vector containing the estimates of the mean-related coefficients;
lambda
- vector containing the estimates of the precision-related coefficients;
mu
- estimated means;
fitted.values
- the fitted values in the response scale;
efron.pseudo.r2
- Efron's pseudo R^2: the squared correlation between the response variables and the predicted values;
x
- the covariates related to the mean;
v
- the covariates related to the precision parameter;
z
- the response variables;
gphi
- the estimated dispersion parameters;
residuals
- the values of the chosen residual in the call;
std_errors
- the standard errors of the estimated parameters;
RSS
- sum of squared residuals;
RSS_pred
- sum of squared residuals from the predictions (when prediction is performed);
DBB
- vector containing the discrimination statistics (when discrimination is performed);
envelope
- the numerical envelopes used to build the Q-Q plot with simulated envelopes;
terms
- (only for bbreg
)the terms
object used;
levels
- (where relevant, only for bbreg
) the levels of the factors used;
contrasts
- (where relevant, only for bbreg
) the contrasts used.
arXiv:2003.05157 (Barreto-Souza, Mayrink and Simas; 2020)
DOI:10.1080/00949655.2017.1350679 (Barreto-Souza and Simas; 2017)
DOI:10.18637/jss.v034.i02 (Cribari-Neto and Zeileis; 2010)
DOI:10.1198/016214505000000132 (Lijoi et al.; 2005)
summary.bbreg
, plot.bbreg
, simdata_bes
, dbessel
, dbbtest
, simdata_bet
, Formula
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 | # Example with artificial data.
n <- 100
x <- cbind(rbinom(n, 1, 0.5), runif(n, -1, 1))
v <- runif(n, -1, 1)
z <- simdata_bes(
kap = c(1, -1, 0.5), lam = c(0.5, -0.5), x, v,
repetition = 1, link.mean = "logit", link.precision = "log"
)
z <- unlist(z)
fit1 <- bbreg(z ~ x | v)
summary(fit1)
plot(fit1)
# Examples using the Weather Task (WT) data available in bbreg.
fit2 <- bbreg(agreement ~ priming + eliciting, data = WT)
summary(fit2)
fit3 <- bbreg(agreement ~ priming + eliciting, envelope = 30, predict = 10, data = WT)
summary(fit3)
# Example with precision covariates
fit4 <- bbreg(agreement ~ priming + eliciting | eliciting, data = WT)
summary(fit4)
# Example with different link functions:
fit5 <- bbreg(agreement ~ priming + eliciting | eliciting,
data = WT,
link.mean = "cloglog", link.precision = "sqrt"
)
summary(fit5)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.