bbreg: Bessel and Beta Regression Models
In vpnsctl/bbreg: Bessel and Beta Regressions via Expectation-Maximization Algorithm for Continuous Bounded Data

Description Usage Arguments Details Value References See Also Examples

Function to fit, via Expectation-Maximization (EM) algorithm, the bessel or the beta regression to a given data set with a bounded continuous response variable.

bbreg(
  formula,
  data,
  link.mean = c("logit", "probit", "cauchit", "cloglog"),
  link.precision = c("identity", "log", "sqrt"),
  model = NULL,
  residual = NULL,
  envelope = 0,
  prob = 0.95,
  predict = 0,
  ptest = 0.25,
  em_controls = list(maxit = 5000, em_tol = 10^(-5)),
  optim_method = "L-BFGS-B",
  optim_controls = list()
)

bbreg.fit(
  x,
  z,
  v = NULL,
  link.mean = c("logit", "probit", "cauchit", "cloglog"),
  link.precision = c("identity", "log", "sqrt"),
  model = NULL,
  residual = NULL,
  envelope = 0,
  prob = 0.95,
  predict = 0,
  ptest = 0.25,
  em_controls = list(maxit = 5000, em_tol = 10^(-5)),
  optim_method = "L-BFGS-B",
  optim_controls = list()
)

`formula`	symbolic description of the model (examples: `z ~ x1 + x2` and `z ~ x1 + x2 \| v1 + v2`); see details below.
`data`	elements expressed in formula. This is usually a data frame composed by: (i) the bounded continuous observations in `z` (0 < z_i < 1), (ii) covariates for the mean submodel (columns `x1` and `x2`) and (iii) covariates for the precision submodel (columns `v1` and `v2`).
`link.mean`	optionally, a string containing the link function for the mean. If omitted, the 'logit' link function will be used. The possible link functions for the mean are "logit","probit", "cauchit", "cloglog".
`link.precision`	optionally, a string containing the link function the precision parameter. If omitted and the only precision covariate is the intercept, the 'identity' link function will be used, if omitted and there is a precision covariate other than the intercept, the 'log' link function will be used. The possible link functions for the precision parameter are "identity", "log", "sqrt".
`model`	character ("bessel" or "beta") indicating the type of model to be fitted. The default is NULL, meaning that a discrimination test must be applied to select the model.
`residual`	character indicating the type of residual to be evaluated ("pearson", "score" or "quantile"). The default is "pearson".
`envelope`	number of simulations (synthetic data sets) to build envelopes for residuals (with `100*prob%` confidence level). The default `envelope = 0` dismisses the envelope analysis.
`prob`	probability indicating the confidence level for the envelopes (default: `prob` = 0.95). If `envelope` = 0, `prob` is ignored.
`predict`	number of partitions (training set to fit the model and a test set to calculate residuals) to be evaluated in a predictive accuracy study related to the `RSS_pred` (residual sum of squares for the partition "test set"). The default `predict` = 0 dismisses the `RSS_pred` analysis. The partitions are randomly defined when predict is set as a positive integer.
`ptest`	proportion of the sample size to be considered in the test set for the `RSS_pred` analysis (default = 0.25 = 25\ If predict = 0, ptest is ignored.
`em_controls`	a list containing two elements: `maxit` that contains the maximum number of iterations of the EM algorithm, the default is set to 5000; `em_tol` that defines the tolerance value to control the convergence criterion in the EM-algorithm, the default is set to 10^(-5).
`optim_method`	main optimization algorithm to be used. The available methods are the same as those of `optim` function. The default is set to "L-BFGS-B".
`optim_controls`	a list of control arguments to be passed to the `optim` function in the optimization of the model. For the control options, see the 'Details' in the help of `optim` for the possible arguments.
`x`	matrix of covariates with respect to the mean with dimension `(n,nkap)`.
`z`	vector of response variables with length `n`. Each coordinate must belong to the standard unit interval (0,1).
`v`	matrix of covariates with respect to the precision parameter. The default is `NULL`. If not `NULL` must be of dimension `(n,nlam)`.

The bessel regression originates from a class of normalized inverse-Gaussian (N-IG) process introduced in Lijoi et al. (2005) as an alternative to the widely used Dirichlet process in the Bayesian context. These authors consider a ratio of inverse-Gaussian random variables to define the new process. In the particular univariate case, the N-IG is obtained from the representation "Z = Y1/(Y1+Y2)", with "Y1" and "Y2" being independent inverse-Gaussian random variables having scale = 1 and shape parameters "a1 > 0" and "a2 > 0", respectively. Denote "Y1 ~ IG(a1)" and "Y2 ~ IG(a2)". The density of "Z" has support in the interval (0,1) and it depends on the modified Bessel function of third kind with order 1, named here as "K1(-)". The presence of "K1(-)" in the structure of the p.d.f. establishes the name of the new distribution; consider Z ~ Bessel(a1,a2). Note that the name "beta distribution" is also an analogy to the presence of a function (the beta function) in its p.d.f. structure. The bessel regression model is defined by assuming "Z_1,...,Z_n" as a random sample of continuous bounded responses with "Z_i ~ Bessel(mu_i,phi_i)" for "i = 1,...,n". Using this parameterization, one can write: "E(Z_i) = mu_i" and "Var(Z_i) = mu_i(1-mu_i) g(phi_i)", where "g(-)" is a function depending on the exponential integral of "phi_i". The following link functions are assumed "logit(mu_i) = x_i^T kappa" and "log(phi_i) = v_i^T lambda", where "kappa' = (kappa_1,...,kappa_p)" and "lambda' = (lambda_1,...,lambda_q)" are real valued vectors. The terms "x_i^T" and "v_i^T" represent, respectively, the i-th row of the matrices "x" (nxp) and "v" (nxq) containing covariates in their columns ("x_i,1" and "v_i,1" may be 1 to handle intercepts). As it can be seen, this regression model has two levels with covariates explaining the mean "mu_i" and the parameter "phi_i". For more details about the bessel regression see Barreto-Souza, Mayrink and Simas (2020).

This package implements an Expectation Maximization (EM) algorithm to fit the bessel regression. The full EM approach proposed in Barreto-Souza and Simas (2017) for the beta regression is also available here. Fitting the beta regression via EM-algorithm is a major difference between the present package bbreg and the well known betareg created by Alexandre B. Simas and currently maintained by Achim Zeileis. The estimation procedure on the betareg packages is given by maximizing the beta model likelihood via optim. In terms of initial values, bbreg uses quasi-likelihood estimates as the starting points for the EM-algorithms. The formulation of the target model also has the same structure as in the standard functions lm, glm and betareg, with also the same structure as the latter when precision covariates are being used. The user is supposed to write a formula object describing elements of the regression (response, covariates for the mean submodel, covariates for the precision submodel, presence of intercepts, and interactions). As an example, the description "z ~ x" indicates: "response = z" (continuous and bounded by 0 and 1), "covariates = columns of x" (mean submodel) and precision submodel having only an intercept. On the other hand, the configuration "z ~ x | v" establishes that the covariates given in the columns of "v" must be used in the precision submodel. Intercepts may be removed by setting "z ~ 0 + x | 0 + v" or "z ~ x - 1|v - 1". Absence of intercept and covariates is not allowed in any submodel. The type of model to be fitted ("bessel" or "beta") can be specified through the argument "model" of bbreg. If the user does not specify the model, the package will automatically apply a discrimination test (DBB - Discrimination between Bessel and Beta), developed in Barreto-Souza, Mayrink and Simas (2020), to select the most appropriate model for the given data set. In this case, some quantities related to the DBB are included in the final output; they are: "sum(Z2/n)" = mean of z_i^2, "sum(quasi_mu)" = sum (for i = 1,...,n) of muq_i + muq_i(1-muq_i)/2, with muq_i being the quasi-likelihood estimator of mu_i and, finally, the quantities "|D_bessel|" and "|D_beta|" depending on muq_i and the EM-estimates of phi_i under bessel or beta.

In the current version, three types of residuals are available for analysis ("Pearson", "Score" and "Quantile"). The user may choose one of them via the argument "residual". The score residual is computed empirically, based on 100 artificial data sets generated from the fitted model. The sample size is the same of the original data and the simulations are used to estimate the mean and s.d. required in the score residual formulation. The user may also choose to build envelopes for the residuals with confidence level in "prob". This feature also requires simulations of synthetic data ("envelope" is the number of replications). Residuals are obtained for each data set and confronted against the quantiles of the N(0,1). Predictive accuracy of the fitted model is also explored by setting "predict" as a positive integer (this value represents the number of random partitions to be evaluated). In this case, the full data set is separated in a training (partition to fit the model) and a test set (to evaluate residuals) for which the RSS (Residual Sum of Squares) is computed. The default partition is 75\ proportion ptest for the test set (large ptest is not recommended).

bbreg returns an object of class "bbreg". The function bbreg.fit returns an object of class "bbreg_fit". The objects of classes bbreg and bbreg_fit return lists containing:

coefficients - a list with elements "mean" and "precision" containing the estimated coefficients of the model;
call - the formula used by the model. If using bbreg.fit, this returns NULL.
modelname - the fitted model, bessel or beta;
message - message to be displayed, regarding the usage or not of the discrimination criterion;
residualname - the name of the chosen residual in the call;
niter - number of iterations of the EM algorithm;
start - the initial guesses of the parameters
intercept - vector indicating if the intercept is present in the mean and/or in the precision regressions;
link.mean - link function of the mean;
link.precision - link function of the precision parameter;
kappa - vector containing the estimates of the mean-related coefficients;
lambda - vector containing the estimates of the precision-related coefficients;
mu - estimated means;
fitted.values - the fitted values in the response scale;
efron.pseudo.r2 - Efron's pseudo R^2: the squared correlation between the response variables and the predicted values;
x - the covariates related to the mean;
v - the covariates related to the precision parameter;
z - the response variables;
gphi - the estimated dispersion parameters;
residuals - the values of the chosen residual in the call;
std_errors - the standard errors of the estimated parameters;
RSS - sum of squared residuals;
RSS_pred - sum of squared residuals from the predictions (when prediction is performed);
DBB - vector containing the discrimination statistics (when discrimination is performed);
envelope - the numerical envelopes used to build the Q-Q plot with simulated envelopes;
terms - (only for bbreg)the terms object used;
levels - (where relevant, only for bbreg) the levels of the factors used;
contrasts - (where relevant, only for bbreg) the contrasts used.

arXiv:2003.05157 (Barreto-Souza, Mayrink and Simas; 2020)

DOI:10.1080/00949655.2017.1350679 (Barreto-Souza and Simas; 2017)

DOI:10.18637/jss.v034.i02 (Cribari-Neto and Zeileis; 2010)

DOI:10.1198/016214505000000132 (Lijoi et al.; 2005)

summary.bbreg, plot.bbreg, simdata_bes, dbessel, dbbtest, simdata_bet, Formula

# Example with artificial data.
n <- 100
x <- cbind(rbinom(n, 1, 0.5), runif(n, -1, 1))
v <- runif(n, -1, 1)
z <- simdata_bes(
  kap = c(1, -1, 0.5), lam = c(0.5, -0.5), x, v,
  repetition = 1, link.mean = "logit", link.precision = "log"
)
z <- unlist(z)
fit1 <- bbreg(z ~ x | v)
summary(fit1)
plot(fit1)

# Examples using the Weather Task (WT) data available in bbreg.

fit2 <- bbreg(agreement ~ priming + eliciting, data = WT)
summary(fit2)


fit3 <- bbreg(agreement ~ priming + eliciting, envelope = 30, predict = 10, data = WT)
summary(fit3)

# Example with precision covariates

fit4 <- bbreg(agreement ~ priming + eliciting | eliciting, data = WT)
summary(fit4)

# Example with different link functions:

fit5 <- bbreg(agreement ~ priming + eliciting | eliciting,
  data = WT,
  link.mean = "cloglog", link.precision = "sqrt"
)
summary(fit5)

vpnsctl/bbreg documentation built on March 14, 2021, 12:11 a.m.

vpnsctl/bbreg index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

vpnsctl/bbreg
Bessel and Beta Regressions via Expectation-Maximization Algorithm for Continuous Bounded Data

bbreg: Bessel and Beta Regression Models
In vpnsctl/bbreg: Bessel and Beta Regressions via Expectation-Maximization Algorithm for Continuous Bounded Data

Description

Usage

Arguments

Details

Value

References

See Also

Examples

Related to bbreg in vpnsctl/bbreg...

R Package Documentation

Browse R Packages

We want your feedback!

vpnsctl/bbreg Bessel and Beta Regressions via Expectation-Maximization Algorithm for Continuous Bounded Data

bbreg: Bessel and Beta Regression Models In vpnsctl/bbreg: Bessel and Beta Regressions via Expectation-Maximization Algorithm for Continuous Bounded Data

Description

Usage

Arguments

Details

Value

References

See Also

Examples

Related to bbreg in vpnsctl/bbreg...

R Package Documentation

Browse R Packages

We want your feedback!

vpnsctl/bbreg
Bessel and Beta Regressions via Expectation-Maximization Algorithm for Continuous Bounded Data

bbreg: Bessel and Beta Regression Models
In vpnsctl/bbreg: Bessel and Beta Regressions via Expectation-Maximization Algorithm for Continuous Bounded Data