# dirichreg: Fitting a Dirichlet Regression In DirichletReg: Dirichlet Regression in R

## Description

This function allows for fitting Dirichlet regression models using two different parametrizations.

## Usage

 ```1 2``` ```DirichReg(formula, data, model = c("common", "alternative"), subset, sub.comp, base, weights, control, verbosity = 0) ```

## Arguments

 `formula` the model formula (for different specifications see “Details”) `data` a `data.frame` containing independent and dependent variables `model` specifies whether the `"common"` (alphas) or `"alternative"` (mean/precision) parametrization is employed (see “Details”) `subset` estimates the model for a subset of the data `sub.comp` analyze a subcomposition by selecting specific components (see “Details”) `base` redefine the base variable `weights` frequency weights `control` a list containing control parameters used for the optimization `verbosity` prints information about the function's progress, see Details

## Details

#### Formula Specification and Models

`formula` determines the used predictors. The responses must be prepared by `DR_data` and can be optionally stored in the object containing all covariates which is then specified as the argument `data`. (Although “on-the-fly” processing of `DR_data` in a formula works, it is only intended for testing purposes and may be removed at any time – use at your own risk.)

There are two different parametrization (controlled by the argument `model`, see below):

• the “common” param. that models each alpha by an (possibly individual) set of predictors, and

• the “alternative” param. that models expected values (mu; as in multinomial logistic regression) and precision parameters (phi) with two sets of predictors.

As the two models offer different modeling strategies, the specification of their formulae differ:

##### Formulae for the “Common” Model

The simplest possible model here is to include only an intercept for all components. If `DV` is the ‘dependent variable’ (i.e., compositional data) with three components, we can request this null-model by `DV ~ 1`. We always have at least two dependent variables, so simple formulae as the one given above will be expanded to `DV ~ 1 | 1 | 1`, because `DV` hast three components. Likewise, it is possible to specify a common set of predictors for all components, as in `DV ~ p1 * p2`, where `p1` and `p2` are predictors.

If the covariates of the components shall differ, one has to set up a complete formula for each subcomposition, using `|` as separators between the components, for example, `DV ~ p1 | p1 + p2 | p1 * p2` will lead to a model where the first response in `DV` will be modeled using `p1`, the second will be predicted by `p1 + p2` and the third by `p1 * p2`. Note that if you use the latter approach, the predictors have to be stated explicitly for all response variables.

##### Formulae for the “Alternative” Model

The simplest possible model here is to include an intercept for all components (except the base) and an intercept for precision. This can be achieved by `DV ~ 1`, which is expanded to `DV ~ 1 | 1`. The part modeling the ‘mean’ (first element on the right-hand side) is mandatory, if no specification for precision is included, an intercept will be added. Note that you need to set `model = "alternative"` to use this parametrization!

The alternative parametrization consists of two parts: modeled expected values (mu) and their ‘precision’ (phi). As in multinomial logistic regression, one response variable is omitted (by default the first, but this can be changed by the `base` argument in `DR_data` or `DirichReg`) and for the rest a set of predictors is used with a multinomial logit-link. For precisions, a different set of predictors can be set up using a log-link.

`DV ~ p1 * p2 | p1 + p2` will set up a model where the expected values are predicted by `p1 * p2` and precision are modeled using `p1 + p2`.

#### Data Preparation

The `data` argument accepts a `data.frame` that must include the dependent variable as a named element (see examples how to do this).

#### Changing the Base Component and Analyzing Subcompositions

The base-component (i.e., omitted component) is initially set during the stage of data preparation `DR_data`, but can easily be changed using the argument `base` which takes integer values from 1 to the maximum number of components.

If a data set contains a large number of components, of which only a few are relevant, the latter can be ‘sorted out’ and the irrelevant (i.e., not selected) components will be aggregated into a single variable (row sums) that automatically becomes the base category for the model, unless specified otherwise by `base`. The positioning of variables will necessarily change: the aggregated variable takes the first column and the others are appended in their order of selection.

#### Subsets and Weights

Using `subset`, the model can be fitted only to a part of the data, for more information about this functionality, see `subset`.
Note that, unlike in `glm`, `weights` are not treated as prior weights, but as frequency weights!

#### Optimization and Verbosity

Using the `control` argument, the settings passed to the optimizers can be altered. This argument takes a named list. To supply user-defined starting values, use `control = list(sv=c(...))` and supply a vector containing initial values for all parameters. Optimizer-specific options include the number of iterations (`iterlim = 1000`) and convergence criteria for the BFGS- and NR-optimization ((`tol1 = 1e-5`) and (`tol2 = 1e-10`)).

Verbosity takes integer values from `0` to `4`. `0`, no information is printed (default). `1` prints information about 3 stages (preparation, starting values, estimation). `2` prints little information about optimization (`verbosity` values greater than one are passed to `print.default = verbosity - 1` of `maxBFGS` and `maxNR`). `3` prints more information about optimization. `4` prints all information about optimization.

## Value

 `call` [`language`] function call `parametrization` [`character`] used parametrization `varnames` [`character`] components' names `n.vars` [`numeric`] vector with the number of parameters per set of predictors `dims` [`numeric`] number of components `Y` [`numeric`] used components `X` [`numeric list`] sets of predictors `Z` [`numeric list`] sets of predictors (only for the alternative parametrization)
 `sub.comp` [`numeric`] vector of single components `base` [`numeric`] base (only for the alternative parametrization) `weights` [`numeric`] vector of frequency weights `orig.resp` [`DirichletRegData`] the original response `data` [`data.frame`] original data `d` [`data.frame`] used data `formula` [`Formula`] expanded formula `mf_formula` [`language`] expression for generating the model frame `npar` [`numeric`] number of parameters `coefficients` [`numeric`] named vector of parameters `coefnames` [`character`] names of the parameters `fitted.values` [`list of matrices`] list containing alpha's, mu's, phi's for the observations `logLik` [`numeric`] the log-likelihood `vcov` [`matrix`] covariance-matrix of parameter estimates `hessian` [`matrix`] (observed) Hessian `se` [`numeric`] vector of standard errors `optimization` [`list`] contains details about the optimization process provided by `maxBFGS` and `maxNR`

Marco J. Maier

## Examples

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15``` ```ALake <- ArcticLake ALake\$Y <- DR_data(ALake[,1:3]) # fit a quadratic Dirichlet regression models ("common") res1 <- DirichReg(Y ~ depth + I(depth^2), ALake) # fit a Dirichlet regression with quadratic predictor for the mean and # a linear predictor for precision ("alternative") res2 <- DirichReg(Y ~ depth + I(depth^2) | depth, ALake, model="alternative") # test both models anova(res1, res2) res1 summary(res2) ```

### Example output

```Loading required package: Formula
Warning messages:
1: In rgl.init(initValue, onlyNULL) : RGL: unable to open X11 display
2: 'rgl_init' failed, running with rgl.useNULL = TRUE
call: fun(...)
Warning in DR_data(ALake[, 1:3]) :
not all rows sum up to 1 => normalization forced
Analysis of Deviance Table

Model 1: DirichReg(formula = Y ~ depth + I(depth^2), data = ALake)
Model 2: DirichReg(formula = Y ~ depth + I(depth^2) | depth, data = ALake,
model = "alternative")

Deviance N. par Difference df Pr(>Chi)
Model 1  -217.99      9
Model 2  -215.68      8     2.3136  1   0.1282

Call:
DirichReg(formula = Y ~ depth + I(depth^2), data = ALake)
using the common parametrization

Log-likelihood: 109 on 9 df (162 BFGS + 2 NR Iterations)

-----------------------------------------
Coefficients for variable no. 1: sand
(Intercept)        depth   I(depth^2)
1.4361967   -0.0072382    0.0001324
-----------------------------------------
Coefficients for variable no. 2: silt
(Intercept)        depth   I(depth^2)
-0.0259705    0.0717450   -0.0002679
-----------------------------------------
Coefficients for variable no. 3: clay
(Intercept)        depth   I(depth^2)
-1.7931487    0.1107906   -0.0004872
-----------------------------------------
Call:
DirichReg(formula = Y ~ depth + I(depth^2) | depth, data = ALake, model =
"alternative")

Standardized Residuals:
Min       1Q   Median      3Q     Max
sand  -1.7598  -0.7459  -0.1833  1.0148  2.7250
silt  -1.1459  -0.5379  -0.1581  0.2467  1.5572
clay  -1.9269  -0.6106  -0.0617  0.6294  1.9976

MEAN MODELS:
------------------------------------------------------------------
Coefficients for variable no. 1: sand
- variable omitted (reference category) -
------------------------------------------------------------------
Coefficients for variable no. 2: silt
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.2885187  0.3835030  -3.360  0.00078 ***
depth        0.0706992  0.0147385   4.797 1.61e-06 ***
I(depth^2)  -0.0003247  0.0001210  -2.684  0.00727 **
------------------------------------------------------------------
Coefficients for variable no. 3: clay
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.9754613  0.4973405  -5.983 2.19e-09 ***
depth        0.1064013  0.0180002   5.911 3.40e-09 ***
I(depth^2)  -0.0005161  0.0001429  -3.612 0.000303 ***
------------------------------------------------------------------

PRECISION MODEL:
------------------------------------------------------------------
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.324382   0.357461   3.705 0.000211 ***
depth       0.041773   0.006604   6.325 2.53e-10 ***
------------------------------------------------------------------
Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Log-likelihood: 107.8 on 8 df (95 BFGS + 2 NR Iterations)
AIC: -199.7, BIC: -186.4
Number of Observations: 39
Links: Logit (Means) and Log (Precision)
Parametrization: alternative
```

DirichletReg documentation built on May 29, 2017, 7:09 p.m.