# DR_data: Prepare Compositional Data In DirichletReg: Dirichlet Regression in R

## Description

This function prepares a matrix with compositional variables for further processing in the DirichletReg package.

## Usage

```r
DR_data(Y, trafo = sqrt(.Machine$double.eps), base = 1,
        norm_tol = sqrt(.Machine$double.eps))

## S3 method for class 'DirichletRegData'
print(x, type = c("processed", "original"), ...)

## S3 method for class 'DirichletRegData'
summary(object, ...)
```

## Arguments

- `Y`: A matrix or data.frame with nonnegative values of all compositional variables (in some cases, a vector is also permissible; see “Details”).
- `trafo`: Either a logical or a numeric value. Transformation shrinks the values away from the extremes 0 and 1; see “Details”. If logical, it forces (`TRUE`) or suppresses (`FALSE`) transformation. Suppressing transformation in the presence of extreme values (0 or 1) results in an error. If `trafo` is numeric, it is used as a “threshold”: transformation is applied if values y in `Y` satisfy y < `trafo` or y > (1 - `trafo`).
- `base`: The “base” component to use in the reparametrized model.
- `norm_tol`: Due to limited numerical precision, row sums of `Y` may not be exactly equal to 1. Therefore, `norm_tol` is a small nonnegative value (default: `sqrt(.Machine$double.eps)`) that sets the tolerance when testing row sums for “near equality” to 1 (see `all.equal`).
- `x`: A `DirichletRegData` object.
- `type`: Displays either the (possibly normalized or transformed) `"processed"` data or the `"original"` data.
- `object`: A `DirichletRegData` object.
- `...`: Further arguments.
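The decision rules for `trafo` and `norm_tol` can be sketched in a few lines of base R. The helpers `needs_trafo()` and `rows_sum_to_one()` below are hypothetical illustrations, not functions exported by DirichletReg, and the package's internal checks may differ in detail; they only mirror the threshold test and the `all.equal`-based tolerance described above.

```r
# Hypothetical helpers (not part of DirichletReg) mirroring the rules
# described for `trafo` and `norm_tol`.

# Should the transformation be applied?
needs_trafo <- function(Y, trafo = sqrt(.Machine$double.eps)) {
  if (is.logical(trafo)) return(trafo)          # logical: TRUE forces, FALSE suppresses
  any(Y < trafo | Y > 1 - trafo, na.rm = TRUE)  # numeric: threshold test on every value
}

# Do all row sums equal 1 up to the tolerance `norm_tol` (via all.equal)?
rows_sum_to_one <- function(Y, norm_tol = sqrt(.Machine$double.eps)) {
  rs <- unname(rowSums(Y))                      # drop row names so only values are compared
  isTRUE(all.equal(rs, rep(1, length(rs)), tolerance = norm_tol))
}

Y <- rbind(c(0.2, 0.3, 0.5),
           c(0.0, 0.4, 0.6))    # second row contains an exact 0

needs_trafo(Y)                  # TRUE, because 0 < trafo
needs_trafo(Y, trafo = FALSE)   # FALSE (DR_data itself would stop with an error here)
rows_sum_to_one(Y)              # TRUE
```

With the default numeric `trafo`, a single exact 0 or 1 anywhere in `Y` is enough to trigger the transformation.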

## Details

#### Y

Y is a matrix or data.frame containing compositional variables. If they do not sum up to 1 for all observations, normalization is forced where each row entry is divided by the row's sum (a warning will be issued that normalization was applied).
If one or more entries in a row are NA, the whole row is returned as NA. Beta-distributed variables can be supplied as a single vector, which, however, must have values in the interval [0, 1]. The second variable is then generated as 1 - Y, and a matrix with the columns 1 - Y and Y is returned. A message is issued stating that a beta-distributed variable was assumed and that this assumption needs to be checked.
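The row handling described above (vector to two-column matrix, forced normalization, NA rows) can be sketched as follows. `prepare_rows()` is a hypothetical helper written for illustration; it is not DirichletReg's implementation, it reuses the warning text seen in the example output below, and it omits the message about the beta-distribution assumption.

```r
# Hypothetical helper (not the package's code) sketching how rows are
# prepared: beta vectors become two columns, rows are normalized, and
# rows containing NA are set to NA entirely.
prepare_rows <- function(Y, norm_tol = sqrt(.Machine$double.eps)) {
  if (is.vector(Y)) {
    # Beta case: values must lie in [0, 1]; build the columns 1 - Y and Y
    stopifnot(all(Y >= 0 & Y <= 1, na.rm = TRUE))
    Y <- cbind(1 - Y, Y)
  }
  Y <- as.matrix(Y)
  rs <- unname(rowSums(Y))   # NA whenever a row contains an NA
  ok <- !is.na(rs)
  if (!isTRUE(all.equal(rs[ok], rep(1, sum(ok)), tolerance = norm_tol))) {
    warning("not all rows sum up to 1 => normalization forced")
    Y <- Y / rs              # divides row i by its own sum rs[i]
  }
  Y[!ok, ] <- NA             # one NA entry invalidates the whole row
  Y
}

prepare_rows(c(0.3, 0.9))                              # beta case: columns 1 - Y and Y
prepare_rows(rbind(c(1, 1, 2), c(0.2, NA, 0.8)))      # warns; row 2 becomes NA
```

Dividing the matrix by the vector of row sums works because R recycles the vector down the columns, so element `[i, j]` is divided by `rs[i]`.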

#### trafo

The transformation (applied if `trafo = TRUE`, or if a numeric `trafo` threshold is exceeded) is a generalization of the one proposed by Smithson and Verkuilen (2006), which transforms each component $y$ of `Y` by computing $y^* = [y(n-1) + 1/2]/n$, where $n$ is the number of observations in `Y` (this approach is also used in the package betareg; see Cribari-Neto & Zeileis, 2010).
For an arbitrary number of dimensions (or variables) $d$, the transformation is $y^* = [y(n-1) + 1/d]/n$.
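The generalized formula can be checked numerically. The sketch below uses a hypothetical helper, `sv_transform()`, that applies $y^* = [y(n-1) + 1/d]/n$ to every entry of a small made-up matrix; it assumes $n$ is simply `nrow(Y)` and is not the package's own code. Each row of the result sums to $[(n-1) + d \cdot 1/d]/n = 1$, so the data stay compositional while 0 and 1 are pulled strictly inside the unit interval.

```r
# Hypothetical helper: generalized Smithson-Verkuilen transformation
# y* = (y * (n - 1) + 1/d) / n for an n x d composition matrix.
sv_transform <- function(Y) {
  Y <- as.matrix(Y)
  n <- nrow(Y)   # number of observations
  d <- ncol(Y)   # number of components
  (Y * (n - 1) + 1 / d) / n
}

Y <- rbind(c(0.0, 0.5, 0.5),
           c(0.2, 0.3, 0.5),
           c(1.0, 0.0, 0.0))
Ys <- sv_transform(Y)
rowSums(Ys)   # rows still sum to 1 (up to rounding)
range(Ys)     # all values now strictly inside (0, 1)
```

With $d = 2$ the added constant is $1/2$, which recovers the original two-component formula.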

#### base

To set the base (i.e., omitted) component of Y for the “alternative” (mean/precision) model, the argument base can be used. By default it is set to the first variable in Y (if a vector is supplied, the column 1 - Y becomes the base component).
Note that the definition can be overruled in DirichReg.
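The role of an omitted base component can be illustrated with a short sketch. It assumes a multinomial-logit link for the expected proportions, in which the base component's linear predictor is fixed at 0 so that only the other d - 1 components receive their own predictors. The function `mu_from_eta()` and its arguments are hypothetical and not part of DirichletReg.

```r
# Sketch (assumption: multinomial-logit link): expected proportions when the
# linear predictor of the base component is fixed at 0.
mu_from_eta <- function(eta_nonbase, base = 1) {
  d <- length(eta_nonbase) + 1
  eta <- numeric(d)              # all zeros, including the base slot
  eta[-base] <- eta_nonbase      # fill in the non-base components
  exp(eta) / sum(exp(eta))       # proportions summing to 1
}

mu_from_eta(c(log(2), 0), base = 1)   # base is the first component
mu_from_eta(c(log(2), 0), base = 3)   # same predictors, different base
```

Changing `base` only changes which component serves as the reference; the proportions always sum to 1.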

#### x and object

Objects created by DR_data.

#### type

Specifies, for the print method, whether the original or the processed data are displayed.

## Value

The function returns a matrix object of class DirichletRegData with the following attributes:

- `attr(*, "dimnames")`: a list with two entries, row names (by default `NULL`) and column names
- `attr(*, "Y.original")`: the original data
- `attr(*, "dims")`: number of dimensions of `Y` (i.e., number of columns)
- `attr(*, "dim.names")`: the names of the components in `Y`
- `attr(*, "obs")`: number of observations of `Y` (i.e., number of rows)
- `attr(*, "valid_obs")`: number of valid observations
- `attr(*, "normalized")`: a logical value indicating whether the data were normalized
- `attr(*, "transformed")`: a logical value indicating whether the data were transformed
- `attr(*, "base")`: number of the variable used as the base in the reparametrized model
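These attributes can be read with base R's `attr()`. The sketch below assumes the DirichletReg package is installed and reuses the Arctic Lake data from the examples; the values noted in the comments are those implied by the example output (39 valid observations, forced normalization), plus the documented default `base = 1`.

```r
library(DirichletReg)

AL <- DR_data(ArcticLake[, 1:3])   # warns that normalization was forced

attr(AL, "obs")                # number of rows (39)
attr(AL, "valid_obs")          # rows without NA (39)
attr(AL, "dims")               # number of components (3)
attr(AL, "normalized")         # TRUE for this data set
attr(AL, "base")               # 1 unless changed via the `base` argument
head(attr(AL, "Y.original"))   # the data as originally supplied
```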

## Author(s)

Marco J. Maier

## References

Smithson, M. & Verkuilen, J. (2006). A Better Lemon Squeezer? Maximum-Likelihood Regression With Beta-Distributed Dependent Variables. Psychological Methods, 11(1), 54–71.

Cribari-Neto, F. & Zeileis, A. (2010). Beta Regression in R. Journal of Statistical Software, 34(2), 1–24.

## Examples

```r
library(DirichletReg)

# create a DirichletRegData object from the Arctic Lake data
head(ArcticLake[, 1:3])
AL <- DR_data(ArcticLake[, 1:3])
summary(AL)
head(AL)
```

### Example output

```
   sand  silt  clay
1 0.775 0.195 0.030
2 0.719 0.249 0.032
3 0.507 0.361 0.132
4 0.522 0.409 0.066
5 0.700 0.265 0.035
6 0.665 0.322 0.013
Warning in DR_data(ArcticLake[, 1:3]) :
  not all rows sum up to 1 => normalization forced
This object contains compositional data with 3 dimensions.
Number of observations: 39 of which 39 ( 100% ) are valid.

Note: The data were normalized.
[1] 0.7750000 0.7190000 0.5070000 0.5235707 0.7000000 0.6650000
```


DirichletReg documentation built on May 29, 2017, 7:09 p.m.