Prepare Compositional Data
Description
This function prepares a matrix with compositional variables for further processing in the DirichletReg package.
Usage
Arguments
Y 
A 
trafo 
Either a logical or numeric value.
Transformation of variables causes the values to shrink away from extreme values of 0 and 1, see “Details”.

base 
The “base” component to use in the reparametrized model 
norm_tol 
Due to numerical precision, row sums of \mathbf{Y} may not be exactly equal to 1.
Therefore, 
x 
A 
type 
Displays either the (possibly normalized or transformed) 
object 
A 
... 
Further arguments 
Details
Y
Y is a matrix or data.frame containing compositional variables.
If they do not sum up to 1 for all observations, normalization is forced where each row entry is divided by the row's sum (a warning will be issued that normalization was applied).
If one or more entries in a row are NA, the whole row will be returned as NA.
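The normalization and NA handling described above can be sketched in base R. This is only an illustration, not the package's implementation: `normalize_rows` is a hypothetical helper, its `tol` argument is an assumed stand-in for a row-sum tolerance, and the warning text is made up for the example.

```r
# Hypothetical helper (not part of DirichletReg): divides each row by its sum
# when rows do not sum to 1, and returns rows with missing entries as all NA.
normalize_rows <- function(Y, tol = sqrt(.Machine$double.eps)) {
  Y <- as.matrix(Y)
  sums <- rowSums(Y)                        # NA for any row containing an NA
  if (any(abs(sums - 1) > tol, na.rm = TRUE)) {
    warning("rows did not sum to 1; normalization applied")
    Y <- Y / sums                           # element [i, j] is divided by sums[i]
  }
  Y[is.na(sums), ] <- NA                    # whole row becomes NA
  Y
}

Y <- rbind(c(2, 1, 1),        # sums to 4, will be rescaled
           c(0.2, 0.3, 0.5),  # already sums to 1
           c(0.1, NA, 0.4))   # contains a missing value
Y_norm <- suppressWarnings(normalize_rows(Y))
Y_norm
```

The division relies on R recycling `sums` down the columns of `Y`, which lines up each row with its own sum because `sums` has one element per row.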
Beta-distributed variables can be supplied as a single vector, which must have values in the interval [0, 1].
The second variable will be generated (1 - Y) and a matrix consisting of the columns 1 - Y and Y will be returned.
A message will be issued that a beta-distributed variable was assumed and that this assumption needs to be checked.
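As a rough base R illustration of the vector case, the two-column matrix can be built by hand. The column names used here are an assumption for readability and may differ from what the package assigns.

```r
# A single beta-distributed variable with values in [0, 1]
y <- c(0.1, 0.5, 0.9)
stopifnot(all(y >= 0 & y <= 1))

# Build the two-component composition: complement first, then the variable
Y2 <- cbind(1 - y, y)
colnames(Y2) <- c("1 - Y", "Y")   # assumed names, for illustration only
Y2
```

Each row of `Y2` sums to 1 by construction, so the result can be treated as a two-component composition.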
trafo
The transformation (done if trafo = TRUE) is a generalization of that proposed by Smithson and Verkuilen (2006), which transforms each component y of Y by computing $y^{*} = [y(n - 1) + 1/2]/n$, where n is the number of observations in Y (this approach is also used in the package betareg; see Cribari-Neto & Zeileis, 2010).
For an arbitrary number of dimensions (or variables) d, the transformation is $y^{*} = [y(n - 1) + 1/d]/n$.
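A small base R sketch may make the effect of the transformation concrete. It reads the formula above as y* = [y(n - 1) + 1/d]/n, with n the number of rows and d the number of columns; `shrink` is a hypothetical name and this is not the package's own code.

```r
# Hypothetical helper: applies y* = (y * (n - 1) + 1/d) / n to every entry,
# where n = number of observations (rows) and d = number of components (columns).
shrink <- function(Y) {
  Y <- as.matrix(Y)
  n <- nrow(Y)
  d <- ncol(Y)
  (Y * (n - 1) + 1 / d) / n
}

Y <- rbind(c(0,   0.3,  0.7),
           c(1,   0,    0),
           c(0.2, 0.2,  0.6),
           c(0.5, 0.25, 0.25))
Y_star <- shrink(Y)

range(Y_star)     # all values now lie strictly between 0 and 1
rowSums(Y_star)   # rows still sum to 1
```

Rows that sum to 1 keep that property, since the transformed row sum is [(n - 1) + d(1/d)]/n = 1, while exact zeros and ones are pulled into the interior of the interval.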
base
To set the base (i.e., omitted) component of Y for the “alternative” (mean/precision) model, the argument base can be used. By default, this is set to the first variable in Y (if a vector is supplied, the column 1 - Y becomes the base component).
Note that the definition can be overruled in DirichReg.
x and object
Objects created by DR_data.
type
specifies for the print method whether the original or processed data are displayed.
Value
The function returns a matrix object of class DirichletRegData with the following attributes:
attr(*, "dimnames") 
a list with two entries, row names (by default 
attr(*, "Y.original") 
the original data 
attr(*, "dims") 
number of dimensions of 
attr(*, "dim.names") 
the number of components in 
attr(*, "obs") 
number of observations of 
attr(*, "valid_obs") 
number of valid observations 
attr(*, "normalized") 
a logical value indicating whether the data were normalized 
attr(*, "transformed") 
a logical value indicating whether the data were transformed 
attr(*, "base") 
number of the variable used as the base in the reparametrized model 
Author(s)
Marco J. Maier
References
Smithson, M. & Verkuilen, J. (2006). A Better Lemon Squeezer? Maximum-Likelihood Regression With Beta-Distributed Dependent Variables. Psychological Methods, 11(1), 54–71.
Cribari-Neto, F. & Zeileis, A. (2010). Beta Regression in R. Journal of Statistical Software, 34(2), 1–24.
Examples
library(DirichletReg)
# create a DirichletRegData object from the Arctic Lake data
head(ArcticLake[, 1:3])
AL <- DR_data(ArcticLake[, 1:3])
summary(AL)
head(AL)
