Prepare Compositional Data

Description

This function prepares a matrix with compositional variables for further processing in the DirichletReg package.

Usage

DR_data(Y, trafo = sqrt(.Machine$double.eps), base = 1,
    norm_tol = sqrt(.Machine$double.eps))

## S3 method for class 'DirichletRegData'
print(x, type = c("processed", "original"), ...)

## S3 method for class 'DirichletRegData'
summary(object, ...)

Arguments

Y

A matrix or data.frame with nonnegative values of all compositional variables (in some cases, a vector is also permissible, see “Details”).

trafo

Either a logical or numeric value. Transformation of variables causes the values to shrink away from extreme values of 0 and 1, see “Details”.
If logical, it will force (TRUE) or suppress (FALSE) transformation. Suppressing transformation in the presence of extreme values (0 and 1) will result in an error.
If trafo is numeric it is used as a “threshold”, so transformation will be applied if values in Y are y < trafo or y > (1 - trafo).

base

The “base” component to use in the reparametrized model.

norm_tol

Due to numerical precision, row sums of Y may not be exactly equal to 1. Therefore, norm_tol is a small non-negative value (default: sqrt(.Machine$double.eps)) which represents the tolerance when testing for “near equality” to 1 (see all.equal).

x

A DirichletRegData object

type

Displays either the (possibly normalized or transformed) "processed" or "original" data

object

A DirichletRegData object

...

Further arguments

Details

Y

Y is a matrix or data.frame containing compositional variables. If they do not sum up to 1 for all observations, normalization is forced where each row entry is divided by the row's sum (a warning will be issued that normalization was applied).
If one or more entries in a row are NA, the whole row will be returned as NA. Beta-distributed variables can be supplied as a single vector, which must have values in the interval [0, 1]. The second variable will be generated as 1 - Y, and a matrix consisting of the columns 1 - Y and Y will be returned. A message will be issued that a beta-distributed variable was assumed and that this assumption needs to be checked.
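
The normalization and vector handling described above can be sketched in base R. This is an illustration only, not the package's implementation: the helper names normalize_rows and beta_to_matrix are hypothetical, the warning and message texts are made up, and the tolerance check via all.equal simply follows the description of norm_tol.

```r
# Hypothetical helper (not part of DirichletReg): divide each row by its sum
# when the row sums are not nearly equal to 1, and set rows containing any NA
# to NA as a whole.
normalize_rows <- function(Y, norm_tol = sqrt(.Machine$double.eps)) {
  Y  <- as.matrix(Y)
  rs <- rowSums(Y)                 # NA for every row that contains an NA
  ok <- !is.na(rs)
  sums_to_one <- isTRUE(all.equal(rs[ok], rep(1, sum(ok)),
                                  tolerance = norm_tol,
                                  check.attributes = FALSE))
  if (!sums_to_one) {
    warning("rows did not sum to 1; normalization applied")
    Y <- Y / rs                    # rs is recycled down the columns: row-wise division
  }
  Y[!ok, ] <- NA                   # whole row becomes NA
  Y
}

# Hypothetical helper: turn a single vector in [0, 1] into a two-column
# matrix with columns 1 - y and y.
beta_to_matrix <- function(y) {
  stopifnot(is.numeric(y), all(y >= 0 & y <= 1, na.rm = TRUE))
  message("a beta-distributed variable is assumed; check this assumption")
  cbind("1 - y" = 1 - y, y = y)
}

Y_raw <- matrix(c( 2, 1, 1,
                   1, 1, 2,
                  NA, 1, 1), ncol = 3, byrow = TRUE)
Y_norm <- suppressWarnings(normalize_rows(Y_raw))  # warning silenced for the demo
B <- suppressMessages(beta_to_matrix(c(0.2, 0.7)))
```

In this sketch the first row of Y_norm becomes 0.5, 0.25, 0.25 and the third row is entirely NA, while both rows of B sum to 1.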

trafo

The transformation (done if trafo = TRUE) is a generalization of that proposed by Smithson and Verkuilen (2006) that transforms each component y of Y by computing y*=[y(n-1)+1/2]/n where n is the number of observations in Y (this approach is also used in the package betareg, see Cribari-Neto & Zeileis, 2010).
For an arbitrary number of dimensions (or variables) d the transformation is y*=[y(n-1)+1/d]/n.
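
The transformation formula and the numeric threshold rule for trafo can be illustrated with a small base R sketch. It is an assumption-laden illustration, not the package's code: shrink_composition and needs_trafo are hypothetical names, and n is taken to be the number of rows of Y as stated above.

```r
# Hypothetical helper: y* = (y * (n - 1) + 1/d) / n for every entry of Y
shrink_composition <- function(Y) {
  Y <- as.matrix(Y)
  n <- nrow(Y)   # number of observations
  d <- ncol(Y)   # number of components (dimensions)
  (Y * (n - 1) + 1 / d) / n
}

# Hypothetical helper: numeric threshold rule as described for `trafo`
# (transform if any value is below trafo or above 1 - trafo)
needs_trafo <- function(Y, trafo = sqrt(.Machine$double.eps)) {
  any(Y < trafo | Y > 1 - trafo, na.rm = TRUE)
}

Y <- matrix(c(0.0, 0.3, 0.7,
              0.2, 0.5, 0.3,
              1.0, 0.0, 0.0,
              0.1, 0.1, 0.8), ncol = 3, byrow = TRUE)

Y_star <- if (needs_trafo(Y)) shrink_composition(Y) else Y
range(Y_star)     # all values now lie strictly between 0 and 1
rowSums(Y_star)   # each row still sums to 1
```

Rows that sum to 1 keep that property, because each transformed row sums to [(n - 1) + d * (1/d)]/n = 1. With d = 2 the helper reduces to the two-component formula y*=[y(n-1)+1/2]/n.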

base

To set the base (i.e., omitted) component of Y for the “alternative” (mean/precision) model, the argument base can be used. This is by default set to the first variable in Y (if a vector is supplied, the column 1 - Y becomes the base component).
Note that the definition can be overruled in DirichReg.

x and object

Objects created by DR_data.

type

Specifies for the print method whether the original or the processed data are displayed.

Value

The function returns a matrix object of class DirichletRegData with the following attributes:

attr(*, "dimnames")

a list with two entries, row names (by default NULL) and column names.

attr(*, "Y.original")

the original data

attr(*, "dims")

number of dimensions of Y (i.e., number of columns)

attr(*, "dim.names")

the names of the components in Y (i.e., the column names)

attr(*, "obs")

number of observations of Y (i.e., number of rows)

attr(*, "valid_obs")

number of valid observations

attr(*, "normalized")

a logical value indicating whether the data were normalized

attr(*, "transformed")

a logical value indicating whether the data were transformed

attr(*, "base")

number of the variable used as the base in the reparametrized model

Author(s)

Marco J. Maier

References

Smithson, M. & Verkuilen, J. (2006). A Better Lemon Squeezer? Maximum-Likelihood Regression With Beta-Distributed Dependent Variables. Psychological Methods, 11(1), 54–71.

Cribari-Neto, F. & Zeileis, A. (2010). Beta Regression in R. Journal of Statistical Software, 34(2), 1–24.

Examples

# create a DirichletRegData object from the Arctic Lake data
head(ArcticLake[, 1:3])
AL <- DR_data(ArcticLake[, 1:3])
summary(AL)
head(AL)