initRandrot: Initialisation of a random rotation Object
In randRotation: Random Rotation Methods for High Dimensional Data with Batch Structure

Description Usage Arguments Details Value Author(s) References See Also Examples

Initialization of a linear model for subsequent generation of randomly rotated data (randrot) associated with the null hypothesis H0: β_coef.h = 0. Basics of rotation tests are found in \insertCiteHettegger2021randRotation and \insertCiteLangsrud2005randRotation.

initRandrot(Y = NULL, X = NULL, coef.h = NULL, weights = NULL, cormat = NULL)

initBatchRandrot(
  Y = NULL,
  X = NULL,
  coef.h = NULL,
  batch = NULL,
  weights = NULL,
  cormat = NULL
)

## S4 method for signature 'list'
initBatchRandrot(
  Y = NULL,
  X = Y$design,
  coef.h = NULL,
  batch = NULL,
  weights = Y$weights,
  cormat = NULL
)

## S4 method for signature 'list'
initRandrot(
  Y = NULL,
  X = Y$design,
  coef.h = NULL,
  weights = Y$weights,
  cormat = NULL
)

`Y`	a data matrix with `features x samples` dimensions or a list with elements `E`, `design` and `weights` (see `Details`). Missing values (`NA`) are allowed but e.g. lead to NAs for all samples of the respective features in the rotated dataset and should thus be avoided. We highly recommend avoiding missing values by e.g. replacing them by imputation or removing features containing NAs.
`X`	the design matrix of the experiment with `samples x coefficients` dimensions. For `initBatchRandrot`, specify the design matrix without the batch variable. A warning is generated if `X[,coef.d]` does not have full rank, see Details.
`coef.h`	single integer or vector of integers specifying the "hypothesis coefficients" (`H0` coefficients). `coef.h` should correspond to the last columns in `X` (see `Details`). If available, `attr(X, "coef.h")` is used, see `contrastModel`. By default, all coefficients are set as `H0` coefficients. If `coef.h` is set `-1`, no coefficient is set as `H0` coefficient.
`weights`	numerical matrix of finite positive weights > 0 (as in weighted least squares regression. Dimensions must be equal to dimensions of `Y`.
`cormat`	the sample correlation matrix with `samples x samples` dimensions. Must be a real symmetric positive-definite square matrix. See `Details` for usage in `initBatchRandrot`.
`batch`	Batch covariate of the same length as `ncol(Y)`.

This function performs basic initial checks and preparatory calculations for random rotation data generation. Nomenclature of variables is mainly as in \insertCiteLangsrud2005randRotation and \insertCiteHettegger2021randRotation. See also package vignette for application examples.

Y can also be a list with elements E, design and weights. Y$E is thereby used as Y, Y$design is used as X and Y$weights is used as weights. By this, the functions are compatible with results from e.g. voom (limma package), see Examples.

coef.h specifies the model coefficients associated with the null hypothesis ("hypothesis coefficients"). All other model coefficients are considered as "determined coefficients" coef.d \insertCiteLangsrud2005randRotation. The design matrix is rearranged so that coef.h correspond to the last columns of the design matrix and coef.d correspond to the first columns of the design matrix. This is necessary for adequate transformation of the combined null-hypothesis H0: β_coef.h = 0 by QR decomposition. If X[,coef.d] does not have full rank, a warning is generated and coef.d is set to coef.d <- seq_len(qr(X[,coef.d])$rank).

Weights must be finite positive numerics greater zero. This is necessary for model (QR) decomposition and for back transformation of the rotated data into the original variance structure, see also randrot. Weights as estimated e.g. by voom \insertCiteLaw2014randRotation are suitable and can be used without further processing. Note that due to the whitening transformation (i.e. by using the arguments weights and/or cormat) the rank of the transformed (whitened) design matrix X could change (become smaller), which could become dangerous for the fitting procedures. If you get errors using weights and/or cormat, try the routine without using weights and/or cormat to exclude this source of errors.

The following section provides a brief summary how rotations are calculated. A more general introduction is given in \insertCiteLangsrud2005randRotation. For reasons of readability, we omit writing %*% for matrix multiplication and write * for transposed matrix. The rotation is done by multiplying the features x samples data matrix Y with the transpose of the restricted random rotation matrix Rt

Rt = Xd Xd* + [Xh Xe] R [Xh Xe]*

with R being a (reduced) random rotation matrix and Xd, Xh and Xe being columns of the full QR decomposition of the design matrix X. [Xd Xh Xe] = qr.Q(qr(X), complete = TRUE), where Xd correspond to columns coef.d, Xh to columns coef.h and Xe to the remaining columns.

If weights and/or cormat are specified, each feature Y[i,] and the design matrix X are whitening transformed before rotation. The whitening matrix T is defined as T = solve(C) w, where solve(C) is the inverse Cholesky decompostion of the correlation matrix (cormat = CC*) and w is a diagonal matrix of the square roots of the sample weights for the according feature (w = diag(sqrt(weights[i,]))).

The rotated data for one feature y.r[i,] is thus calculated as

y.r[i,] = ( solve(T) Rt T (y[i,])* )* and [Xd Xh Xe] = qr.Q(qr(TX), complete = TRUE)

For weights = NULL and cormat = NULL, T is the identity matrix.

Note that a separate QR decomposition is calculated for each feature if weights are specified. The restricted random orthogonal matrix Rt is calculated with the same reduced random orthogonal matrix R for all features.

When using initBatchRandrot, initRandrot is called for each batch separately. When using initBatchRandrot with cormat, cormat needs to be a list of correlation matrices with one matrix for each batch. Note that this implicitly assumes a block design of the sample correlation matrix, where sample correlation coefficients between batches are zero. For a more general sample correlation matrix, allowing non-zero sample correlation coefficients between batches, see package vignette. Batches are split according to split(seq_along(batch), batch).

An initialised initRandrot, initRandrotW or initBatchRandrot object.

Peter Hettegger

\insertAllCited

randrot, rotateStat

# For further examples see '?rotateStat' and package vignette.

# Example 1: Compatibility with limma::voom

## Not run: 
v <- voom(counts, design)
ir <- initRandrot(v)
## End(Not run)

# Example 2:

#set.seed(0)

# Dataframe of phenotype data (sample information)
# We simulate 2 sample classes processed in 3 batches
pdata <- data.frame(batch = rep(1:3, c(10,10,10)),
                   phenotype = rep(c("Control", "Cancer"), c(5,5)))
features <- 100

# Matrix with random gene expression data
edata <- matrix(rnorm(features * nrow(pdata)), features)
rownames(edata) <- paste("feature", 1:nrow(edata))

mod1 <- model.matrix(~phenotype, pdata)

# Initialisation of the random rotation class
init1 <- initBatchRandrot(Y = edata, X = mod1, coef.h = 2, batch = pdata$batch)
init1
# See '?rotateStat'