pFdr: Calculate resampling based p-values and FDRs
In randRotation: Random Rotation Methods for High Dimensional Data with Batch Structure

Description Usage Arguments Details Value Author(s) References See Also Examples

This function calculates either (1) resampling based p-values with subsequent p-value adjustment using stats::p.adjust or (2) resampling based false-discovery-rates (FDRs) for rotated statistics from a rotateStat object.

1	pFdr(obj, method = "none", pooled = TRUE, na.rm = FALSE, beta = 0.05)

`obj`	A `rotateStat` object as returned by `rotateStat`.
`method`	Can be either `"none"` (default), `"fdr.q"`, `"fdr.qu"` or any term that can be passed as `method` argument to `stats::p.adjust`, see `Details`. If `method = "none"`, resampling based p-values without further adjustment are calculated.
`pooled`	`logical`. `TRUE` (default) if marginal distributions are exchangeable for all features so that rotated stats can be pooled, see `Details`.
`na.rm`	`logical`. `NA` values are ignored if set `TRUE`. `NA` values should be avoided and could e.g. be removed by imputation in original data or by removing features that contain `NA` values. Few `NA` values do not have a large effect, but many `NA` values can lead to wrong estimations of p-values and FDRs. We highly recommend avoiding `NA` values.
`beta`	`numeric` between `0` and `1`. Corresponds to beta in \insertCiteYekutieli1999randRotation.

Larger values of obj$s0 are considered more significant when compared to the empirical distribution. E.g. for calculation of resampling based p-values (with pooled = FALSE) we in principle use p.val <- (rowSums(obj$stats >= obj$s0)+1)/(ncol(obj$stats)+1) according to \insertCitePhipson2010randRotation.

method = "fdr.q" and method = "fdr.qu" are resampling based fdr estimates and can only be used with pooled = TRUE. method = "fdr.q" is the FDR local estimator and method = "fdr.qu" is the FDR upper limit, see \insertCiteReiner2003,Yekutieli1999randRotation. For all other method arguments resampling based p-values are calculated and passed to stats::p.adjust for p-value adjustment. So these methods provide resampling based p-values with (non-resampling based) p-value adjustment. method = "fdr.q" and method = "fdr.qu" were adapted from package fdrame \insertCiteFdrame2019,Reiner2003randRotation.

When pooled = TRUE, marginal distributions of the test statistics are considered exchangeable for all features. The resampling based p-values of each feature are then calculated from all rotated statistics (all features, all rotations). For these cases, if the number of features is reasonably large, usually only few resamples (argument R in rotateStat) are required. We want to emphasize that in order for the marginal distributions to be exchangeable, the statistics must be a pivotal quantity (i.e. it must be scale independent). Pivotal quantities are e.g. t values. Using e.g. linear models with coef as statistics is questionable if the different features are measured on different scales. The resampled coefficients then have different variances and pooled = TRUE is not applicable. We thus highly recommend using pivotal quantities as statistics in rotateStat if possible.

When pooled = FALSE the resampling based p-values are calculcated for each feature separately. This is required if one expects the resampling based statistics to be distributed differently for individual features. For most common applications this should not be the case and the marginal distribution are exchangeable for all features, hence pooled = TRUE by default.

If method = "fdr.q" or method = "fdr.qu" and weights were specified when initialising the random rotation object (see parameter initialised.obj in rotateStat), a warning is displayed. The correlation structure (dependence structure) of linear model coefficients between different features is not preserved if different weights are used for different features. Methods fdr.q and fdr.qu rely on preserved correlation structure of dependent statistics and thus should not be used if statistics based on model coefficients (e.g. t statistics of model coefficients) are used in combination with different weights.

P-values and FDRs are calculated for each column of obj$s0 separately.

A numeric matrix of corrected p-values or FDRs with dimension dim(obj$s0).

Peter Hettegger

\insertAllCited

rotateStat

# See also '?rotateStat':

#set.seed(0)

# Dataframe of phenotype data (sample information)
# We simulate 2 sample classes processed in 3 batches
pdata <- data.frame(batch = rep(1:3, c(10,10,10)),
                   phenotype = rep(c("Control", "Cancer"), c(5,5)))
features <- 100

# Matrix with random gene expression data
edata <- matrix(rnorm(features * nrow(pdata)), features)
rownames(edata) <- paste("feature", 1:nrow(edata))

mod1 <- model.matrix(~phenotype, pdata)

# Initialisation of the random rotation class
init1 <- initBatchRandrot(Y = edata, X = mod1, coef.h = 2, batch = pdata$batch)
init1

# Definition of the batch effect correction procedure with subsequent calculation
# of two-sided test statistics
statistic <- function(., batch, mod, coef){

  # The "capture.output" and "suppressMessages" simply suppress any output
  capture.output(suppressMessages(
    Y.tmp <- sva::ComBat(., batch = batch, mod)
  ))

  fit1 <- lm.fit(mod, t(Y.tmp))
  abs(coef(fit1)[coef,])
}

# We calculate test statistics for the second coefficient

res1 <- rotateStat(initialised.obj = init1,
                    R = 10,
                    statistic = statistic,
                    batch = pdata$batch, mod = mod1, coef = 2)

hist(pFdr(res1))