# TOC: Theoretical FDR and sensitivity as a function of cutoff level

In package OCplus: Operating characteristics plus sample size and local fdr for microarray experiments

## Description

Computes and plots the operating characteristics for a two-group microarray experiment based on a theoretical model. The false discovery rate (FDR) is plotted against the cutoff level on the t-statistic. Optionally, curves for the classical significance level and sensitivity can be added. Curves for different proportions of non-differentially expressed genes can be compared in the same plot, and the sample size per group can be varied between plots.

## Usage

```r
TOC(n = 10, p0 = 0.95, sigma = 1, D, F0, F1, n1 = n, n2 = n,
    paired = FALSE, plot = TRUE, local.show = FALSE, alpha.show = TRUE,
    sensitivity.show = TRUE, nplot = 100, xlim, ylim = c(0, 1), main,
    legend.show = FALSE, ...)
```

## Arguments

• `n, n1, n2`: number of samples per group; by default equal and specified via `n`, but can be set to different values via `n1` and `n2`.

• `p0`: the proportion of genes that are not differentially expressed; may be vector-valued.

• `sigma`: the standard deviation of the log expression values.

• `D`: assumed average log fold change (in units of `sigma`), by default 1; this is a shortcut for specifying a simple symmetric alternative hypothesis through `F1`.

• `F0`: the distribution of the log2 expression values under the null hypothesis; by default, this is normal with mean zero and standard deviation `sigma`, but mixtures of normals can be specified, see Details and Examples.

• `F1`: the distribution of the log2 expression values under the alternative hypothesis; by default, this is an equal mixture of two normals with means `D` and -`D` and standard deviation `sigma`; mixtures of normals are again possible, see Details and Examples.

• `paired`: logical value indicating whether two distinct groups of observations or one group of paired observations are studied.

• `plot`: logical value indicating whether the results should be plotted.

• `local.show`: logical value indicating whether to show the local or the global false discovery rate (default: global).

• `alpha.show`: logical value indicating whether to show the classical significance level for testing one hypothesis as a function of the cutoff level.

• `sensitivity.show`: logical value indicating whether to show the classical sensitivity for testing one hypothesis as a function of the cutoff level.

• `nplot`: number of points at which the curves are evaluated.

• `xlim`: the usual limits on the horizontal axis.

• `ylim`: the usual limits on the vertical axis.

• `main`: the main title of the plot.

• `legend.show`: logical value indicating whether to show a legend for the different types of curves in the plot.

• `...`: the usual graphical parameters, passed to `plot`.

## Details

This function plots the FDR as a function of the cutoff level when comparing the expression of multiple genes between two groups of subjects. The gene selection rule studied here declares a gene differentially expressed if the absolute value of its t-statistic exceeds a specified cutoff. The comparison is based on the equal-variance two-sample t-statistic for unpaired observations, or on the corresponding t-statistic for paired observations.
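The selection rule can be illustrated with a short base-R sketch. It is not OCplus code: the simulated matrix `x`, the helper `row_t()` and the cutoff of 2 are hypothetical choices made for illustration. The sketch computes the equal-variance two-sample t-statistic for every gene (row) and flags the genes whose absolute t-statistic exceeds the cutoff; the value for any single gene can be checked against `t.test(..., var.equal = TRUE)`.

```r
# Sketch, not OCplus code: simulate a hypothetical matrix of 200 genes (rows)
# with n1 + n2 arrays (columns) and apply the cutoff rule to row-wise t-statistics.
set.seed(1)
n1 <- 10
n2 <- 10
x <- matrix(rnorm(200 * (n1 + n2)), nrow = 200)
group <- rep(c(1, 2), times = c(n1, n2))

# Equal-variance two-sample t-statistic for every row of x
row_t <- function(x, group) {
  x1 <- x[, group == 1, drop = FALSE]
  x2 <- x[, group == 2, drop = FALSE]
  m1 <- ncol(x1)
  m2 <- ncol(x2)
  # pooled within-group variance per gene
  s2 <- ((m1 - 1) * apply(x1, 1, var) + (m2 - 1) * apply(x2, 1, var)) /
    (m1 + m2 - 2)
  (rowMeans(x1) - rowMeans(x2)) / sqrt(s2 * (1 / m1 + 1 / m2))
}

tstat <- row_t(x, group)
cutoff <- 2
selected <- which(abs(tstat) > cutoff)  # genes declared differentially expressed
length(selected)
```

Because these simulated genes are all unregulated, every selected gene here is a false discovery; the FDR curves of `TOC` describe how this share behaves when a proportion 1-`p0` of genes is truly regulated.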

The underlying model assumes that a proportion `p0` of genes are not differentially expressed between groups, and that the remaining proportion 1-`p0` are. The log-transformed gene expression values are assumed to follow mixtures of normal distributions. Both the null and the alternative hypothesis are specified through the means of the respective mixture components; these means can be interpreted as average log2 fold changes in units of the standard deviation `sigma`.

Note that the model does not assume that all genes have the same standard deviation `sigma`, only that the mean log2 fold change of each regulated gene is proportional to its individual variability (standard deviation). `sigma` generally does not need to be specified explicitly and can be left at its default value of one, so that `D` can be interpreted directly as the log2 fold change between groups.
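One way to see why `sigma` can usually stay at one: the t-statistic is unchanged when all values of a gene are multiplied by a constant, so only the effect measured in standard-deviation units enters the model. The sketch below (simulated data, hypothetical object names) checks this with base R's `t.test()`.

```r
# Sketch with simulated data: the t-statistic is scale invariant, so a gene
# with sigma = 4 and mean difference 4 * D gives the same t as sigma = 1 and D.
set.seed(2)
D <- 1                                   # effect in units of sigma
g1 <- rnorm(10, mean = 0)                # group 1, sigma = 1
g2 <- rnorm(10, mean = D)                # group 2, sigma = 1
t_of <- function(a, b) unname(t.test(a, b, var.equal = TRUE)$statistic)
t_unit  <- t_of(g1, g2)
t_noisy <- t_of(4 * g1, 4 * g2)          # same data rescaled to sigma = 4
all.equal(t_unit, t_noisy)               # TRUE
```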

The default null distribution of the log2 expression values is a single normal distribution with mean zero (and standard deviation `sigma`); the default alternative distribution is an equal mixture of two normals with means `D` and -`D` (and again standard deviation `sigma`). However, general mixtures of normals can be specified for both the null and the alternative distribution through `F0` and `F1`, respectively; both are lists with two elements:

• `D` is the vector of means (i.e. log2 fold changes),

• `p` is the vector of mixing proportions for the means.

If present, `p` must have the same length as `D`; its elements need not be normalized to sum to one. If `p` is absent, equal mixing is assumed; see Examples. A wide (mixture) null hypothesis, or an empirical null hypothesis as outlined by Efron (2004), can be used if genes with log fold changes close to zero are considered to be of no biological interest and should be counted as effectively unregulated. Similarly, the alternative hypothesis can be any mixture of large and small effects, symmetric or asymmetric, depending on the expected regulation patterns; see Examples.
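The handling of `p` described above can be mimicked with a small helper. `normalize_mixture()` is hypothetical and not part of OCplus; it assumes the list layout given above (element `D` and optional element `p`), fills in equal weights when `p` is missing and rescales the weights so that they sum to one.

```r
# Hypothetical helper (not part of OCplus): complete and normalise a mixture
# specification list(D = means, p = optional mixing weights).
normalize_mixture <- function(spec) {
  if (is.null(spec[["p"]])) {
    spec[["p"]] <- rep(1, length(spec[["D"]]))   # absent p: equal mixing
  }
  stopifnot(length(spec[["p"]]) == length(spec[["D"]]))
  spec[["p"]] <- spec[["p"]] / sum(spec[["p"]])  # weights now sum to one
  spec
}

normalize_mixture(list(D = c(-0.25, 0, 0.25)))$p            # 1/3 each
normalize_mixture(list(D = c(-2, -1, 1), p = c(1, 2, 2)))$p  # 0.2 0.4 0.4
```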

As a consequence, both the null distribution of the t-statistics (for the unregulated genes) and their alternative distribution (for the regulated genes) are mixtures of (generally non-central) t-distributions, see `FDR`.
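For the unpaired design, the probability that |T| exceeds a cutoff under such a mixture can be sketched with base R's non-central `pt()`. This is an illustration under stated assumptions, not the computation used by `FDR` or `TOC`: it assumes an equal-variance t-test with `n1` and `n2` samples, so that a component with mean `D` (in units of `sigma`) yields a t-distribution with `n1 + n2 - 2` degrees of freedom and non-centrality `D / sqrt(1/n1 + 1/n2)`. Applied to the default `F0` it gives the classical significance level; applied to the default `F1` it gives the sensitivity. The helper name `tail_prob()` is hypothetical.

```r
# Hypothetical helper: P(|T| > cutoff) when T follows a mixture of t-distributions,
# one per component mean D (in units of sigma) with mixing weights p.
# Assumes the unpaired, equal-variance two-sample t-test.
tail_prob <- function(cutoff, D, p = rep(1, length(D)), n1 = 10, n2 = 10) {
  p   <- p / sum(p)                     # normalise the mixing weights
  dof <- n1 + n2 - 2                    # degrees of freedom
  ncp <- D / sqrt(1 / n1 + 1 / n2)      # non-centrality of each component
  vapply(cutoff, function(c0) {
    # P(T < -c0) + P(T > c0) for every component, then mix with weights p
    comp <- pt(-c0, dof, ncp) + pt(c0, dof, ncp, lower.tail = FALSE)
    sum(p * comp)
  }, numeric(1))
}

cutoffs <- c(1, 2, 3)
alpha <- tail_prob(cutoffs, D = 0)         # default F0: significance level
sens  <- tail_prob(cutoffs, D = c(-1, 1))  # default F1 with D = 1: sensitivity
rbind(cutoff = cutoffs, alpha = alpha, sensitivity = sens)
```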

Sample size `n` and standard deviation `sigma` must be single values, but multiple values of `p0` can be specified, resulting in multiple curves. Additionally, the usual significance level and sensitivity for a classical single-hypothesis test can be displayed.

## Value

The function invisibly returns a data frame with `nplot` rows whose columns contain the x- and y-coordinates of the individual curves. The number and names of the columns depend on the number and values of `p0` specified, and on whether alpha and sensitivity are displayed. Additionally, the returned data frame has an attribute `param`, a list of all the non-plotting arguments to the function.
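The sketch below shows how a return value of this shape could be assembled for the default `F0` and `F1` in the unpaired case, using the usual mixture-model expression for the global FDR at cutoff c, `FDR(c) = p0 * alpha(c) / (p0 * alpha(c) + (1 - p0) * sensitivity(c))`. The function `toc_sketch()`, its `cutoff_max` argument and the column names (`cutoff`, `FDR_<p0>`, `alpha`, `sensitivity`) are hypothetical; the columns actually returned by `TOC` may be named and ordered differently.

```r
# Hypothetical sketch, not the OCplus source: default F0 (mean 0) and F1
# (equal mixture of means -D and D), unpaired design with n samples per group.
toc_sketch <- function(n = 10, p0 = 0.95, D = 1, nplot = 100, cutoff_max = 5) {
  dof <- 2 * n - 2                     # degrees of freedom
  ncp <- D / sqrt(2 / n)               # non-centrality for a component with mean D
  cutoff <- seq(0, cutoff_max, length.out = nplot)
  # classical significance level: P(|T| > c) under the central t
  alpha <- 2 * pt(-cutoff, dof)
  # P(|T| > c) for a single non-central component
  abs_tail <- function(delta) {
    pt(-cutoff, dof, delta) + pt(cutoff, dof, delta, lower.tail = FALSE)
  }
  # sensitivity: equal mixture of the components with means -D and D
  sens <- 0.5 * abs_tail(-ncp) + 0.5 * abs_tail(ncp)
  res <- data.frame(cutoff = cutoff)
  for (p in p0) {
    # one global FDR curve per value of p0
    res[[paste0("FDR_", p)]] <- p * alpha / (p * alpha + (1 - p) * sens)
  }
  res$alpha <- alpha
  res$sensitivity <- sens
  attr(res, "param") <- list(n = n, p0 = p0, D = D)  # non-plotting arguments
  invisible(res)
}

ret <- toc_sketch(p0 = c(0.90, 0.95, 0.99))  # nothing is printed
dim(ret)               # 100 6
attr(ret, "param")$p0
```

Calling `toc_sketch()` at top level prints nothing because of `invisible()`; the data frame only appears once it is assigned and inspected, which mirrors the behaviour described above.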

## Note

Both the curve labels and the legend may be squashed if the plotting device is too small. Increasing the size of the device and re-plotting should improve readability.

## Author(s)

Y. Pawitan and A. Ploner

## References

Pawitan Y, Michiels S, Koscielny S, Gusnanto A, Ploner A. (2005) False Discovery Rate, Sensitivity and Sample Size for Microarray Studies. Bioinformatics, 21, 3017-3024.

Efron B. (2004) Large-Scale Simultaneous Hypothesis Testing: The Choice of a Null Hypothesis. Journal of the American Statistical Association, 99, 96-104.

## See Also

`FDR`, `samplesize`, `EOC`
## Examples

```r
library(OCplus)

# Default null and alternative distributions, assuming different proportions
# of regulated genes
TOC(p0 = c(0.90, 0.95, 0.99), legend.show = TRUE)

# The effect of sample size and effect size
par(mfrow = c(2, 2))
TOC(p0 = c(0.90, 0.95, 0.99), n = 5, D = 1)
TOC(p0 = c(0.90, 0.95, 0.99), n = 30, D = 1)
TOC(p0 = c(0.90, 0.95, 0.99), n = 5, D = 2)
TOC(p0 = c(0.90, 0.95, 0.99), n = 30, D = 2)
par(mfrow = c(1, 1))  # back to a single panel

# A wide null distribution that allows genes with small effects to be disregarded;
# an unspecified p means equal mixing proportions
ret <- TOC(F0 = list(D = c(-0.25, 0, 0.25)), main = "Wide F0")
attr(ret, "param")$F0  # the null hypothesis

# An extended (and asymmetric) alternative
ret <- TOC(F1 = list(D = c(-2, -1, 1), p = c(1, 2, 2)), p0 = 0.95,
           main = "Asymmetric F1")
attr(ret, "param")$F1  # F1$p is normalized

# Unequal sample sizes
TOC(n1 = 10, n2 = 30)

# Curves for a paired t-test
TOC(paired = TRUE)

# The output contains all the x- and y-coordinates
ret <- TOC(p0 = c(0.90, 0.95, 0.99), main = "Default settings")
dim(ret)
colnames(ret)
ret[1:10, ]
# Additionally, the list of arguments that determine the experiment
attr(ret, "param")
```