Description Usage Arguments Details Value Examples

Provides functions to fit a beta-mixture model to a set of p-values that has peaks at both zero and one, and to estimate false discovery rates.

1 2 3 | ```
fitMix3(datavec, forever=100, epsilon=0.001, print.level=0)
computeFDR(object, alpha)
computeCutoff(object, fdr)
``` |

`datavec` |
A numeric vector containing p-values. |

`forever` |
An integer; maximum number of iterations while fitting the mixture model. |

`epsilon` |
A real number; change in the log likelihood that should be used to terminate the model-fitting loop. |

`print.level` |
An integer; how much detail should |

`object` |
An object of the |

`alpha` |
A real number between 0 and 1; the cutoff on the nominal p-value where the FDR should be computed. |

`fdr` |
A real number beteen 0 and 1; the targeted FDR value. |

We have observed empirically that the set of p-values obtained when computing the Newman paired test statistic often has peaks both at zero (representing genes of interest) and at one (representing "boring" genes that change much less than expected). We attribute the latter phenomenon to the fact that we use locally smoothed instead of gene-by-gene estimates of the standard deviation; genes whose SD is increased by the smoothing process contribute to the boring peak near one.

To estimate p-values in this context, we fit a three-component beta mixture model, combining (1) a right-peaked distribution Beta(L,1), (2) a left-peaked dfistribution Beta(1,M), and (3) a uniform distribution. Specfically, we look for models of the form

*alpha*Beta(L,1) + beta*Beta(1, M) + gamma*Beta(1,1)*

.

Model-fitting uses an expectation-maximization (EM) algorithm. In
addition to the parameters `mle=c(L,M)`

and ```
psi=c(alpha,
beta, gamma)
```

, we introduce a matrix `Z`

of latent variables that
indicate which distribution each point is likely to arise
form. `Z`

has three columns (one for each mixture component) and
one row for each p-value; the entries in each row are nonegative and
sum to one. The M-step of the algorithm uses the `nlm`

optimization function to compute the maximum-likelihood `mle`

values given `psi`

and `Z`

. The E-step first updates
`psi`

from the `Z`

-matrix, and then updates the values of
`Z`

based on the current `mle`

.

We are able to use the mixture distribution to compute the relationship between a cutoff on the nominal p-values and the false discovery rate (FDR).

The model-fitting function, `fitMix3`

, returns an object of the
`MixOf3Beta`

class.

The `computeFDR`

function returns a real number in [0,1], the
false discovery rate assiociated with the nominal cutoff.

The `computeCutoff`

function returns a real number in [0,1], the
cutoff required to achieve the desired FDR.

1 2 3 4 5 6 7 8 9 10 | ```
set.seed(98765)
ds <- c(rbeta(3000, 20, 1),
rbeta(1000, 1, 40),
runif(6000))
fit <- fitMix3(ds)
computeFDR(fit, 0.01)
computeCutoff(fit, 0.01)
computeFDR(fit, 0.0016438)
computeCutoff(fit, 0.05)
computeFDR(fit, 0.00702114)
``` |

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.