04-fitMix3: Compute FDR from Three-Component Beta Mixture

fitMix3R Documentation

Compute FDR from Three-Component Beta Mixture

Description

Provides functions to fit a beta-mixture model to a set of p-values that has peaks at both zero and one, and to estimate false discovery rates.

Usage

fitMix3(datavec, forever=100, epsilon=0.001, relative = 0.001, print.level=0)
computeFDR(object, alpha)
computeCutoff(object, fdr)

Arguments

datavec

A numeric vector containing p-values.

forever

An integer; maximum number of iterations while fitting the mixture model.

epsilon

A real number; change in the log likelihood that should be used to terminate the model-fitting loop.

relative

A real number; change in the relative log likelihood that should be used to terminate the model-fitting loop.

print.level

An integer; how much detail should nlm print while fitting the model.

object

An object of the MixOf3Beta class.

alpha

A real number between 0 and 1; the cutoff on the nominal p-value where the FDR should be computed.

fdr

A real number beteen 0 and 1; the targeted FDR value.

Details

We have observed empirically that the set of p-values obtained when computing the Newman paired test statistic often has peaks both at zero (representing genes of interest) and at one (representing "boring" genes that change much less than expected). We attribute the latter phenomenon to the fact that we use locally smoothed instead of gene-by-gene estimates of the standard deviation; genes whose SD is increased by the smoothing process contribute to the boring peak near one.

To estimate p-values in this context, we fit a three-component beta mixture model, combining (1) a right-peaked distribution Beta(L,1), (2) a left-peaked dfistribution Beta(1,M), and (3) a uniform distribution. Specfically, we look for models of the form

alpha*Beta(L,1) + beta*Beta(1, M) + gamma*Beta(1,1)

.

Model-fitting uses an expectation-maximization (EM) algorithm. In addition to the parameters mle=c(L,M) and psi=c(alpha, beta, gamma), we introduce a matrix Z of latent variables that indicate which distribution each point is likely to arise form. Z has three columns (one for each mixture component) and one row for each p-value; the entries in each row are nonegative and sum to one. The M-step of the algorithm uses the nlm optimization function to compute the maximum-likelihood mle values given psi and Z. The E-step first updates psi from the Z-matrix, and then updates the values of Z based on the current mle.

We are able to use the mixture distribution to compute the relationship between a cutoff on the nominal p-values and the false discovery rate (FDR).

Value

The model-fitting function, fitMix3, returns an object of the MixOf3Beta class.

The computeFDR function returns a real number in [0,1], the false discovery rate assiociated with the nominal cutoff.

The computeCutoff function returns a real number in [0,1], the cutoff required to achieve the desired FDR.

Examples

set.seed(98765)
ds <- c(rbeta(3000, 20, 1),
        rbeta(1000, 1, 40),
        runif(6000))
fit <- fitMix3(ds)
computeFDR(fit, 0.01)
computeCutoff(fit, 0.01)
computeFDR(fit, 0.0016438)
computeCutoff(fit, 0.05)
computeFDR(fit, 0.00702114)

NewmanOmics documentation built on May 29, 2024, 10:23 a.m.