seqBoundariesGrid: Evaluate expected utility for parametric sequential stopping...



Description

Estimate the expected utility for sequential boundaries parameterized by (b0,b1). Expected utility is estimated on a grid of (b0,b1) values based on a forward simulation output such as that generated by the function forwsimDiffExpr.

Usage

seqBoundariesGrid(b0, b1, forwsim, samplingCost, powmin = 0, f = "linear", ineq = "less")

Arguments

b0

Vector with b0 values. Expected utility is evaluated for a grid defined by all combinations of (b0,b1) values.

b1

Vector with b1 values.

forwsim

data.frame with forward simulation output, such as that returned by the function forwsimDiffExpr. It must have columns named simid, time, u, fdr, fnr, power and summary. See forwsimDiffExpr for details on the meaning of each column.

samplingCost

Cost of obtaining one more data batch, in terms of the number of new truly differentially expressed discoveries that would make it worthwhile to obtain one more data batch.

powmin

Constraint on power. The optimization chooses the optimal (b0,b1) among those satisfying power>=powmin (if any such b0,b1 exist).

f

Parametric form for the stopping boundary. Currently only 'linear' and 'invsqrt' are implemented. For 'linear', the boundary is b0+b1*time. For 'invsqrt', the boundary is b0+b1/sqrt(time), where time is the sample size measured as number of batches.

ineq

For ineq=='less' the trial stops when summary is below the stopping boundary. This is appropriate whenever summary measures the potential benefit of obtaining one more data batch. For ineq=='greater' the trial stops when summary is above the stopping boundary. This is appropriate whenever summary measures the potential costs of obtaining one more data batch.
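The two boundary forms and the two stopping directions described under f and ineq can be illustrated with a short base R sketch. The helpers boundary and stopNow are hypothetical names introduced here for illustration only; they are not part of gaga, and the strict inequalities are an assumption based on the wording "below" and "above".

```r
# Hypothetical helper (not part of gaga): boundary value at a given time,
# where time is the number of batches observed so far.
boundary <- function(time, b0, b1, f = "linear") {
  switch(f,
         linear  = b0 + b1 * time,
         invsqrt = b0 + b1 / sqrt(time),
         stop("f must be 'linear' or 'invsqrt'"))
}

# Hypothetical helper: TRUE wherever the stopping condition holds.
# Strict inequalities are an assumption, not taken from the package source.
stopNow <- function(summary, time, b0, b1, f = "linear", ineq = "less") {
  bound <- boundary(time, b0, b1, f)
  if (ineq == "less") summary < bound else summary > bound
}

time <- 1:4
boundary(time, b0 = 1, b1 = 0.5)                 # linear: 1.5 2.0 2.5 3.0
boundary(time, b0 = 1, b1 = 2, f = "invsqrt")    # starts at 3, decreases to 2
stopNow(summary = c(5, 3, 2, 1), time = time, b0 = 1, b1 = 0.5)
```

With ineq = "less" and a positive b1, the linear boundary rises over time, so a decreasing summary eventually falls below it and the trial stops.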

Details

Intuitively, the goal is to stop collecting new data when the expected benefit of obtaining one more data batch is small, i.e. below a certain boundary. We consider two simple parametric forms for such a boundary (linear and inverse square root), which make it easy to evaluate the expected utility for each boundary within a grid of parameter values. The optimal boundary is defined by the parameter values achieving the largest expected utility, restricted to parameter values with an estimated power greater than or equal to powmin. Here power is defined as the expected number of true discoveries divided by the expected number of differentially expressed entities.

The routine evaluates the expected utility, as well as expected FDR, FNR, power and sample size for each specified boundary, and also reports the optimal boundary.
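The grid evaluation can be pictured with the following base R sketch on made-up data. It is not the package's implementation. It assumes that the utility of stopping at a given time is u minus samplingCost times time, that a trial which never crosses the boundary stops at its last simulated batch, and that power can be averaged across simulations (a simplification of the ratio-of-expectations definition above). The helpers stopRow and evalBoundary are hypothetical, only the linear boundary with ineq = "less" is shown, and the fdr and fnr columns are omitted for brevity.

```r
# Toy forward simulation (made-up numbers): 2 simulated trials, 3 batches each,
# with rows ordered by time within each simid.
forwsim <- data.frame(
  simid   = rep(1:2, each = 3),
  time    = rep(1:3, times = 2),
  u       = c(10, 14, 15, 8, 13, 14),
  power   = c(0.50, 0.70, 0.75, 0.40, 0.65, 0.80),
  summary = c(4, 1, 0.5, 5, 3, 0.5)
)

# Hypothetical helper: the row at which one simulated trial stops under a
# linear boundary; falls back to the last row if the boundary is never crossed.
stopRow <- function(sim, b0, b1) {
  hit <- which(sim$summary < b0 + b1 * sim$time)
  sim[if (length(hit) > 0) hit[1] else nrow(sim), ]
}

# Hypothetical helper: average utility, power and sample size for one (b0, b1).
evalBoundary <- function(b0, b1, forwsim, samplingCost) {
  stopped <- do.call(rbind, lapply(split(forwsim, forwsim$simid),
                                   stopRow, b0 = b0, b1 = b1))
  data.frame(b0 = b0, b1 = b1,
             u = mean(stopped$u - samplingCost * stopped$time),  # assumed utility
             power = mean(stopped$power),
             time = mean(stopped$time))
}

# All combinations of the b0 and b1 values, evaluated one by one.
grid <- expand.grid(b0 = c(0, 2), b1 = c(0, 1))
res <- do.call(rbind, Map(evalBoundary, grid$b0, grid$b1,
                          MoreArgs = list(forwsim = forwsim, samplingCost = 2)))

# Optimal boundary: largest estimated utility among rows meeting the power constraint.
powmin <- 0.6
ok <- res[res$power >= powmin, ]
opt <- ok[which.max(ok$u), ]
```

Here res plays the role of the grid component and opt the role of the opt component described under Value, although the actual return format of seqBoundariesGrid differs.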

Value

A list with two components:

opt

Vector with optimal stopping boundary (b), estimated expected utility (u), false discovery rate (fdr), false negative rate (fnr), power (power) and the expected sample size measured as the number of batches (time).

grid

data.frame with all evaluated boundaries (columns b0 and b1) and their respective estimated expected utility, false discovery rate, false negative rate, power and expected sample size (measured as the number of batches).

Author(s)

David Rossell.

References

Rossell D., Mueller P. Sequential sample sizes for high-throughput hypothesis testing experiments. http://sites.google.com/site/rosselldavid/home.

Rossell D. GaGa: a simple and flexible hierarchical model for microarray data analysis. Annals of Applied Statistics, 2009, 3, 1035-1051.

See Also

forwsimDiffExpr


gaga documentation built on Nov. 8, 2020, 5:49 p.m.