# acenlm: Approximate Coordinate Exchange (ACE) Algorithm for... In acebayes: Optimal Bayesian Experimental Design using the ACE Algorithm

## Description

Functions implementing the approximate coordinate exchange algorithm (Overstall & Woods, 2017) for finding optimal Bayesian designs for non-linear regression models.

## Usage

 ```1 2 3 4 5 6 7``` ```acenlm(formula, start.d, prior, B, criterion = c("D", "A", "E", "SIG", "NSEL"), method = c("quadrature", "MC"), Q = 20, N1 = 20, N2 = 100, lower = -1, upper = 1, progress = FALSE, limits = NULL) pacenlm(formula, start.d, prior, B, criterion = c("D", "A", "E", "SIG", "NSEL"), method = c("quadrature", "MC"), Q = 20, N1 = 20, N2 = 100, lower = -1, upper = 1, limits = NULL, mc.cores = 1, n.assess = 20) ```

## Arguments

 `formula` An object of class `"formula"`: a symbolic description of the model. The terms should correspond to the column names of the argument `start.d` and the argument `prior`. `start.d` For `aceglm`, an `n` by `k` matrix, with column names used by the argument `formula`, specifying the initial design for the ACE algorithm. For `paceglm`, a list with each element, an `n` by `k` matrix, with column names used by the argument `formula`, specifying the initial design for each repetition of the ACE algorithm. `prior` An argument specifying the prior distribution. For `method = "MC"`, a function with one argument: `B`; a scalar integer. This function should return a `B` by `p` matrix (or `p+1` for `criterion = "SIG"` or `criterion = "NSEL"`), with `p` the number of model parameters, containing a random sample from the prior distribution of the parameters. The value of `p` should correspond to the number of terms specified by the `formula` argument. The column names must match the names of parameters in the `formula` argument. For `criterion="SIG"` or `criterion="NSEL"`, an extra column (named `sig2`) should contain a sample from the prior distribution of the error variance. For `method = "quadrature"`, a list specifying a normal or uniform prior for the model parameters. For a normal prior distribution, the list should have named entries `mu` and `sigma2` specifying the prior mean and variance-covariance matrix. The prior mean may be specified as a scalar, which will then be replicated to form an vector of common prior means, or a vector of length `p`. The prior variance-covariance matrix may be specified as either a scalar common variance or a vector of length `p` of variances (for independent prior distributions) or as a `p` by `p` matrix. The names attribute of `mu` must match the names of the parameters in the `formula` argument. For a uniform prior distribution, the list should have a named entry `support`, a `2` by `p` matrix with each column giving the lower and upper limits of the support of the independent continuous uniform distribution for the corresponding parameter. The column names of `support` must match the names of parameters in the `formula` argument. `B` An optional argument for controlling the approximation to the expected utility. It should be a vector of length two. For `criterion = "MC"`, it specifies the size of the Monte Carlo samples, generated from the joint distribution of unknown quantities. The first sample size, `B`, gives the sample size to use in the comparison procedures, and the second sample size, `B`, gives the sample size to use for the evaluations of Monte Carlo integration that are used to fit the Gaussian process emulator. If left unspecified, the default value is `c(20000,1000)`. For `criterion = "quadrature"`, it specifies the tuning parameters (numbers of radial abscissas and random rotations) for the implemented quadrature method; see Details for more information. If left unspecified, the default value is `c(2, 8)`. `criterion` An optional character argument specifying the utility function. There are currently five utility functions implemented consisting of pseudo-Bayesian D-optimality (`criterion = "D"`); pseudo-Bayesian A-optimality (`criterion = "A"`); pseudo-Bayesian E-optimality (`criterion = "E"`); Shannon information gain (`criterion = "SIG"`); negative squared error loss (`criterion = "NSEL"`). The default value is `"D"` denoting pseudo-Bayesian D-optimality. See Details for more information. `method` An optional character argument specifying the method of approximating the expected utility function. Current choices are `method = "quadrature"` for a deterministic quadrature approximation and `method = "MC"` for a stochastic Monte Carlo approximation. The first of these choices is only available when the argument `criterion = "A"`, `"D"` or `"E"`. The second choice is available for all possible values of the argument `criterion`. If left unspecified, the argument defaults to `"quadrature"` for `criterion = "A"`, `"D"` or `"E"` and to `"MC"` otherwise. See Details for more information. `Q` An integer specifying the number of evaluations of the approximate expected utility that are used to fit the Gaussian process emulator. The default value is `20`. `N1` An integer specifying the number of iterations of Phase I of the ACE algorithm (the coordinate exchange phase). The default value is `20`. `N2` An integer specifying the number of iterations of Phase II of the ACE algorithm (the point exchange phase). The default value is `100`. `lower` An argument specifying the design space. This argument can either be a scalar or a matrix of the same dimension as the argument `start.d` which specifies the lower limits of all coordinates of the design space. The default value is -1. `upper` An argument specifying the design space. This argument can either be a scalar or a matrix of the same dimension as the argument `start.d` which specifies the upper limits of all coordinates of the design space. The default value is 1. `progress` A logical argument indicating whether the iteration number should be printed. The default value is `FALSE`. `limits` An argument specifying the grid over which to maximise the Gaussian process emulator for the expected utility function. It should be a function with three arguments: `i`, `j` and `d` which generates a one-dimensional grid for the `ij`th coordinate of the design when the current design is `d`. The default value is NULL which generates values uniformly on the interval `(lower[i,j],upper[i,j])` or `(lower,upper)` depending on whether the arguments `lower` and `upper` are matrices or scalars, respectively. `mc.cores` The number of cores to use, i.e. at most how many child processes will be run simultaneously. Must be at least one (the default), and parallelisation requires at least two cores. See `mclapply` for more information and warnings for `mc.cores` > 1. `n.assess` If `method = "MC"`, the approximate expected utility for the design from each repetition of the ACE algorithm will be calculated `n.assess` times. The terminal design returned will be the design with the largest mean approximate expected utility calculated over the `n.assess` approximations.

## Details

The `acenlm` function implements the ACE algorithm to find designs for general classes of nonlinear regression models with identically and independently normally distributed errors meaning the user does not have to write their own utility function.

Two utility functions are implemented.

1. Shannon information gain (SIG)

The utility function is

U^SIG(d) = f(θ|y,d) - f(θ),

where f(θ|y,d) and f(θ) denote the posterior and prior densities of the parameters θ, respectively.

2. Negative squared error loss (NSEL)

The utility function is

u^NSEL(d) = - (θ - E(θ |y,d))^T(θ - E(θ |y,d)),

where E(θ | y,d) denotes the posterior mean of θ.

In both cases the utility function is not available in closed form due to the analytical intractability of either the posterior distribution (for SIG) or the posterior mean (for NSEL). Sampling-based Monte Carlo or importance sampling approximations will be employed. This was the original approach used by Overstall & Woods (2017).

A normal approximation to the posterior can be taken leading to the approximation by some scalar function of the Fisher information matrix, I (θ;d), which only depends on θ (Chaloner & Verdinelli, 1995). In the case of SIG, the approximate utility is given by

u^D(d) = log | I(θ;d)|,

and the resulting design is typically called pseudo-Bayesian D-optimal. For NSEL, the approximate utility is given by

u^A(d) = - tr (I(θ;d)^(-1))

with the resulting design termed pseudo-Bayesian A-optimal. These designs are often used under the frequentist approach to optimal experimental design and so to complete the usual set, the following utility for finding a pseudo-Bayesian E-optimal design is also implemented:

U^E(d) = min(e(I(θ;d))),

where e() denotes the function that calculates the eigenvalues of its argument.

The expected utilities can be approximated using Monte Carlo methods (`method = "MC"` for all criteria) or using a deterministic quadrature method (`method = "quadrature"`, implemented for the D, A and E criteria). The former approach approximates the expected utility via sampling from the prior. The latter approach uses a radial-spherical integration rule (Monahan and Genz, 1997) and `B` specifies the number, n_r, of radial abscissas and `B` specifies the number, n_q, of random rotations. Larger values of n_r will produce more accurate, but also more computationally expensive, approximations. See Gotwalt et al. (2009) for further details.

Similar to all coordinate exchange algorithms, ACE should be repeated from different initial designs. The function `pacenlm` will implement this where the initial designs are given by a list via the argument `start.d`. On the completion of the repetitions of ACE, `pacenlm` will approximate the expected utility for all final designs and return the design (the terminal design) with the largest approximate expected utility.

For more details on the ACE algorithm, see Overstall & Woods (2017).

## Value

The function will return an object of class `"ace"` (for `acenlm`) or `"pace"` (for `pacenlm`) which is a list with the following components:

 `utility` The utility function resulting from the choice of arguments. `start.d` The argument `start.d`. `phase1.d` The design found from Phase I of the ACE algorithm. `phase2.d` The design found from Phase II of the ACE algorithm. `phase1.trace` A vector containing the evaluations of the approximate expected utility of the current design at each stage of Phase I of the ACE algorithm. This can be used to assess convergence. `phase2.trace` A vector containing the evaluations of the approximate expected utility of the current design at each stage of Phase II of the ACE algorithm. This can be used to assess convergence. `B` The argument `B`. `Q` The argument `Q`. `N1` The argument `N1`. `N2` The argument `N2`. `glm` This will be `FALSE`. `nlm` If the object is a result of a direct call to `acenlm` then this is `TRUE`. `criterion` If the object is a result of a direct call to `acenlm` then this is the argument `criterion`. `method` If the object is a result of a direct call to `acenlm` then this is the argument `method`. `prior` If the object is a result of a direct call to `aceglm` then this is the argument `prior`. `formula` If the object is a result of a direct call to `acenlm` then this is the argument `formula`. `time` Computational time (in seconds) to run the ACE algorithm. `binary` The argument `binary`. Will be `FALSE` for the utility functions currently implemented. `d` The terminal design (`pacenlm` only). `eval` If `deterministic = "MC"`, a vector containing `n.assess` approximations to the expected utility for the terminal design (`pacenlm` only). If `deterministic = "quadrature"`, a scalar giving the approximate expected utility for the terminal design (`pacenlm` only). `final.d` A list of the same length as the argument `start.d`, where each element is the final design (i.e. `phase2.d`) for each repetition of the ACE algorithm (`pacenlm` only). `besti` A scalar indicating which repetition of the ACE algorithm resulted in the terminal design (`pacenlm` only).

## Note

These are wrapper functions for `ace` and `pace`.

## Author(s)

Antony M. Overstall A.M.Overstall@soton.ac.uk, David C. Woods, Maria Adamou & Damianos Michaelides

## References

Chaloner, K. & Verdinelli, I. (1995). Bayesian experimental design: a review. Statistical Science, 10, 273-304.

Gotwalt, C. M., Jones, B. A. & Steinberg, D. M. (2009). Fast computation of designs robust to parameter uncertainty for nonlinear settings. Technometrics, 51, 88-95.

Meyer, R. & Nachtsheim, C. (1995). The coordinate exchange algorithm for constructing exact optimal experimental designs. Technometrics, 37, 60-69.

Monahan, J. and Genz, A. (1997). Spherical-radial integration rules for Bayesian computation,” Journal of the American Statistical Association, 92, 664–674.

Overstall, A.M. & Woods, D.C. (2017). Bayesian design of experiments using approximate coordinate exchange. Technometrics, 59, 458-470.

`ace`, `aceglm`, `pace`, `paceglm`.

## Examples

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94``` ```## This example uses aceglm to find a Bayesian D-optimal design for a ## compartmental model with 6 runs 1 factor. The priors are ## those used by Overstall & Woods (2017). The design space for each ## coordinate is [0, 24] hours. set.seed(1) ## Set seed for reproducibility. n<-6 ## Specify the sample size (number of runs). start.d<-matrix(24 * randomLHS(n = n, k = 1), nrow = n, ncol = 1, dimnames = list(as.character(1:n), c("t"))) ## Generate an initial design of appropriate dimension. The initial design is a ## Latin hypercube sample. low<-c(0.01884, 0.298, 21.8) upp<-c(0.09884, 8.298, 21.8) ## Lower and upper limits of the support of the uniform prior distributions. Note that the prior ## for the third element is a point mass. prior<-function(B){ out<-cbind(runif(n = B, min = low, max = upp), runif(n = B, min = low,max = upp), runif(n = B, min = low, max = upp)) colnames(out)<-c("a", "b", "c") out} ## Create a function which specifies the prior. This function will return a ## B by 3 matrix where each row gives a value generated from the prior ## distribution for the model parameters. example1<-acenlm(formula = ~ c*(exp( - a * t) - exp( - b * t)), start.d = start.d, prior = prior, N1 = 1, N2 = 0, B = c(1000, 1000), lower = 0, upper = 24, method = "MC") ## Call the acenlm function which implements the ACE algorithm requesting ## only one iteration of Phase I and zero iterations of Phase II. The Monte ## Carlo sample size for the comparison procedure (B) is set to 1000. example1 ## Print out a short summary. #Non Linear Model #Criterion = Bayesian D-optimality #Formula: ~c * (exp(-a * t) - exp(-b * t)) #Method: MC # #B: 1000 1000 # #Number of runs = 6 # #Number of factors = 1 # #Number of Phase I iterations = 1 # #Number of Phase II iterations = 0 # #Computer time = 00:00:00 example1\$phase2.d ## Look at the final design. # t #1 19.7787011 #2 2.6431912 #3 0.2356938 #4 8.2471451 #5 1.4742319 #6 12.7062270 prior2 <- list(support = cbind(rbind(low, upp))) colnames(prior2\$support) <- c("a", "b", "c") example2 <- acenlm(formula = ~ c * (exp( - a * t) - exp( - b *t)), start.d = start.d, prior = prior2, lower = 0, upper = 24, N1 = 1, N2 = 0 ) ## Call the acenlm function with the default method of "quadrature" example2\$phase2.d ## Final design # t #1 0.5167335 #2 2.3194434 #3 1.5365409 #4 8.2471451 #5 21.9402670 #6 12.7062270 utility <- utilitynlm(formula = ~c * (exp( - a * t) - exp( - b *t)), prior = prior, desvars = "t", method = "MC" )\$utility ## create a utility function to compare designs mean(utility(example1\$phase1.d, B = 20000)) # 12.13773 mean(utility(example2\$phase1.d, B = 20000)) # 11.19745 ## Compare the two designs using the Monte Carlo approximation ```

### Example output

```Loading required package: lhs
Non Linear Model
Criterion = Bayesian D-optimality
Formula: ~c * (exp(-a * t) - exp(-b * t))
Method:  MC

B:  1000 1000

Number of runs = 6

Number of factors = 1

Number of Phase I iterations = 1

Number of Phase II iterations = 0

Computer time = 00:00:00
t
1 19.7787011
2  2.6431912
3  0.2356938
4  8.2471451
5  1.4742319
6 12.7062270
t
1  0.5167335
2  2.3194434
3  1.5365409
4  8.2471451
5 21.9402670
6 12.7062270
 12.13773
 11.19745
```

acebayes documentation built on Oct. 23, 2020, 8:27 p.m.