# overstallwoods: Functions implementing the examples of Overstall & Woods... In acebayes: Optimal Bayesian Experimental Design using the ACE Algorithm

## Description

This suite of functions implement the examples in Overstall & Woods (2017).

## Usage

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48``` ```######## Compartmental model ################################# utilcomp18bad(d, B) optdescomp18bad(type = "ACE") utilcomp15bad(d, B) optdescomp15bad() utilcomp15sig(d, B) optdescomp15sig() utilcomp15sigDRS(d, B) optdescomp15sigDRS() ######## Logistic regression model ########################### utillrbad(d, B) optdeslrbad(n, type = "ACE") utillrsig(d, B) inideslrsig(n, rep) optdeslrsig(n) utilhlrbad(d, B) optdeshlrbad(n) utilhlrsig(d, B) inideshlrsig(n, rep) optdeshlrsig(n) utillrbaa(d, B) optdeslrbaa(n) utillrnsel(d, B) inideslrnsel(n, rep) optdeslrnsel(n) optdeshlrbaa(n) utilhlrbaa(d, B) utilhlrnsel(d, B) inideshlrnsel(n, rep) optdeshlrnsel(n) ######## Beetle mortality experiment ######################### utilbeetle(d, B) optdesbeetle(n) ######## Linear model ######################################## utillinmod(d, B) optdeslinmod(n, type = "ACE") ############################################################## ```

## Arguments

 `d` An `n` by `k` matrix specifying the design matrix, where `n` and `k` denote the number of runs and factors, respectively. See Details and Value for the values that `n` and `k` can take for each of the examples. Each element of `d` is scaled to the interval [-1,1]. `n` The number of runs ins the experiment. `rep` A scalar integer in the set {1,...,20} specifying the initial design. `B` A scalar integer specifying the Monte Carlo sample size. `type` An optional character argument specifying which design to return. For `optdeslrbad`, possible values are `c("ACE","Gotwalt","Atkinson")`. If `"ACE"` (the default) then the design found by the ACE algorithm will be returned. If `"Gotwalt"` then the design published in Gotwalt et al (2009) is returned. If `"Atkinson"` then the design found by Atkinson et al (1993) is returned. For `optdeslinmod`, possible values are `c("ACE","BoxDraper")`. If `"ACE"` (the default) then the design found by the ACE algorithm will be returned. If `"BoxDraper"` then the true optimal design, as found by Box & Draper (1971), will be returned.

## Details

This section provides details on the examples considered and the functions implemented in `acebayes`.

Compartmental model

Compartmental models are used in Pharmokinetics to study how materials flow through an organism. A drug is administered to an individual or animal and then the amount present at a certain body location is measured at a set of `n` pre-determined sampling times (in hours). There is one design variable: sampling time. Therefore the design matrix `d` is an `n` by 1 matrix with elements controlling the `n` sampling times, i.e. the number of factors is `k=1`.

In Overstall & Woods (2017), two different compartmental model examples are considered. The first (in the Supplementary Material) comes from Atkinson et al (1993) and Gotwalt et al (2009) where there are `n = 18` sampling times and interest lies in finding a Bayesian D-optimal design. The functions whose name includes `"comp18"` refer to this example.

The second example (in Section 3.2) comes from Ryan et al (2014), where there are `n = 15` sampling times and the ultimate interest lies in finding an optimal design under the Shannon information gain utility. Also considered is the Bayesian D-optimal design. The functions whose name includes `"comp15"` refer to this example. Note that Ryan et al (2014) used a dimension reduction scheme (DRS) to find optimal designs. The function whose name is suffixed by `"DRS"` refer to this situation.

Logistic regression model

In Section 3.3 of Overstall & Woods (2017), a first-order logistic regression model in `k = 4` factors and `n` runs is considered. Woods et al (2006) and Gotwalt et al (2009) considered generating Bayesian D-optimal designs for `n = 16` and `n = 48`. Overstall & Woods (2017) extended this example by considering Bayesian A-optimal, Shannon information gain (SIG) and negative squared error loss (NSEL) utility functions, a range of number of runs from 6 to 48, and "random effects" to form a hierarchical logistic regression model.

Beetle mortality experiment

Overstall & Woods (2017, Section 3.4) considers generating an optimal design for a follow-up experiment. The original design and data (Bliss, 1935) involves administering different doses of poison to N = 8 groups of beetles. The number of beetles that die in each group are recorded. Six different models are considered formed from the combination of three link functions and two linear predictors (following the analysis of O'Hagan & Forster, 2004). Interest lies in the quantity known as lethal dose 50 (LD50) which is the dose required to kill 50% of the beetles and is a function of the model parameters for a given model. Consider finding an optimal design for estimating LD50 under the negative squared error loss (NSEL) function for `n` new doses of poison (i.e. `k = 1` factor). The prior distribution is equivalent to the posterior distribution arising from the original data and includes model uncertainty.

Linear model

In the supplementary material, Overstall & Woods (2017) considers finding D-optimal designs for a second-order (i.e. `k = 2`) response surface in `n=6,7,8,9` runs. Note that the D-optimal design is equivalent to the optimal design under Shannon information gain and a non-informative prior distribution.

The expected utility function in this case is available in closed form, i.e. it does not require approximation. Box & Draper (1971) found optimal designs analytically for the number of runs considered here. Overstall & Woods (2017) use this example to demonstrate the efficacy of the ACE algorithm.

## Value

Compartmental model

For the example in the Supplementary Material;

• The function `utilcomp18bad` will return a vector of length `B` where each element is the value of the Bayesian D-optimal utility function (i.e. the log determinant of the Fisher information matrix) evaluated at a sample of size `B` generated from the prior distribution of the model parameters.

• The function `optdescomp18bad` will return an 18 by 1 matrix giving the optimal design (specified by the argument `type`). The elements will be scaled to be in the interval [-1, 1], i.e. a -1 corresponds to a sampling times of 0 hours, and 1 corresponds to a time of 24 hours.

For the example in Section 3.2;

• The function `utilcomp15bad` will return a vector of length `B` where each element is the value of the Bayesian D-optimal utility function (i.e. the log determinant of the Fisher information matrix) evaluated at a sample of size `B` generated from the prior distribution of the model parameters.

• The function `optdescomp15bad` will return an 15 by 1 matrix giving the optimal design (found using ACE) under Bayesian D-optimality. The elements will be scaled to be in the interval [-1, 1], i.e. a -1 corresponds to a sampling times of 0 hours, and 1 corresponds to a time of 24 hours.

• The function `utilcomp15sig` will return a vector of length `B` where each element is the value of the SIG utility function evaluated at a sample of size `B` generated from the joint distribution of model parameters and unobserved responses.

• The function `optdescomp15sig` will return an 18 by 1 matrix giving the optimal design (found using ACE) under the SIG utility. The elements will be scaled to be in the interval [-1, 1], i.e. a -1 corresponds to a sampling times of 0 hours, and 1 corresponds to a time of 24 hours.

• The function `utilcomp15sigDRS` will return a vector of length `B` where each element is the value of the SIG utility function (where a DRS has been used) evaluated at a sample of size `B` generated from the joint distribution of model parameters and unobserved responses. Here the Beta DRS (see Overstall & Woods, 2017) has been used so `d` should be a 2 by 1 matrix containing the positive beta parameters.

• The function `optdescomp15sigDRS` will return a 2 by 1 matrix giving the optimal design (found using ACE) under the SIG utility, where a DRS has been used. The elements correspond to the parameters of a beta distribution.

Logistic regression model

A function whose name includes `"lr"` refers to standard logistic regression, whereas `"hlr"` refers to hierarchical logistic regression. Under standard logistic regression the possible values for the argument `n` can be any even integer between 6 and 48. For hierarchical logistic regression, `n` can be any integer divisible by 6 between 12 and 48. The function name also indicates the utility function:

• `"bad"` Bayesian D-optimal

• `"baa"` Bayesian A-optimal

• `"sig"` Shannon information gain

• `"nsel"` Negative squared error loss

The functions prefixed by `"util"` will return a vector of length `B` where each element is the utility function evaluated at a sample generated from the prior distribution of model parameters (for Bayesian D- and A-optimality) or the joint distribution of model parameters and unobserved responses (for SIG and NSEL).

The functions prefixed by `"optdes"` will return an `n` by `k = 4` matrix giving the optimal design found by ACE. The designs given by this function are those reported on in Overstall & Woods (2017). The function `optdeslrbad` will also return designs (for `n = 16, 48`) found by Woods et al (2006) and Gotwalt et al (2009) by specifying the `type` argument appropriately.

The functions prefixed by `"inides"` will return an `n` by `k = 4` matrix giving an initial design for ACE to find the optimal designs under the SIG and NSEL utility functions. These are 20 designs found using ACE under approximations to the Bayesian A- and D-optimal utility functions, respectively. The argument `rep` specifies which of these 20 designs to use.

Beetle mortality experiment

The function `utilbeetle` will return a vector of length `B` where each element is the value of the utility function for a sample generated from the joint distribution of the model parameters, model and unobserved responses.

The function `optdesbeetle` will return an `n` by 1 matrix giving the optimal design under the NSEL utility function (found using ACE) for estimating the LD50. The elements will be scaled to be in the interval [-1, 1], where -1 corresponds to a dose of 1.6907, 0 to a dose of 1.7873 and 1 to a dose of 1.8839. The designs given by this function are those reported in Overstall & Woods (2017) for `n` = 1, ..., 10.

Linear model

The function `utillinmod` will return a vector of length `B` where each element is a realisation of a stochastic approximation to the expected utility.

The function `optdeslinmod` will return an `n` by 2 matrix giving the D-optimal design (specified by the argument `type`). If `type = "ACE"`, the designs returned by this function are those found using the ACE algorithm and are reported in the Supplementary Material of Overstall & Woods (2017), and if `type = "BoxDraper"`, the designs returned are the exact D-optimal designs.

## Author(s)

Antony M. Overstall A.M.Overstall@soton.ac.uk, David C. Woods, Maria Adamou & Damianos Michaelides

## References

Atkinson, A., Chaloner, K., Herzberg, A., & Juritz, J. (1993). Experimental Designs for Properties of a Compartmental Model. Biometrics, 49, 325-337.

Bliss, C. (1935). The calculation of the dosage-mortality curve. Annals of Applied Biology, 22, 134-167.

Box, M. & Draper, N. (1971). Factorial designs, the |F^T F| criterion and some related matters. Techometrics, 13, 731-742.

Gotwalt, C., Jones, B. & Steinberg, D. (2009). Fast Computation of Designs Robust to Parameter Uncertainty for Nonlinear Settings. Technometrics, 51, 88-95.

O'Hagan, A, & Forster, J.J. (2004). Kendall's Advanced Theory of Statistics, Volume 2B: Bayesian Inference. 2nd edition. John Wiley & Sons.

Overstall, A.M. & Woods, D.C. (2017). Bayesian design of experiments using approximate coordinate exchange. Technometrics, 59, 458-470.

Ryan, E., Drovandi, C., Thompson, M., Pettitt, A. (2014). Towards Bayesian experimental design for nonlinear models that require a large number of sampling times. Computational Statistics and Data Analysis, 70, 45-60.

Woods, D.C., Lewis, S., Eccleston, J., Russell, K. (2006). Designs for Generalized Linear Models With Several Variables and Model Uncertainty. Technometrics, 48, 284-292.

`ace`, `pace`.
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129``` ```######## Compartmental model ################################# set.seed(1) ## Set seed for reproducibility d<-optimumLHS(n = 18, k = 1) * 2 - 1 ## Generate an 18-run design. u<-utilcomp18bad(d = d, B = 20000) ## Calculate the D-optimal utility function for a ## sample of size 20000. u[1:5] ## Look at first 5 elements. #[1] 14.283473 10.525390 4.126233 7.061498 12.793245 d0<-optdescomp18bad( ) u0<-utilcomp18bad(d = d0, B = 20000) ## Optimal design found by ACE and calculate the D-optimal ## utility function for a sample of size 20000. u0[1:5] ## Look at first 5 elements. #[1] 15.04721 15.37141 16.84287 14.06750 14.01523 mean(u) mean(u0) ## Calculate expected Bayesian D-optimal utility. d<-matrix(runif(2), ncol = 1) ## Generate two beta parameters. u<-utilcomp15sigDRS(d = d, B = 5) u ## Calculate the SIG utility function for a ## sample of size 5. #[1] 17.652044 4.878998 19.919559 22.017760 5.600473 ######## Logistic regression model ########################### set.seed(1) ## Set seed for reproducibility d<-optimumLHS(n = 16,k = 4) * 2 - 1 ## Generate an 16-run design. u<-utillrbad(d = d, B = 20000) ## Calculate the D-optimal utility function for a ## sample of size 20000. u[1:5] ## Look at first 5 elements. #[1] -11.630683 -5.748912 -9.554413 -10.150132 -7.940938 d0<-optdeslrbad(16) u0<-utillrbad(d = d0, B = 20000) ## Optimal design found by ACE and calculate the D-optimal ## utility function for a sample of size 20000. u0[1:5] ## Look at first 5 elements. #[1] -4.644116 -2.411431 -4.999891 -2.906558 -2.282687 mean(u) mean(u0) ## Calculate expected Bayesian D-optimal utility. #[1] -9.38253 #[1] -2.992012 ######## Beetle mortality experiment ######################### set.seed(1) ## Set seed for reproducibility d<-optimumLHS(n = 10, k = 1)*2-1 ## Generate a design of 10 doses with elements in [-1, 1] utilbeetle(d = d, B = 5) ## Calculate the utility function for a sample of size 5. #-4.720491e-06 -1.198955e-05 -1.681380e-05 -3.123498e-06 -1.412722e-05 d0<-optdesbeetle(10) d0 ## Print out optimal design from Overstall & Woods (2017) for 10 doses 0.5*( d0 + 1)*( 1.8839 - 1.6907 ) + 1.6907 ## On original scale. # [,1] # [1,] 1.769957 # [2,] 1.769520 # [3,] 1.768641 # [4,] 1.777851 # [5,] 1.768641 # [6,] 1.769520 # [7,] 1.777851 # [8,] 1.765997 # [9,] 1.768641 #[10,] 1.768641 ######## Linear model ######################################## set.seed(1) ## Set seed for reproducibility d<-cbind(rep(c( -1, 0, 1), each = 3), rep(c( -1, 0, 1), 3)) ## Create a 9-run design which is the true D-optimal design utillinmod(d = d, B = 5) ## Calculate the approximation to the true expected D-optimal utility ## function for a sample of size 5. #[1] 7.926878 8.736976 7.717704 10.148613 8.882840 d0<-optdeslinmod(9) ## Optimal D-optimal design found using ACE X<-cbind(1, d, d^2, d[,1] * d[,2]) X0<-cbind(1, d0, d0^2, d0[,1] * d0[,2]) ## Calculate model matrices under both designs detX<-determinant( t(X) %*% X)\$modulus[1] detX0<-determinant( t(X0) %*% X0)\$modulus[1] ## Calculate true expected D-optimal utility function for both designs 100 * exp( 0.2 * ( detX0 - detX )) ## Calculate D-efficiency of ACE design. # 99.93107 ```