# utilities: Approximate expected utility function for generalised linear... In acebayes: Optimal Bayesian Experimental Design using the ACE Algorithm

## Description

Generates an approximate utility function for generalised linear models and non-linear regression models.

## Usage

 ```1 2 3 4 5 6``` ```utilityglm(formula, family, prior, criterion = c("D", "A", "E", "SIG", "NSEL", "SIG-Norm", "NSEL-Norm"), method = c("quadrature", "MC"), nrq) utilitynlm(formula, prior, desvars, criterion = c("D", "A", "E", "SIG", "NSEL"), method = c("quadrature", "MC"), nrq) ```

## Arguments

 `formula` An argument providing a symbolic description of the model. For `utilityglm`, an object of class `"formula"`: a symbolic description of the model. For `utilitynlm`, an object of class `"formula"`: a symbolic description of the model. The terms should correspond to the argument `prior` `family` For `utilityglm`, a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function. (See `family` for details of family functions.) `prior` An argument specifying the prior distribution. For `method = "MC"`, a function with one argument: `B`; a scalar integer. This function should return a `B` by `p` matrix (`p+1` for `criterion = "SIG"` or `criterion = "NSEL"`), with `p` the number of model parameters, containing a random sample from the prior distribution of the parameters. The value of `p` should correspond to the number of terms specified by the `formula` argument. For `utilitynlm`, the column names must match the names of parameters in the `formula` argument. For `utilitynlm`, if `criterion="SIG"`, `criterion="NSEL"`, `criterion="SIG-Norm"` or `criterion="NSEL-Norm"` then an extra column called `sig2` should be included with a sample from the error variance. For `method = "quadrature"`, a list specifying a normal or uniform prior for the model parameters. For a normal prior distribution, the list should have named entries `mu` and `sigma2` specifying the prior mean and variance-covariance matrix. The prior mean may be specified as a scalar, which will then be replicated to form an vector of common prior means, or a vector of length `p`. The prior variance-covariance matrix may be specified as either a scalar common variance or a vector of length `p` of variances (for independent prior distributions) or as a `p` by `p` matrix. For `utilitynlm`, the names attribute of `mu` must match the names of the parameters in the `formula` argument. For a uniform prior distribution, the list should have a named entry `support`, a `2` by `p` matrix with each column giving the lower and upper limits of the support of the independent continuous uniform distribution for the corresponding parameter. For `utilitynlm`, the column names of `support` must match the names of parameters in the `formula` argument. `desvars` For `utilitynlm`, a character vector listing the design variables that appear in the argument `formula`. `criterion` An optional character argument specifying the utility function. There are currently seven utility functions implemented as follows: pseudo-Bayesian D-optimality (`criterion = "D"`); pseudo-Bayesian A-optimality (`criterion = "A"`); pseudo-Bayesian E-optimality (`criterion = "E"`). Shannon information gain with Monte Carlo (MC) approximation to marginal likelihood (`criterion = "SIG"`); Shannon information gain with normal-based Laplace approximation to marginal likelihood (`criterion = "SIG-Norm"`, only for `utilityglm`)); negative squared error loss with importance sampling approximation to posterior mean (`criterion = "NSEL"`); negative squared error loss with normal-based approximation to posterior mean (`criterion = "NSEL-Norm"`, only for `utilityglm`)) ; The default value is `"D"` denoting pseudo-Bayesian D-optimality. See Details for more information. `method` An optional character argument specifying the method of approximating the expected utility function. Current choices are `method = "quadrature"` for a deterministic quadrature approximation and `method = "MC"` for a stochastic Monte Carlo approximation. The first of these choices is only available when the argument `criterion = "A"`, `"D"` or `"E"`. The second choice is available for all possible values of the argument `criterion`. If left unspecified, the argument defaults to `"quadrature"` for `criterion = "A"`, `"D"` or `"E"` and to `"MC"` otherwise. See Details for more information. `nrq` For `method = "quadrature"`, a vector of length two specifying the number of radial abscissas (`nrq`) and quasi-random rotations (`nrq`) required for the implemented quadrature scheme; see Details for more information. If left unspecified, the default value is `c(2, 8)`.

## Details

Two utility functions are implemented.

1. Shannon information gain (SIG)

The utility function is

U^SIG(d) = f(θ|y,d) - f(θ),

where f(θ|y,d) and f(θ) denote the posterior and prior densities of the parameters θ, respectively.

2. Negative squared error loss (NSEL)

The utility function is

u^NSEL(d) = - (θ - E(θ |y,d))^T(θ - E(θ |y,d)),

where E(θ | y,d) denotes the posterior mean of θ.

In both cases the utility function is not available in closed form due to the analytical intractability of either the posterior distribution (for SIG) or the posterior mean (for NSEL). The `acebayes` package implements two approximations to both utility functions. If `criterion = "SIG"` or `criterion = "NSEL"` then sampling-based Monte Carlo or importance sampling approximations will be employed. This was the original approach used by Overstall & Woods (2017). If `criterion = "SIG-Norm"` or `criterion = "NSEL-Norm"` then approximations based on approximate normality of the posterior (Overstall et al., 2017) will be used.

The normal approximation to the posterior can be taken further leading to the approximation by some scalar function of the Fisher information matrix, I (θ;d), which only depends on θ (Chaloner & Verdinelli, 1995). In the case of SIG, the approximate utility is given by

u^D(d) = log | I(θ;d)|,

and the resulting design is typically called pseudo-Bayesian D-optimal. For NSEL, the approximate utility is given by

u^A(d) = - tr (I(θ;d)^(-1))

with the resulting design termed pseudo-Bayesian A-optimal. These designs are often used under the frequentist approach to optimal experimental design and so to complete the usual set, the following utility for finding a pseudo-Bayesian E-optimal design is also implemented:

U^E(d) = min(e(I(θ;d))),

where e() denotes the function that calculates the eigenvalues of its argument.

The expected utilities can be approximated using Monte Carlo methods (`method = "MC"` for all criteria) or using a deterministic quadrature method (`method = "quadrature"`, implemented for the D, A and E criteria). The former approach approximates the expected utility via sampling from the prior. The latter approach uses a radial-spherical integration rule (Monahan and Genz, 1997) and `B` specifies the number, n_r, of radial abscissas and `B` specifies the number, n_q, of random rotations. Larger values of n_r will produce more accurate, but also more computationally expensive, approximations. See Gotwalt et al. (2009) for further details.

For `utilityglm`, note that the utility functions for SIG and NSEL are currently only implemented for logistic regression, i.e. `family = binomial`, or Poisson regression, i.e. `family = poisson(link = "log")`, whereas the utility functions for pseudo-Bayesian designs are implemented for generic GLM families.

For more details on the ACE algorithm, see Overstall & Woods (2017).

## Value

The function will return a list with the following components:

 `utility` The utility function resulting from the choice of arguments.

## Author(s)

Antony M. Overstall A.M.Overstall@soton.ac.uk, David C. Woods, Maria Adamou & Damianos Michaelides

## References

Chaloner, K. & Verdinelli, I. (1995). Bayesian experimental design: a review. Statistical Science, 10, 273-304.

Gotwalt, C. M., Jones, B. A. & Steinberg, D. M. (2009). Fast computation of designs robust to parameter uncertainty for nonlinear settings. Technometrics, 51, 88-95.

Monahan, J. and Genz, A. (1997). Spherical-radial integration rules for Bayesian computation,” Journal of the American Statistical Association, 92, 664-674.

Overstall, A.M. & Woods, D.C. (2017). Bayesian design of experiments using approximate coordinate exchange. Technometrics, 59, 458-470.

`aceglm`, `acenlm`, `paceglm`, `pacenlm`.
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52``` ```## 1. This example uses utilityglm to generate the pseudo-Bayesian D-optimality ## approximate expected utility function using a Monte Carlo approximation. low<-c(-3, 4, 5, -6, -2.5) upp<-c(3, 10, 11, 0, 3.5) ## Lower and upper limits of the uniform prior distributions. prior<-function(B){ t(t(6*matrix(runif(n=5*B),ncol=5))+low)} ## Create a function which specifies the prior. This function will return a ## B by 5 matrix where each row gives a value generated from the prior ## distribution for the model parameters. ex <- utilityglm(formula = ~x1+x2+x3+x4, family = binomial, prior = prior, method = "MC") set.seed(1) ## Set seed for reproducibility. n<-6 ## Specify the sample size (number of runs). start.d<-matrix( 2 * randomLHS(n = n,k = 4) - 1,nrow = n,ncol = 4, dimnames = list(as.character(1:n),c("x1", "x2", "x3", "x4"))) ## Generate an initial design of appropriate dimension. The initial design is a ## Latin hypercube sample. ex\$utility(d = start.d, B = 10) ## Evaluate resulting approximate utility. Should get: # -13.98143 -17.07772 -19.88988 -22.40720 -15.27411 -15.02717 -16.17253 -18.66600 -13.75118 # -21.83820 ## 2. This example uses utilitynlm to generate the psuedo-Bayesian A-optimality expected utility ## function using a quadrature approximation low<-c(0.01884, 0.298, 21.8) upp<-c(0.09884, 8.298, 21.8) ## Lower and upper limits of the uniform prior distributions. Note that the prior ## for the third element is a point mass. prior2 <- list(support = cbind(rbind(low, upp))) colnames(prior2\$support) <- c("a", "b", "c") ## Specify a uniform prior with ranges given by low and upp ex2 <- utilitynlm(formula = ~ c * (exp( - a * t) - exp( - b *t)), prior = prior2, desvars = "t") n <- 6 start.d <- matrix(24 * randomLHS(n = n, k = 1), nrow = n) colnames(start.d) <- "t" ex2\$utility(d = start.d) ## -13.17817 ```