# oo2002: Implementation of the ideas of Oakley and O'Hagan 2002 In emulator: Bayesian Emulation of Computer Programs

## Description

Implementation of the ideas of Oakley and O'Hagan 2002: `var.conditional()` calculates the conditional variance-covariance matrix, and `cond.sample()` samples from the appropriate multivariate t distribution.

## Usage

 ```1 2 3 4``` ```cond.sample(n = 1, x, xold, d, A, Ainv, scales = NULL, pos.def.matrix = NULL, func = regressor.basis, ...) var.conditional(x, xold, d, A, Ainv, scales = NULL, pos.def.matrix = NULL, func = regressor.basis, distance.function = corr, ...) ```

## Arguments

 `n` In function `cond.sample()`, the number of observations to take, defaulting to 1 `x` Simulation design points `xold` Design points `d` Data vector `A` Correlation matrix `Ainv` Inverse of correlation matrix A `scales` Roughness lengths `pos.def.matrix` Positive definite matrix of correlations `func` Function to calculate H `distance.function` Distance function (defaulting to `corr()`) `...` Further arguments passed to the distance function, usually `corr()`

## Details

We wish to generate the distribution for the process at uncertain point `x`; uncertainty in `x` is captured by assuming it to be drawn from a pdf `X`.

The basic idea is to estimate m* at simulated design points using `cond.sample()`, which samples from the multivariate t distribution conditional on the data `d` at the design points. The random datavector of estimates m* is called `ddash`.

We repeat this process many times, each time estimating eta(.) using the augmented dataset `c(d,ddash)` as a training set.

For each estimated eta(.), we have a complete emulator that can be used to build up an ensemble of estimates.

## Value

Function `cond.sample()` returns a n*p matrix whose rows are independent samples from the appropriate multivariate t distribution. Here, p is the number of rows of `x` (ie the number of simulated design points). Consider a case where there are just two simulated design points, close to each other but far from any point of the original design points. Then function `cond.sample(n=4, ...)` will give four numbers which are close to one another but have high (between-instantiation) variance.

Function `var.conditional()` calculates the denominator of equation 3 of Oakley and OHagan 2002. This function is intended to be called by `cond.sample()` but might be interesting per se when debugging or comparing different choices of simulated design points.

## Note

Function `cond.sample()` and `var.conditional()` together are a superset of function `interpolant()` because it accounts for covariance between multiple observations. It is, however, much slower.

Also note that these functions are used to good effect in the examples section of `oo2002.Rd`.

## Author(s)

Robin K. S. Hankin

## References

• J. Oakley 2002. Bayesian inference for the uncertainty distribution of computer model outputs. Biometrika, 89(4):769–784

• R. K. S. Hankin 2005. Introducing BACCO, an R bundle for Bayesian analysis of computer code output, Journal of Statistical Software, 14(16)

## See Also

`regressor.basis`, for a more visually informative example of `cond.sample()` et seq; and `interpolant` for more examples

## Examples

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91``` ``` # Now we use the functions. First we set up some design points: # Suppose we are given the toy dataset and want to know the PDF of # fourth power of the response at point x, where uncertainty in x # may be represented as it being drawn from a normnl distribution # with mean c(0.5,0.5,...,0.5) and a variance of 0.001. data(toy) val <- toy real.relation <- function(x){sum( (0:6)*x )} H <- regressor.multi(val) d <- apply(H,1,real.relation) # and some scales (which are assumed to be known): fish <- rep(1,6) fish[6] <- 4 # And determine A and Ainv: A <- corr.matrix(val,scales=fish) Ainv <- solve(A) # and add some suitably correlated Gaussian noise: d.noisy <- as.vector(rmvnorm(n=1, mean=d, 0.1*A)) names(d.noisy) <- names(d) # Now some simulation design points. Choose n'=6: xdash <- matrix(runif(36),ncol=6) # And just for fun, we insert a near-useless seventh simulation # design point (it is nearly useless because it is a near copy of # the sixth). We do this in order to test the coding: xdash <- rbind(xdash,xdash[6,] + 1e-4) colnames(xdash) <- colnames(val) rownames(xdash) <- c("alpha","beta","gamma","delta","epsilon","zeta","zeta.copy") # Print the variance matrix: (vm <- var.conditional(x=xdash,xold=val,d=d.noisy,A=A,Ainv=Ainv,scales=fish)) # Note that the sixth and seventh columns are almost identical # (and so, therefore, are the sixth and seventh rows) as # expected. # Also, the final eigenvalue of vm should be small: eigen(vm)\$values # Now sample from the conditional t-distribution. Taking n=3 samples: (cs <- cond.sample(n=3, x=xdash, xold=val, d=d.noisy, A=A, Ainv=Ainv, scales = fish, func = regressor.basis)) # Note the last two columns are nearly identical, as expected. # Just as a test, what is the variance matrix at the design points? (vc <- var.conditional(x=val,xold=val,d=d.noisy,A=A,Ainv=Ainv,scales=fish)) # (This should be exactly zero); max(eigen(vc)\$values) # should be small # Next, we apply the methods of OO2002 using Monte Carlo techniques. # We will generate 10 different versions of eta: number.of.eta <- 10 # And, for each eta, we will sample from the posterior t distribution 11 times: number.of.X <- 11 # create an augmented design matrix, of the design points plus the # simulated design points: design.augmented <- rbind(val,xdash) A.augmented <- corr.matrix(design.augmented, scales=fish) Ainv.augmented <- solve(A.augmented) out <- NULL for(i in seq_len(number.of.eta)){ # Create random data by sampling from the conditional # multivariate t at the simulated design points xdash, from # the t-distribution given the data d: ddash <- cond.sample(n=1, x=xdash, xold=val, d=d.noisy, Ainv=Ainv, scales=fish) # Now use the emulator to calculate m^* at points chosen from # the PDF of X: jj <- interpolant.quick(x=rmvnorm(n=number.of.X,rep(0.5,6),diag(6)/1000), d=c(d.noisy,ddash), xold=design.augmented, Ainv=Ainv.augmented, scales=fish) out <- c(out,jj) } # histogram of the fourth power: hist(out^4, col="gray") # See oo2002 for another example of cond.sample() in use ```

emulator documentation built on May 17, 2018, 9:03 a.m.