Implementation of the ideas of Oakley and O'Hagan 2002

Description

Implementation of the ideas of Oakley and O'Hagan 2002: var.conditional() calculates the conditional variance-covariance matrix, and cond.sample() samples from the appropriate multivariate t distribution.

Usage

1
2
3
4
cond.sample(n = 1, x, xold, d, A, Ainv, scales = NULL, pos.def.matrix =
NULL, func = regressor.basis, ...)
var.conditional(x, xold, d, A, Ainv, scales = NULL, pos.def.matrix = NULL, 
    func = regressor.basis, distance.function = corr, ...)

Arguments

n

In function cond.sample(), the number of observations to take, defaulting to 1

x

Simulation design points

xold

Design points

d

Data vector

A

Correlation matrix

Ainv

Inverse of correlation matrix A

scales

Roughness lengths

pos.def.matrix

Positive definite matrix of correlations

func

Function to calculate H

distance.function

Distance function (defaulting to corr())

...

Further arguments passed to the distance function, usually corr()

Details

We wish to generate the distribution for the process at uncertain point x; uncertainty in x is captured by assuming it to be drawn from a pdf X.

The basic idea is to estimate m* at simulated design points using cond.sample(), which samples from the multivariate t distribution conditional on the data d at the design points. The random datavector of estimates m* is called ddash.

We repeat this process many times, each time estimating eta(.) using the augmented dataset c(d,ddash) as a training set.

For each estimated eta(.), we have a complete emulator that can be used to build up an ensemble of estimates.

Value

Function cond.sample() returns a n*p matrix whose rows are independent samples from the appropriate multivariate t distribution. Here, p is the number of rows of x (ie the number of simulated design points). Consider a case where there are just two simulated design points, close to each other but far from any point of the original design points. Then function cond.sample(n=4, ...) will give four numbers which are close to one another but have high (between-instantiation) variance.

Function var.conditional() calculates the denominator of equation 3 of Oakley and OHagan 2002. This function is intended to be called by cond.sample() but might be interesting per se when debugging or comparing different choices of simulated design points.

Note

Function cond.sample() and var.conditional() together are a superset of function interpolant() because it accounts for covariance between multiple observations. It is, however, much slower.

Also note that these functions are used to good effect in the examples section of oo2002.Rd.

Author(s)

Robin K. S. Hankin

References

  • J. Oakley 2002. Bayesian inference for the uncertainty distribution of computer model outputs. Biometrika, 89(4):769–784

  • R. K. S. Hankin 2005. Introducing BACCO, an R bundle for Bayesian analysis of computer code output, Journal of Statistical Software, 14(16)

See Also

regressor.basis, for a more visually informative example of cond.sample() et seq; and interpolant for more examples

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
     # Now we use the functions.  First we set up some design points:
     # Suppose we are given the toy dataset and want to know the PDF of
     # fourth power of the response at point x, where uncertainty in x
     # may be represented as it being drawn from a normnl distribution
     # with mean c(0.5,0.5,...,0.5) and a variance of 0.001.

data(toy)
val <- toy
real.relation <- function(x){sum( (0:6)*x )}
H <- regressor.multi(val)
d <- apply(H,1,real.relation)

     # and some scales (which are assumed to be known):
fish <- rep(1,6)
fish[6] <- 4

     # And determine A and Ainv:
A <- corr.matrix(val,scales=fish)
Ainv <- solve(A)

     # and add some suitably correlated Gaussian noise:
d.noisy <-  as.vector(rmvnorm(n=1, mean=d, 0.1*A))
names(d.noisy) <- names(d)

     # Now some simulation design points.  Choose n'=6:
xdash <-  matrix(runif(36),ncol=6)

     # And just for fun, we insert a near-useless seventh simulation
     # design point (it is nearly useless because it is a near copy of
     # the sixth).  We do this in order to test the coding:
xdash <- rbind(xdash,xdash[6,] + 1e-4)
colnames(xdash) <- colnames(val)
rownames(xdash) <- c("alpha","beta","gamma","delta","epsilon","zeta","zeta.copy")

     # Print the variance matrix:
(vm <- var.conditional(x=xdash,xold=val,d=d.noisy,A=A,Ainv=Ainv,scales=fish))
     # Note that the sixth and seventh columns are almost identical
     # (and so, therefore, are the sixth and seventh rows) as
     # expected.

     # Also, the final eigenvalue of vm should be small:
eigen(vm)$values


     # Now sample from the conditional t-distribution.  Taking n=3 samples:
(cs <- cond.sample(n=3, x=xdash, xold=val, d=d.noisy, A=A, Ainv=Ainv,
                   scales = fish, func = regressor.basis))

     # Note the last two columns are nearly identical, as expected.

     # Just as a test, what is the variance matrix at the design points?
(vc <- var.conditional(x=val,xold=val,d=d.noisy,A=A,Ainv=Ainv,scales=fish))
     # (This should be exactly zero);
max(eigen(vc)$values)
     # should be small

     # Next, we apply the methods of OO2002 using Monte Carlo techniques.
     # We will generate  10 different versions of eta:
number.of.eta <- 10

     # And, for each eta,  we will sample from the posterior t distribution 11 times:
number.of.X <- 11


     # create an augmented design matrix, of the design points plus the
     # simulated design points:
design.augmented <- rbind(val,xdash)
A.augmented <- corr.matrix(design.augmented, scales=fish)
Ainv.augmented <- solve(A.augmented)

out <- NULL
for(i in seq_len(number.of.eta)){
        # Create random data by sampling from the conditional
        # multivariate t at the simulated design points xdash, from
        # the t-distribution given the data d:
   ddash <- cond.sample(n=1, x=xdash, xold=val, d=d.noisy, Ainv=Ainv, scales=fish)

        # Now use the emulator to calculate m^* at points chosen from
        # the PDF of X:
   jj <-
     interpolant.quick(x=rmvnorm(n=number.of.X,rep(0.5,6),diag(6)/1000),
                       d=c(d.noisy,ddash),
                       xold=design.augmented,
                       Ainv=Ainv.augmented,
                       scales=fish)
   out <- c(out,jj)
}

   # histogram of the fourth power:
hist(out^4, col="gray")
   # See oo2002 for another example of cond.sample() in use