# estimator: Estimates each known datapoint using the others as datapoints

*In package `emulator`: Bayesian emulation of computer programs*

## Description

Uses Bayesian techniques to estimate a model's prediction at each of `n` datapoints. To estimate the i-th point, the conditioning variables are points 1, ..., i-1 and i+1, ..., n (i.e., all points except point i).

This routine is useful when finding optimal coefficients for the correlation function using bootstrap methods.

## Usage

```r
estimator(val, A, d, scales=NULL, pos.def.matrix=NULL, func=regressor.basis)
```

## Arguments

- `val`: Design matrix with rows corresponding to points at which the function is known
- `A`: Correlation matrix (note that this is the correlation matrix itself, not its inverse)
- `d`: Vector of observations
- `scales`: Scales to be used to calculate `t(x)`. Note that `scales` has no default value because `estimator()` is most often used in the context of assessing the appropriateness of a given value of `scales`. If the desired distance matrix (called B in Oakley) is not diagonal, pass that matrix to `estimator()` via the `pos.def.matrix` argument instead
- `pos.def.matrix`: Positive definite matrix B
- `func`: Function used to determine basis vectors, defaulting to `regressor.basis` if not given
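The relationship between `scales` and `pos.def.matrix` can be sketched in base R, without the `emulator` package itself: the correlation between two points x and x' is taken to be exp(-(x - x')' B (x - x')), and supplying `scales` corresponds to the special case B = diag(scales). The helper `corr_fun` below is hypothetical, for illustration only.

```r
# Base-R sketch (not the emulator package's own corr.matrix()) of the
# assumed correlation form c(x, xdash) = exp(-(x - xdash)^T B (x - xdash)).
corr_fun <- function(x, xdash, B) {
  v <- x - xdash
  exp(-drop(t(v) %*% B %*% v))
}

x      <- c(0.1, 0.4, 0.9)
xdash  <- c(0.2, 0.3, 0.5)
scales <- c(1, 1, 1)

# a diagonal B built from `scales` agrees with the scales-only formula:
c1 <- corr_fun(x, xdash, diag(scales))
c2 <- exp(-sum(scales * (x - xdash)^2))
stopifnot(isTRUE(all.equal(c1, c2)))
```

A non-diagonal positive definite B (the `pos.def.matrix` argument) allows correlated length scales across input dimensions, which the scalar `scales` vector cannot express.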

## Details

Given a matrix of observation points and a vector of observations, `estimator()` returns a vector of predictions. Each prediction is made in a three-step process. For each index i:

• Observation `d[i]` is discarded, and row `i` and column `i` deleted from `A` (giving `A[-i,-i]`). Thus `d` and `A` are the observation vector and correlation matrix that would have been obtained had observation `i` not been available.

• The value of `d[i]` is estimated on the basis of the shortened observation vector `d[-i]` and the reduced matrix `A[-i,-i]`.

It is then possible to make a scatterplot of `d` vs `dhat` where `dhat=estimator(val,A,d)`. If the scales used are “good”, then the points of this scatterplot will be close to `abline(0,1)`. The third step is to optimize the goodness of fit of this scatterplot.
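The leave-one-out idea behind the first two steps can be sketched in base R. The sketch below is the simple-kriging special case: it assumes a known zero mean and ignores the regressor basis handled by the `func` argument, so it is an illustration of the technique, not a reimplementation of `estimator()`. The function name `loo_estimate` is hypothetical.

```r
# Hedged base-R sketch of the leave-one-out estimates: for each i, delete
# row/column i of A and entry i of d, then take the conditional (kriging)
# mean of d[i] given the remaining observations. Assumes zero mean and no
# regressor basis, unlike estimator() itself.
loo_estimate <- function(A, d) {
  n <- length(d)
  dhat <- numeric(n)
  for (i in seq_len(n)) {
    # conditional mean: A[i,-i] %*% A[-i,-i]^{-1} %*% d[-i]
    dhat[i] <- drop(A[i, -i] %*% solve(A[-i, -i], d[-i]))
  }
  dhat
}

# toy data: 1-d design points with a Gaussian correlation matrix
x <- seq(0, 1, length.out = 8)
A <- exp(-outer(x, x, function(a, b) (a - b)^2) / 0.1)
d <- sin(2 * pi * x)

dhat <- loo_estimate(A, d)
# a scatterplot of d against dhat should hug abline(0,1) when the
# correlation structure is appropriate:
# plot(d, dhat); abline(0, 1)
```

Note the use of `solve(A[-i,-i], d[-i])` rather than explicitly inverting the reduced matrix; this is both faster and numerically safer.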

## Value

A vector of predictions of the same length as `d`.

## Author(s)

Robin K. S. Hankin

## References

• J. Oakley and A. O'Hagan, 2002. "Bayesian inference for the uncertainty distribution of computer model outputs", *Biometrika* 89(4), pp. 769-784

• R. K. S. Hankin, 2005. "Introducing BACCO, an R bundle for Bayesian analysis of computer code output", *Journal of Statistical Software* 14(16)

## See Also

`optimal.scales`
## Examples

```r
# example has 40 observations on 6 dimensions.
# function is just sum((1:6)*x) where x=c(x_1, ..., x_6)
val <- latin.hypercube(40,6)
colnames(val) <- letters[1:6]
d <- apply(val,1,function(x){sum((1:6)*x)})

# pick some scales:
fish <- rep(1,ncol(val))
A <- corr.matrix(val,scales=fish)

# add some suitably correlated noise:
d <- as.vector(rmvnorm(n=1, mean=d, 0.1*A))

# estimate d using the leave-one-out technique in estimator():
d.est <- estimator(val, A, d, scales=fish)

# and plot the result:
lims <- range(c(d,d.est))
par(pty="s")
plot(d, d.est, xaxs="r", yaxs="r", xlim=lims, ylim=lims)
abline(0,1)
```