estimator: Estimates each known datapoint using the others as datapoints
In emulator: Bayesian Emulation of Computer Programs

estimator

R Documentation

Description

Uses Bayesian techniques to estimate a model's prediction at each of n datapoints. To estimate the i-th point, the conditioning variables are points 1, ..., i-1 and i+1, ..., n inclusive (i.e., all points except point i).

This routine is useful when finding optimal coefficients for the correlation using boot methods.

Usage

estimator(val, A, d, scales=NULL, pos.def.matrix=NULL,
func=regressor.basis)

Arguments

`val`	Design matrix with rows corresponding to points at which the function is known
`A`	Correlation matrix (note that this is not the inverse of the correlation matrix)
`d`	Vector of observations
`scales`	Scales to be used to calculate `t(x)`. Note that `scales` has no useful default value (the usage shows `NULL`) because `estimator()` is most often used to assess the appropriateness of a given value of `scales`. If the desired distance matrix (called `B` in Oakley and O'Hagan 2002) is not diagonal, pass this matrix to `estimator()` via the `pos.def.matrix` argument.
`pos.def.matrix`	Positive definite matrix `B`
`func`	Function used to determine basis vectors, defaulting to `regressor.basis` if not given.
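
The relationship between `scales` and `pos.def.matrix` can be illustrated with a small base-R sketch. It assumes the Gaussian correlation form exp(-(x-x')^T B (x-x')) from Oakley and O'Hagan; the helper `corr_quadform()` is a hypothetical name introduced here for illustration and is not part of the package. Under that assumption, a vector of scales should behave like the diagonal matrix `diag(scales)`.

```r
# Hypothetical helper (not from the emulator package):
# Gaussian correlation exp(-(x1-x2)' B (x1-x2)) for a positive definite B.
corr_quadform <- function(x1, x2, B) {
  dx <- x1 - x2
  exp(-drop(t(dx) %*% B %*% dx))
}

x1 <- c(0.1, 0.4, 0.7)
x2 <- c(0.3, 0.2, 0.9)
scales <- c(1, 2, 3)

# A vector of scales corresponds to a diagonal B
via_scales <- exp(-sum(scales * (x1 - x2)^2))
via_matrix <- corr_quadform(x1, x2, B = diag(scales))
all.equal(via_scales, via_matrix)

# A non-diagonal (symmetric, diagonally dominant, hence positive definite) B
# cannot be expressed through a vector of scales
B <- matrix(c(2.0, 0.5, 0.0,
              0.5, 1.0, 0.2,
              0.0, 0.2, 3.0), nrow = 3, byrow = TRUE)
corr_full <- corr_quadform(x1, x2, B)
corr_full
```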

Details

Given a matrix of observation points and a vector of observations, estimator() returns a vector of predictions. Each prediction is made in a three-step process. For each index i:

1. Observation d[i] is discarded, and row i and column i are deleted from A (giving A[-i,-i]). Thus d[-i] and A[-i,-i] are the observation vector and correlation matrix that would have been obtained had observation i not been available.
2. The value of d[i] is estimated on the basis of the shortened observation vector d[-i] and the reduced correlation matrix A[-i,-i].

It is then possible to make a scatterplot of d vs dhat where dhat=estimator(val,A,d). If the scales used are “good”, then the points of this scatterplot will be close to abline(0,1). The third step is to optimize the goodness of fit of this scatterplot.
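
The leave-one-out steps described above can be sketched in base R. This is a minimal illustration, not the package's implementation: it assumes a Gaussian correlation function with diagonal `scales`, a basis of a constant plus linear terms, and the usual posterior-mean formula h(x)'beta + t(x)' A^{-1} (d - H beta). The names `corr_gauss()` and `loo_estimate()` are hypothetical.

```r
# Hypothetical helpers, base R only; not the emulator package's code.

# Assumed Gaussian correlation between two points
corr_gauss <- function(x1, x2, scales) exp(-sum(scales * (x1 - x2)^2))

# Assumed correlation matrix for all rows of val
corr_matrix_sketch <- function(val, scales) {
  n <- nrow(val)
  A <- matrix(1, n, n)
  for (i in seq_len(n)) for (j in seq_len(n))
    A[i, j] <- corr_gauss(val[i, ], val[j, ], scales)
  A
}

loo_estimate <- function(val, A, d, scales) {
  n <- length(d)
  d_hat <- numeric(n)
  for (i in seq_len(n)) {
    # Step 1: drop observation i, and row and column i of A
    val_rest <- val[-i, , drop = FALSE]
    d_rest   <- d[-i]
    Ainv     <- solve(A[-i, -i])

    # Step 2: estimate d[i] from the remaining points
    H    <- cbind(1, val_rest)                      # assumed basis: c(1, x)
    beta <- solve(t(H) %*% Ainv %*% H, t(H) %*% Ainv %*% d_rest)
    t_x  <- apply(val_rest, 1, corr_gauss, x2 = val[i, ], scales = scales)
    h_x  <- c(1, val[i, ])
    d_hat[i] <- sum(h_x * beta) + sum(t_x * (Ainv %*% (d_rest - H %*% beta)))
  }
  d_hat
}

# Toy usage: 8 points in 2 dimensions, exactly linear response
val <- cbind(a = c(0, 0.2, 0.4, 0.6, 0.8, 1, 0.1, 0.9),
             b = c(0.5, 0, 1, 0.3, 0.7, 0.1, 0.9, 0.6))
d <- 1 + 2 * val[, "a"] + 3 * val[, "b"]
scales <- c(5, 5)
A <- corr_matrix_sketch(val, scales)
d_hat <- loo_estimate(val, A, d, scales)
max(abs(d - d_hat))
```

Because the toy response is exactly linear and the assumed basis contains the linear terms, the estimates should reproduce d up to rounding error whatever the scales; with a nonlinear or noisy response, the discrepancy between d and d_hat depends on the scales, which is what makes the scatterplot a useful diagnostic.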

Value

A vector of estimates of the same length as d; element i is the estimate of d[i] obtained with observation i left out.

Author(s)

Robin K. S. Hankin

References

J. Oakley and A. O'Hagan, 2002. Bayesian Inference for the Uncertainty Distribution of Computer Model Outputs, Biometrika 89(4), pp. 769-784
R. K. S. Hankin, 2005. Introducing BACCO, an R bundle for Bayesian analysis of computer code output, Journal of Statistical Software 14(16)

Examples

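# load the package and mvtnorm, which supplies rmvnorm() used below
library(emulator)
library(mvtnorm)

# example has 11 observations on 6 dimensions.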
# function is just sum( (1:6)*x) where x=c(x_1, ... , x_6)

val <- latin.hypercube(11,6)
colnames(val) <- letters[1:6]
d <- apply(val,1,function(x){sum((1:6)*x)})

#pick some scales:
fish <- rep(1,ncol(val))
A <- corr.matrix(val,scales=fish)

#add some suitably correlated noise:
d <- as.vector(rmvnorm(n=1, mean=d, sigma=0.1*A))

# estimate d using the leave-one-out technique in estimator():
d.est <- estimator(val, A, d, scales=fish)

#and plot the result:
lims <- range(c(d,d.est))
par(pty="s")
plot(d, d.est, xaxs="r", yaxs="r", xlim=lims, ylim=lims)
abline(0,1)
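
The third step mentioned in Details (optimizing the goodness of fit of the scatterplot) might be sketched as follows. This is an illustration under stated assumptions rather than a recommended procedure: the objective `loo_sse()` is a hypothetical name, the sum of squared leave-one-out errors is only one possible measure of fit, the toy response with a `sin()` term is made up for the example, and the scales are optimized on the log scale to keep them positive. Only `latin.hypercube()`, `corr.matrix()` and `estimator()` are taken from the package, called as in the example above; `optim()` is from base R.

```r
library(emulator)   # provides latin.hypercube(), corr.matrix(), estimator()

set.seed(1)
val <- latin.hypercube(11, 6)
colnames(val) <- letters[1:6]

# Toy response (assumption): linear trend plus a mild nonlinearity
d <- apply(val, 1, function(x) sum((1:6) * x) + sin(2 * pi * x[1]))

# Hypothetical objective: sum of squared leave-one-out errors
loo_sse <- function(log_scales) {
  s <- exp(log_scales)                 # scales stay positive
  out <- tryCatch({
    A <- corr.matrix(val, scales = s)
    sum((d - estimator(val, A, d, scales = s))^2)
  }, error = function(e) 1e10)         # penalize numerically singular cases
  if (is.finite(out)) out else 1e10
}

start <- rep(0, ncol(val))             # i.e. all scales equal to 1
fit <- optim(start, loo_sse, method = "Nelder-Mead",
             control = list(maxit = 200))

exp(fit$par)   # candidate scales
fit$value      # objective at the candidate scales
```

With only 11 points and 6 scales the objective is likely to be poorly determined, so the result is best read as a starting point for inspecting the scatterplot of d against the estimates, not as a definitive choice of scales.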

emulator documentation built on May 29, 2024, 6:18 a.m.