Predict method for Treed Gaussian process models

Description

This generic prediction method was designed to obtain samples from the posterior predictive distribution after the b* functions have finished. Samples, or kriging mean and variance estimates, can be obtained from the MAP model encoded in the "tgp"-class object; alternatively, this parameterization can be used as a jumping-off point for obtaining further samples from the joint posterior and posterior predictive distributions.

Usage

## S3 method for class 'tgp'
predict(object, XX = NULL, BTE = c(0, 1, 1), R = 1,
            MAP = TRUE, pred.n = TRUE, krige = TRUE, zcov = FALSE,
            Ds2x = FALSE, improv = FALSE, sens.p = NULL, trace = FALSE,
            verb = 0, ...)

Arguments

object

"tgp"-class object that is the output of one of the b* functions: blm, btlm, bgp, bgpllm, btgp, or btgpllm.

XX

Optional data.frame, matrix, or vector of predictive input locations with ncol(XX) == ncol(object$X).

BTE

3-vector of Monte Carlo parameters: (B)urn in, (T)otal, and (E)very. Predictive samples are saved every E MCMC rounds, starting at round B and stopping at T. The default BTE=c(0,1,1) is specified to give the kriging means and variances as outputs, plus one sample from the posterior predictive distribution.
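
The following sketch enumerates the saved rounds for a given BTE under one reading of the description above. It assumes rounds are numbered from 0 and that round T itself is excluded, which is the reading that makes the default BTE=c(0,1,1) yield the single posterior predictive sample mentioned. The helper saved_rounds is hypothetical and not part of tgp.

```r
## Hypothetical helper (not part of tgp): the MCMC rounds at which predictive
## samples would be saved for BTE = c(B, T, E), assuming 0-based rounds and
## that round T is excluded
saved_rounds <- function(BTE) {
  burn  <- BTE[1]
  total <- BTE[2]
  every <- BTE[3]
  stopifnot(burn >= 0, total > burn, every >= 1)
  seq(from = burn, to = total - 1, by = every)
}

length(saved_rounds(c(0, 1, 1)))      # 1: the default gives one sample
length(saved_rounds(c(0, 1000, 1)))   # 1000
length(saved_rounds(c(0, 2000, 2)))   # 1000, thinned by E = 2
```

Under this assumption, BTE=c(0,1000,1) and BTE=c(0,2000,2) from the examples save the same number of samples, and the second run spends twice as many rounds to reduce autocorrelation between them.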

R

Number of repeats or restarts of BTE MCMC rounds; the default R=1 means no restarts.

MAP

When TRUE (default), predictive data (i.e., kriging mean and variance estimates, and samples from the posterior predictive distribution) are obtained for the fixed MAP model encoded in object. Otherwise, when MAP=FALSE, samples from the joint posterior of the model parameters (i.e., tree and GPs) and from the posterior predictive distribution are obtained by starting from the MAP model and proceeding just as the b* functions do.

pred.n

TRUE (default) results in prediction at the inputs X; FALSE skips prediction at X, resulting in a faster implementation.

krige

TRUE (default) results in the collection of kriging means and variances at the predictive (and/or data) locations; FALSE skips gathering kriging statistics, which saves storage.

zcov

If TRUE, the full predictive covariance matrix is calculated, which can be computationally (and memory) intensive if X or XX is large. Otherwise (default), only the variances (the diagonals of the covariance matrices) are calculated. See the outputs Zp.s2, ZZ.s2, etc., documented for btgp.
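
To give a sense of scale, a dense n-by-n matrix of 8-byte doubles occupies 8 n^2 bytes, so memory grows quadratically in the number of locations. This sketch assumes that dense storage layout (it does not inspect tgp's internals) and takes n to be the number of locations, e.g. nrow(XX). The helper cov_megabytes is hypothetical.

```r
## Hypothetical helper: approximate memory (in MB) of one dense n x n
## covariance matrix of 8-byte doubles, as zcov = TRUE would require
cov_megabytes <- function(n) 8 * n^2 / 1024^2

cov_megabytes(200)     # about 0.31 MB for the 200 XX locations in the examples
cov_megabytes(10000)   # about 763 MB
```

Only the diagonal (8 n bytes) is needed when zcov=FALSE, which is why the default keeps variances only.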

Ds2x

TRUE results in ALC (Active Learning–Cohn) computation of the expected reduction in uncertainty at the X locations, which can be used for adaptive sampling; FALSE (default) skips this computation, resulting in a faster implementation.
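
The ALC statistic scores a candidate location x by how much one new observation there would reduce predictive variance, averaged over a set of reference locations. The base-R sketch below computes that quantity for a plain, stationary, zero-mean GP with unit variance, a squared-exponential correlation with a fixed lengthscale d, and a nugget g. These modeling choices and the function names sq_exp and alc_sketch are assumptions for illustration; this is not tgp's treed, MCMC-averaged computation.

```r
## Squared-exponential correlation between the rows of A and the rows of B,
## exp(-||a - b||^2 / d) with an assumed fixed lengthscale d
sq_exp <- function(A, B, d) {
  D2 <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * A %*% t(B)
  exp(-pmax(D2, 0) / d)
}

## ALC-style score for each candidate row of Xcand: the reduction in
## predictive variance from one new observation there, averaged over the
## reference rows of Yref (zero-mean GP, unit variance, nugget g)
alc_sketch <- function(X, Xcand, Yref, d = 0.1, g = 1e-6) {
  X <- as.matrix(X); Xcand <- as.matrix(Xcand); Yref <- as.matrix(Yref)
  Ki  <- solve(sq_exp(X, X, d) + diag(g, nrow(X)))
  kXc <- sq_exp(X, Xcand, d)                          # n x m
  kXr <- sq_exp(X, Yref, d)                           # n x r
  ## posterior covariance between candidates and reference points (m x r)
  cov_cr <- sq_exp(Xcand, Yref, d) - t(kXc) %*% Ki %*% kXr
  ## posterior variance at each candidate (length m)
  var_c <- 1 - colSums(kXc * (Ki %*% kXc))
  ## observing x (noise g) cuts the variance at y by cov(x, y)^2 / (var(x) + g)
  rowMeans(cov_cr^2) / (var_c + g)
}

X     <- matrix(c(0.1, 0.4, 0.9), ncol = 1)   # toy design
Xcand <- matrix(c(0.4, 0.65), ncol = 1)       # an existing point, and a gap
Yref  <- matrix(seq(0, 1, length.out = 21), ncol = 1)
a <- alc_sketch(X, Xcand, Yref)
a   # the gap at 0.65 scores far higher than the already-sampled 0.4
```

Candidates with the largest score are the natural next points to sample in an adaptive design.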

improv

TRUE results in samples from the improvement at locations XX with respect to the observed data minimum. These samples are used to calculate the expected improvement over XX, as well as to rank all of the points in XX in the order in which they should be sampled to minimize the expected multivariate improvement (refer to Schonlau et al., 1998). Alternatively, improv can be set to any positive integer 'g', in which case the ranking is performed with respect to the expectation of the improvement raised to the power 'g'. Increasing 'g' leads to rankings that are more oriented toward global optimization. FALSE (default) skips these computations, resulting in a faster implementation. Optionally, a 2-vector can be supplied, in which case improv[2] is interpreted as the (maximum) number of points to rank by improvement; see the note in the btgp documentation. If improv[2] is not specified, the larger of 10% of nn = nrow(XX) and min(10, nn) is used by default.
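
The improvement statistic can be estimated directly from a matrix of posterior predictive samples. This sketch assumes the usual definition I(x) = max(f_min - Y(x), 0), raised to the power g, with f_min the minimum observed response, and averages over samples. It then orders locations by that estimate, which is a simple single-point ordering and not the multivariate ranking procedure tgp performs. The helper names and the rounding up in n_to_rank are assumptions.

```r
## Monte Carlo estimate of E[I(x)^g], with I(x) = max(fmin - Y(x), 0).
## ZZ: posterior predictive samples (rows = samples, columns = locations);
## Z: observed responses
expected_improv <- function(ZZ, Z, g = 1) {
  fmin <- min(Z)
  colMeans(pmax(fmin - ZZ, 0)^g)
}

## Default number of points to rank: the larger of 10% of nn and
## min(10, nn); rounding up is an assumption
n_to_rank <- function(nn) max(ceiling(0.1 * nn), min(10, nn))

set.seed(1)
Z  <- c(3, 1.5, 2)                       # toy observed responses, fmin = 1.5
ZZ <- cbind(rnorm(1000, 2, 0.5),         # toy samples at three locations
            rnorm(1000, 1, 0.5),
            rnorm(1000, 1.5, 1))
ei <- expected_improv(ZZ, Z, g = 1)
order(ei, decreasing = TRUE)             # expected order: 2, 3, 1
n_to_rank(200)                           # 20
```

With g = 1 the location whose predictive mean lies below f_min ranks first; larger g puts more weight on the high-variance third location, in line with the more global orientation described above.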

sens.p

Either NULL or a vector of parameters for sensitivity analysis, built by the function sens; refer there for details.

trace

TRUE results in saving samples from the posterior distribution for most of the parameters in the model. The default is FALSE for speed/storage reasons. See the Note section below.

verb

Level of verbosity of R-console print statements: from 0 (default: none); 1, which shows the “progress meter”; 2, which also echoes the initialization parameters; up to 3 and 4 (max), which give more information about successful tree operations.

...

Ellipses are not used in the current version of predict.tgp. They are only included in order to maintain S3 generic/method consistency.

Details

While this function was designed with prediction in mind, it is actually far more general. It allows MCMC sampling to continue where the b* function left off (when MAP=FALSE), with a possibly new set of predictive locations XX. The intended use of this function is to obtain quick kriging-style predictions for a previously fit MAP estimate (contained in a "tgp"-class object) on a new set of predictive locations XX. However, it can also be used simply to extend the search for a MAP model when MAP=FALSE, pred.n=FALSE, and XX=NULL.

Value

The output is the same as, or a subset of, the output produced by the b* functions; for example, see btgp.

Note

It is important to note that this function is not a replacement for supplying XX to the b* functions, which is the only way to get fully Bayesian samples from the posterior predictive distribution at new inputs. It is only intended as a post-analysis (diagnostic) tool.

Inputs XX containing NaN, NA, or Inf are discarded with non-fatal warnings. Upon execution, MCMC reports are made every 1,000 rounds to indicate progress.
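
To avoid those warnings, offending rows can be dropped before calling predict. The base-R sketch below mirrors the discarding behavior described above but is not tgp's own code; the helper clean_XX is hypothetical.

```r
## Hypothetical helper: drop rows of XX containing NaN, NA, or Inf before
## calling predict, mirroring the discarding described above
clean_XX <- function(XX) {
  XX <- as.matrix(XX)                     # a plain vector becomes one column
  keep <- rowSums(!is.finite(XX)) == 0    # is.finite is FALSE for NaN, NA, Inf
  if (any(!keep)) message(sum(!keep), " row(s) of XX dropped")
  XX[keep, , drop = FALSE]
}

XX <- cbind(c(0.1, NA, 0.5, Inf), c(1, 2, NaN, 4))
clean_XX(XX)   # keeps only the first row
```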

If XX locations fall outside the range of the X inputs provided to the original b* function, they will not be extrapolated properly, because of the way bounding rectangles are defined in the original run. As a workaround, set out$Xsplit <- rbind(X, XX) before running predict on out.
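
In the one-dimensional case used in the examples, X and XX are plain vectors, and rbind() on two vectors stacks them as two rows (recycling the shorter one) rather than as one column. This base-R sketch assumes Xsplit should be a matrix with ncol(X) columns and builds it accordingly. The toy X and XX are hypothetical stand-ins, and the final assignment is left commented out because it needs a fitted "tgp"-class object out.

```r
## Hypothetical stand-ins: 50 original 1-d inputs and two new locations,
## both outside range(X)
X  <- seq(2.4, 57.6, length.out = 50)
XX <- c(0, 60)

## rbind(X, XX) on plain vectors would give a 2 x 50 matrix; coerce both
## to one-column matrices so the rows are stacked instead
Xsplit <- rbind(as.matrix(X), as.matrix(XX))

range(X)        # bounding interval of the original inputs: 2.4 57.6
range(Xsplit)   # enlarged to cover XX: 0 60

## With a fitted "tgp"-class object 'out', the workaround would then be:
## out$Xsplit <- Xsplit
## out.p <- predict(out, XX = XX)
```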

See note for btgp or another b* function regarding the handling and appropriate specification of traces.

The "tgp"-class output produced by predict.tgp can also be used as input to predict.tgp, as well as to other methods (e.g., plot.tgp).

Author(s)

Robert B. Gramacy, rbgramacy@chicagobooth.edu, and Matt Taddy, taddy@chicagobooth.edu

References

http://bobby.gramacy.com/r_packages/tgp

See Also

predict, blm, btlm, bgp, btgp, bgpllm, btgpllm, plot.tgp

Examples

## revisit the Motorcycle data
library(tgp)
library(MASS)   # provides the mcycle data

## fit a btgpllm without predictive sampling (for speed)
out <- btgpllm(X=mcycle[,1], Z=mcycle[,2], bprior="b0",
               pred.n=FALSE)
## nothing to plot here because there is no predictive data

## save the "tgp"-class output object for later use, and
save(out, file="out.Rsave")

## then remove it (for illustrative purposes)
out <- NULL

## (now imagine emailing the out.Rsave file to a friend who
## then performs the following in order to use your fitted
## tgp model on his/her own predictive locations)

## load in the "tgp" class object we just saved
load("out.Rsave")

## new predictive locations
XX <- seq(2.4, 56.7, length=200)

## now obtain kriging estimates from the MAP model
out.kp <- predict(out, XX=XX, pred.n=FALSE)
plot(out.kp, center="km", as="ks2")

## actually obtain predictive samples from the MAP
out.p <- predict(out, XX=XX, pred.n=FALSE, BTE=c(0,1000,1))
plot(out.p)

## use the MAP as a jumping-off point for more sampling
out2 <- predict(out, XX, pred.n=FALSE, BTE=c(0,2000,2),
                MAP=FALSE, verb=1)
plot(out2)

## (generally you would not want to remove the file)
unlink("out.Rsave")