# exp2d.rand: Random 2-d Exponential Data In tgp: Bayesian Treed Gaussian Process Models

## Description

A random subsample of `data(exp2d)`, or Latin Hypercube sampled data evaluated with `exp2d.Z`.

## Usage

```
exp2d.rand(n1 = 50, n2 = 30, lh = NULL, dopt = 1)
```

## Arguments

| Argument | Description |
|---|---|
| `n1` | Number of samples from the first, interesting, quadrant |
| `n2` | Number of samples from the other three, uninteresting, quadrants |
| `lh` | If `!is.null(lh)` then Latin Hypercube (LH) sampling (`lhs`) is used instead of subsampling from `data(exp2d)`; `lh` should be a single nonnegative integer specifying the desired number of predictive locations, `XX`; or, it should be a vector of length 4, specifying the number of predictive locations desired from each of the four quadrants (interesting quadrant first, then counter-clockwise) |
| `dopt` | If `dopt >= 2` then D-optimal subsampling from LH candidates of the multiple indicated by the value of `dopt` will be used. This argument only makes sense when `!is.null(lh)` |

## Details

When `is.null(lh)`, data is subsampled without replacement from `data(exp2d)`. Of the `n1 + n2 <= 441` input/response pairs `X,Z`, `n1` are taken from the first quadrant, i.e., where the response is interesting, and the remaining `n2` are taken from the other three quadrants. The remaining `441 - (n1 + n2)` are treated as predictive locations.
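The subsampling scheme described above can be sketched in base R without the package. This is only an illustration, not the package's code: it assumes the 441 locations form a 21 x 21 regular grid on [-2, 6] x [-2, 6], and that the interesting quadrant is the one where both inputs are at most 2. The names `grid`, `interesting`, `i1`, `i2`, and `train` are hypothetical.

```r
set.seed(42)

# Assumption: a 21 x 21 regular grid on [-2, 6]^2, giving 441 locations
g <- seq(-2, 6, length.out = 21)
grid <- expand.grid(X1 = g, X2 = g)

# Assumption: the "interesting" quadrant is where both inputs are <= 2
interesting <- grid$X1 <= 2 & grid$X2 <= 2

n1 <- 50
n2 <- 30

# sample() draws without replacement by default
i1 <- sample(which(interesting), n1)   # from the interesting quadrant
i2 <- sample(which(!interesting), n2)  # from the other three quadrants
train <- c(i1, i2)

X  <- grid[train, ]   # n1 + n2 input locations
XX <- grid[-train, ]  # remaining 441 - (n1 + n2) predictive locations
```

With the defaults `n1 = 50` and `n2 = 30`, this leaves 361 rows in `XX`.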

Otherwise, when `!is.null(lh)`, Latin Hypercube Sampling (`lhs`) is used.
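For readers unfamiliar with Latin Hypercube sampling, the idea can be sketched in a few lines of base R. The helper `lh_sample` below is hypothetical and is not the package's `lhs` function; it splits each dimension into `n` equal strata and places exactly one point in each stratum, with the strata paired at random across dimensions.

```r
# Hypothetical helper: n-point Latin hypercube sample in the rectangle
# [lower[1], upper[1]] x ... x [lower[d], upper[d]]
lh_sample <- function(n, lower, upper) {
  d <- length(lower)
  # For each dimension: permute the strata 1..n, then jitter within each stratum
  u <- sapply(seq_len(d), function(j) (sample(n) - runif(n)) / n)
  u <- matrix(u, nrow = n, ncol = d)
  # Rescale from the unit cube to the requested rectangle
  sweep(sweep(u, 2, upper - lower, `*`), 2, lower, `+`)
}

set.seed(1)
# 20 points in one quadrant; the bounds [-2, 2]^2 are an assumption
X <- lh_sample(20, lower = c(-2, -2), upper = c(2, 2))
```

Each of the 20 equal-width strata along either axis then contains exactly one of the sampled points.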

If `dopt >= 2` then `n1*dopt` LH candidates are used to get a D-optimal subsample of size `n1` from the first (interesting) quadrant. Similarly, `n2*dopt` candidates are used to get a D-optimal subsample of size `n2` from the rest of the (uninteresting) region. A total of `lh*dopt` candidates will be used for sequential D-optimal subsampling for predictive locations `XX` in all four quadrants, assuming the already-sampled `X` locations will be in the design.
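The idea of choosing `n1` points out of `n1*dopt` candidates can be illustrated with a greedy log-determinant search. This is a rough sketch under stated assumptions and not the routine the package uses: the helper `greedy_dopt`, the Gaussian kernel, its range `d`, and the `nugget` are all hypothetical choices, and the candidates here are plain uniform draws rather than LH samples, to keep the block self-contained.

```r
# Hypothetical helper: greedily pick n rows of `cand`, at each step adding the
# candidate that maximizes the log-determinant of the kernel matrix of the
# selected points (a simple stand-in for D-optimal design under a GP).
greedy_dopt <- function(cand, n, d = 1, nugget = 1e-6) {
  cand <- as.matrix(cand)
  D2 <- as.matrix(dist(cand))^2
  K <- exp(-D2 / d) + diag(nugget, nrow(cand))  # assumed Gaussian kernel
  sel <- integer(0)
  for (i in seq_len(n)) {
    rest <- setdiff(seq_len(nrow(cand)), sel)
    score <- vapply(rest, function(j) {
      idx <- c(sel, j)
      as.numeric(determinant(K[idx, idx, drop = FALSE],
                             logarithm = TRUE)$modulus)
    }, numeric(1))
    sel <- c(sel, rest[which.max(score)])
  }
  cand[sel, , drop = FALSE]
}

set.seed(1)
n1 <- 5
dopt <- 4
# n1 * dopt candidates in the (assumed) interesting quadrant [-2, 2]^2
cand <- matrix(runif(2 * n1 * dopt, min = -2, max = 2), ncol = 2)
X1 <- greedy_dopt(cand, n1)
```

A larger `dopt` gives the search more candidates to choose from, at the cost of more determinant evaluations.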

In all three cases, the response is evaluated as

$$Z(X) = X_1 \exp(-X_1^2 - X_2^2),$$

thus creating the outputs `Ztrue` and `ZZtrue`. Zero-mean normal noise with `sd=0.001` is then added to obtain the responses `Z` and `ZZ`.
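As a quick check of the formula, the response and its noisy version can be computed by hand in base R. The helper `exp2d_response` is a hypothetical name used only for this illustration; it is not the package's `exp2d.Z`.

```r
# Hypothetical helper mirroring the response formula above
exp2d_response <- function(X) {
  X <- as.matrix(X)
  X[, 1] * exp(-X[, 1]^2 - X[, 2]^2)
}

set.seed(1)
# Three hand-picked input locations: (0, 0), (1, 0), (-1, 0)
X <- cbind(c(0, 1, -1), c(0, 0, 0))

Ztrue <- exp2d_response(X)                               # noise-free response
Z <- Ztrue + rnorm(length(Ztrue), mean = 0, sd = 0.001)  # noisy response
```

At these three points the noise-free values are `0`, `exp(-1)`, and `-exp(-1)`, so the response is antisymmetric in the first input.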

## Value

Output is a `list` with entries:

| Entry | Description |
|---|---|
| `X` | 2-d `data.frame` with `n1 + n2` input locations |
| `Z` | Numeric vector describing the responses (with noise) at the `X` input locations |
| `Ztrue` | Numeric vector describing the true responses (without noise) at the `X` input locations |
| `XX` | 2-d `data.frame` containing the remaining `441 - (n1 + n2)` input locations |
| `ZZ` | Numeric vector describing the responses (with noise) at the `XX` predictive locations |
| `ZZtrue` | Numeric vector describing the responses (without noise) at the `XX` predictive locations |

## Author(s)

Robert B. Gramacy and Matt Taddy

## References

Gramacy, R. B. (2007). tgp: An R Package for Bayesian Nonstationary, Semiparametric Nonlinear Regression and Design by Treed Gaussian Process Models. Journal of Statistical Software, 19(9). http://www.jstatsoft.org/v19/i09

Gramacy, R. B., Lee, H. K. H. (2007). Bayesian treed Gaussian process models with an application to computer modeling. Journal of the American Statistical Association, to appear. Also available as ArXiv article 0710.4536. http://arxiv.org/abs/0710.4536

## See Also

`lhs`, `exp2d`, `exp2d.Z`, `btgp`, and other `b*` functions
## Examples

```
library(tgp)

## randomly subsampled data
## ------------------------

eds <- exp2d.rand()

# higher span = 0.5 required because the data is sparse
# and was generated randomly
eds.g <- interp.loess(eds$X[,1], eds$X[,2], eds$Z, span=0.5)

# perspective plot, and plot of the input (X & XX) locations
par(mfrow=c(1,2), bty="n")
persp(eds.g, main="loess surface", theta=-30, phi=20,
      xlab="X[,1]", ylab="X[,2]", zlab="Z")
plot(eds$X, main="Randomly Subsampled Inputs")
points(eds$XX, pch=19, cex=0.5)

## Latin Hypercube sampled data
## ----------------------------

edlh <- exp2d.rand(lh=c(20, 15, 10, 5))

# higher span = 0.5 required because the data is sparse
# and was generated randomly
edlh.g <- interp.loess(edlh$X[,1], edlh$X[,2], edlh$Z, span=0.5)

# perspective plot, and plot of the input (X & XX) locations
par(mfrow=c(1,2), bty="n")
persp(edlh.g, main="loess surface", theta=-30, phi=20,
      xlab="X[,1]", ylab="X[,2]", zlab="Z")
plot(edlh$X, main="Latin Hypercube Sampled Inputs")
points(edlh$XX, pch=19, cex=0.5)

# show the quadrants
abline(h=2, col=2, lty=2, lwd=2)
abline(v=2, col=2, lty=2, lwd=2)

## Not run:
## D-optimal subsample with a factor of 10 (more) candidates
## ---------------------------------------------------------

edlhd <- exp2d.rand(lh=c(20, 15, 10, 5), dopt=10)

# higher span = 0.5 required because the data is sparse
# and was generated randomly
edlhd.g <- interp.loess(edlhd$X[,1], edlhd$X[,2], edlhd$Z, span=0.5)

# perspective plot, and plot of the input (X & XX) locations
par(mfrow=c(1,2), bty="n")
persp(edlhd.g, main="loess surface", theta=-30, phi=20,
      xlab="X[,1]", ylab="X[,2]", zlab="Z")
plot(edlhd$X, main="D-optimally Sampled Inputs")
points(edlhd$XX, pch=19, cex=0.5)

# show the quadrants
abline(h=2, col=2, lty=2, lwd=2)
abline(v=2, col=2, lty=2, lwd=2)

## End(Not run)
```