Description

Provides bootstrapped block Latin hypercube subsampling under a given data set to aid in consistent estimation of a global separable lengthscale parameter.
Usage

blhs(y, X, m)
blhs.loop(y, X, m, K, da, g = 1e-3, maxit = 100, verb = 0, plot.it = FALSE)
Arguments

y
a vector of responses/dependent values with length(y) = nrow(X)

X
a matrix or data.frame containing the full (large) design matrix of input locations

m
a positive scalar integer giving the number of divisions on each coordinate of input space defining the block structure

K
a positive scalar integer specifying the number of Bootstrap replicates desired

da
a lengthscale prior, say as generated by darg

g
a positive scalar giving the fixed value of the nugget parameter; by default g = 1e-3

maxit
a positive scalar integer giving the maximum number of iterations for MLE calculations via mleGPsep

verb
a nonnegative integer specifying the verbosity level; verb = 0 (the default) is quiet, and larger values cause more information to be printed

plot.it
plot.it = TRUE causes a boxplot of the K bootstrapped lengthscale estimates to be drawn; used by blhs.loop only
Details

Bootstrapped block Latin hypercube subsampling (BLHS) yields a global lengthscale estimator which is asymptotically consistent with the MLE calculated on the full data set. However, since it works on data subsets, it comes at a much reduced computational cost. Intuitively, BLHS guarantees a good mix of short and long pairwise distances. A single bootstrap LH subsample may be obtained by dividing each dimension of the input space equally into m intervals, yielding m^d mutually exclusive hypercubes. It is easy to show that the average number of observations in each hypercube is N*m^(-d) if there are N samples in the original design. From these hypercubes, m blocks are randomly selected following the LH paradigm, i.e., so that only one interval is chosen from each of the m segments. The average number of observations in the subsample, combining the m randomly selected blocks, is N*m^(-(d-1)). Ensuring a subsample size of at least one requires having m <= N^[1/(d-1)], thereby linking the parameter m to computational effort. Smaller m is preferred so long as GP inference on data of that size remains tractable. Since the blocks follow an LH structure, the resulting subdesign inherits the usual LHS properties, e.g., retaining marginal properties like univariate stratification modulo features present in the original, large-N, design.
For more details, see Liu (2014), Zhao, et al. (2017) and Sun, et al. (2017).
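The subsampling scheme described above can be sketched in a few lines of base R. This is a simplified illustration of one BLHS draw, not laGP's implementation; the helper name blhs_sketch is made up, and inputs are assumed to lie in [0,1]^d.

```r
# One BLHS subsample (simplified sketch; not laGP's implementation)
blhs_sketch <- function(X, m) {
  d <- ncol(X)
  # interval index in 1..m of each point, per dimension (inputs in [0,1]^d)
  cell <- ceiling(pmax(X, .Machine$double.eps) * m)
  # m blocks in a Latin hypercube arrangement: each column is a permutation
  # of 1:m, so each interval is used exactly once per dimension
  blocks <- sapply(seq_len(d), function(j) sample(m))
  # keep the points whose cell matches one of the m selected blocks
  keep <- apply(cell, 1, function(ci)
    any(apply(blocks, 1, function(b) all(b == ci))))
  X[keep, , drop = FALSE]
}

set.seed(7)
X <- matrix(runif(2000), ncol = 2)   # N = 1000, d = 2
Xsub <- blhs_sketch(X, m = 5)
nrow(Xsub)  # roughly N*m^(-(d-1)) = 200 points on average
```

With d = 2 the m selected blocks amount to one cell per row and column of the m x m grid, which is why the expected subsample size is N/m.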
Value

blhs returns the subsampled input space and the corresponding responses.

blhs.loop returns the median of the K lengthscale maximum likelihood estimates, the subsampled data size to which that corresponds, and the subsampled data, including the input space and the responses, from the bootstrap iterations.

blhs returns

xs
the subsampled input space

ys
the subsampled responses

blhs.loop returns

that
the lengthscale estimate (median)

ly
the subsampled data size (median)

xm
the subsampled input space (median)

ym
the subsampled responses (median)
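As noted above, blhs.loop aggregates the K bootstrap lengthscale MLEs via a median; for a separable (vector-valued) lengthscale, one natural reading is a coordinate-wise median. The base-R sketch below illustrates only that aggregation step; the matrix that of estimates is simulated here, not produced by laGP.

```r
# Illustration of the median aggregation over K bootstrap MLEs
# (simulated stand-in data; not laGP's code)
set.seed(1)
K <- 10; d <- 8
that <- matrix(rexp(K * d, rate = 2), nrow = K)  # K estimates of a d-vector
d.med <- apply(that, 2, median)                  # coordinate-wise median
length(d.med)  # d = 8, one lengthscale per input dimension
```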
Note

The global input space should be relatively homogeneous. A space-filling design (input) X is ideal, but not required.
Author(s)

Robert B. Gramacy [email protected] and Furong Sun [email protected]
References

F. Sun, R.B. Gramacy, B. Haaland, E. Lawrence, and A. Walker (2017). Emulating satellite drag from large simulation experiments. Preprint on arXiv:1712.00182; http://arxiv.org/abs/1712.00182

Y. Zhao, Y. Hung, and Y. Amemiya (2017). Efficient Gaussian Process Modeling using Experimental Design-Based Subagging. To appear in Statistica Sinica.

Yufan Liu (2014). Recent Advances in Computer Experiment Modeling. Ph.D. Thesis, Rutgers, The State University of New Jersey. http://dx.doi.org/doi:10.7282/T38G8J1H
Examples

# input space based on Latin hypercube sampling (not required)
library(lhs)
## two dimensional example with N=216 sized sample
X <- randomLHS(216, 2)
# pseudo responses, not important for visualizing design
Y <- runif(216)
## BLHS sample with m=6 divisions in each coordinate
sub <- blhs(y=Y, X=X, m=6)
Xsub <- sub$xs # the bootstrapped subsample
# visualization
plot(X, xaxt="n", yaxt="n", xlim=c(0,1), ylim=c(0,1), xlab="factor 1",
  ylab="factor 2", col="cyan", main="BLHS")
b <- seq(0, 1, by=1/6)
abline(h=b, v=b, col="black", lty=2)
axis(1, at=seq(0, 1, by=1/6), cex.axis=0.8,
  labels=expression(0, 1/6, 2/6, 3/6, 4/6, 5/6, 1))
axis(2, at=seq(0, 1, by=1/6), cex.axis=0.8,
  labels=expression(0, 1/6, 2/6, 3/6, 4/6, 5/6, 1), las=1)
points(Xsub, col="red", pch=19, cex=1.25)
## Comparing global lengthscale MLE based on BLHS and random subsampling
## Not run:
# famous borehole function
borehole <- function(x){
  rw <- x[1] * (0.15 - 0.05) + 0.05
  r  <- x[2] * (50000 - 100) + 100
  Tu <- x[3] * (115600 - 63070) + 63070
  Tl <- x[5] * (116 - 63.1) + 63.1
  Hu <- x[4] * (1110 - 990) + 990
  Hl <- x[6] * (820 - 700) + 700
  L  <- x[7] * (1680 - 1120) + 1120
  Kw <- x[8] * (12045 - 9855) + 9855
  m1 <- 2 * pi * Tu * (Hu - Hl)
  m2 <- log(r / rw)
  m3 <- 1 + 2*L*Tu/(m2*rw^2*Kw) + Tu/Tl
  return(m1/m2/m3)
}
N <- 100000 # number of observations
xt <- randomLHS(N, 8) # input space
yt <- apply(xt, 1, borehole) # response
colnames(xt) <- c("rw", "r", "Tu", "Tl", "Hu", "Hl", "L", "Kw")
## prior on the GP lengthscale parameter
da <- darg(list(mle=TRUE, max=100), xt)
## make space for two sets of boxplots
par(mfrow=c(1,2))
# BLHS calculation with visualization of the K MLE lengthscale estimates
K <- 10 # number of Bootstrap samples; Sun, et al (2017) uses K=31
sub_blhs <- blhs.loop(y=yt, X=xt, K=K, m=2, da=da, maxit=200, plot.it=TRUE)
# a random subsampling analog for comparison
sn <- sub_blhs$ly # extract a size that is consistent with the BLHS
that.rand <- matrix(NA, ncol=8, nrow=K)
for(i in 1:K){
  sub <- sample(1:nrow(xt), sn)
  gpsepi <- newGPsep(xt[sub,], yt[sub], d=da$start, g=1e-3, dK=TRUE)
  mle <- mleGPsep(gpsepi, tmin=da$min, tmax=10*da$max, ab=da$ab, maxit=200)
  deleteGPsep(gpsepi)
  that.rand[i,] <- mle$d
}
## put random boxplots next to BLHS ones
boxplot(that.rand, xlab="input", ylab="theta-hat", col=2,
  main="random subsampling")
## End(Not run)
