Description

Multiple-fold cross-validation for a km object without noisy observations.
Usage

cv(model, folds, type, trend.reestim, fast, light)
Arguments

model: an object of class "km" without noisy observations.

folds: a list of index subsets, without index redundancy within any single fold (see the sketch after this list).

type: a character string corresponding to the kriging family, to be chosen between simple kriging ("SK") and universal kriging ("UK").

trend.reestim: should the trend be re-estimated when removing an observation? Defaults to FALSE.

fast: binary option to use analytical multiple-fold cross-validation formulae when applicable.

light: binary option to skip the calculation of cross-validation residual covariances between different folds (relevant, e.g., when comparing speed between the baseline and fast settings).
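As a hedged sketch of the folds format (the 10-point design and the fold sizes below are illustrative choices, not package defaults), leave-one-out folds, a disjoint partition and overlapping folds could be built as:

n <- 10
loofolds  <- as.list(seq_len(n))                    # leave-one-out: one index per fold
partition <- split(seq_len(n), rep(1:5, each = 2))  # five disjoint two-point folds
overlap   <- list(c(1, 3), c(2, 4), c(3, 5))        # folds may share points with each other,
                                                    # but never repeat an index within a fold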
Value

A list composed of:

mean: a list of cross-validation mean predictions, with the same number of elements and respective dimensions as in folds. The ith element is equal to the kriging mean (including the trend) at the points of the ith fold when that fold is left out of the design.

y: a vector of actual responses.

cvcov.list: a list of cross-validation conditional covariance matrices, with the same number of elements as in folds and dimensions set accordingly. The ith element is equal to the kriging covariance matrix of the ith fold when that fold is left out of the design.

cvcov.mat: an ntot x ntot matrix containing all covariances between cross-validation errors (stacked with respect to the order between and within folds), where ntot is the total number of points in the folds list (with possible point redundancies, as some points may belong to several folds). See the sketch after this list for how these components fit together.
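A minimal sketch of how these components can be combined, assuming a fitted km object model and a folds list as above (names are illustrative), and assuming that unlist preserves the stacking order used in cvcov.mat:

out <- cv(model, folds, type = "UK", trend.reestim = TRUE, fast = TRUE, light = FALSE)
cv_mean <- unlist(out$mean)    # stacked CV predictions, length ntot
cv_res  <- out$y - cv_mean     # stacked CV residuals (assuming y is stacked like mean)
mean(cv_res^2)                 # an overall multiple-fold CV mean squared error
diag(out$cvcov.list[[1]])      # per-point CV variances within the first fold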
Warning

Kriging parameters are not re-estimated when removing observations. With few points in the learning set, the re-estimated values can be far from those obtained with the entire learning set. One option is to re-estimate the trend coefficients by setting trend.reestim = TRUE (see the sketch below).
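A minimal sketch of that option, assuming a fitted km object model and a folds list (object names are illustrative only):

cv_fixed <- cv(model, folds, type = "UK", trend.reestim = FALSE, fast = TRUE, light = FALSE)
cv_trend <- cv(model, folds, type = "UK", trend.reestim = TRUE,  fast = TRUE, light = FALSE)
# Differences between the two sets of CV predictions reflect the effect of
# re-estimating the trend coefficients on each reduced learning set
range(unlist(cv_trend$mean) - unlist(cv_fixed$mean))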
Author(s)

D. Ginsbourger, University of Bern.
References

F. Bachoc (2013), Cross Validation and Maximum Likelihood estimations of hyper-parameters of Gaussian processes with model misspecification. Computational Statistics and Data Analysis, 66, 55-69.

N.A.C. Cressie (1993), Statistics for Spatial Data. Wiley Series in Probability and Mathematical Statistics.

O. Dubrule (1983), Cross validation of kriging in a unique neighborhood. Mathematical Geology, 15, 687-699.

J. Gallier, The Schur complement and symmetric positive semidefinite (and definite) matrices. Retrieved from https://www.cis.upenn.edu/~jean/schur-comp.pdf.

D. Ginsbourger and C. Schaerer (2021), Fast calculation of Gaussian process multiple-fold cross-validation residuals and their covariances. arXiv:2101.03108 [stat.ME].

J.D. Martin and T.W. Simpson (2005), Use of kriging models to approximate deterministic computer models. AIAA Journal, 43(4), 853-863.

M. Schonlau (1997), Computer Experiments and Global Optimization. Ph.D. thesis, University of Waterloo.
See Also

predict,km-method, leaveOneOut.km
Examples

library(DiceKriging)  # provides km, predict and cv

# -------------------------------------------------
# A 1D example illustrating leave-one-out residuals
# and their correlation
# -------------------------------------------------
# Test function (From Xiong et al. 2007; See scalingFun's doc)
myfun <- function(x) {
  sin(30 * (x - 0.9)^4) * cos(2 * (x - 0.9)) + (x - 0.9) / 2
}
t <- seq(from = 0, to = 1, by = 0.005)
allresp <- myfun(t)
par(mfrow = c(1, 1), mar = c(4, 4, 2, 2))
plot(t, allresp, type = "l")
# Design points and associated responses
nn <- 10
design <- seq(0, 1, length.out = nn)
y <- myfun(design)
points(design, y, pch = 19)
# Model definition and GP prediction (Kriging)
set.seed(1)
model1 <- km(design = data.frame(design = design),
             response = data.frame(y = y), nugget = 1e-5,
             multistart = 10, control = list(trace = FALSE))
pred1 <- predict(model1, newdata = data.frame(design = t), type = "UK")
lines(t, pred1$mean, type = "l", col = "blue", lty = 2, lwd = 2)
# Plotting the prediction error versus the GP standard deviation
par(mfrow = c(2,1))
pred_abserrors <- abs(allresp - pred1$mean)
plot(t, pred_abserrors, type = "l", ylab = "abs pred error")
plot(t, pred1$sd, type = "l", ylab = "GP prediction sd")
# Leave-one-out cross-validation with the cv function
loofolds <- as.list(seq(1, length(design)))
loo1 <- cv(model = model1, folds = loofolds, type = "UK",
           trend.reestim = TRUE, fast = TRUE, light = FALSE)
# Helper plotting functions (y-axis limits need to be taken care of)
# plotCVmean: truth, design points, full-model GP mean, and CV mean predictions (red squares)
plotCVmean <- function(cvObj) {
  cvObjMean <- unlist(cvObj$mean)
  plot(t, allresp, type = "l", ylim = range(cvObjMean, allresp))
  points(design, y, pch = 19)
  lines(t, pred1$mean, type = "l", col = "blue", lty = 2, lwd = 2)
  points(design, cvObjMean, col = "red", pch = 22, lwd = 2)
}
# plotCVsd: absolute CV errors (red squares) on top of the full-model absolute
# prediction errors and the GP prediction standard deviation
plotCVsd <- function(cvObj, ylim) {
  cv_abserrors <- abs(y - unlist(cvObj$mean))
  plot(t, pred_abserrors, type = "l", ylab = "abs pred error",
       ylim = ylim)
  points(design, cv_abserrors, col = "red", pch = 22, lwd = 2)
  lines(t, pred1$sd, col = "blue", lty = 2, lwd = 2)
}
loo1Mean <- unlist(loo1$mean)
loo_abserrors <- abs(y - loo1Mean)
ylim <- c(0, max(loo_abserrors, pred_abserrors))
plotCVmean(loo1)
plotCVsd(loo1, ylim = ylim)
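# Optional cross-check (sketch): leaveOneOut.km should return matching LOO means
# when called with the same kriging type and trend re-estimation setting
loo2 <- leaveOneOut.km(model1, type = "UK", trend.reestim = TRUE)
max(abs(loo2$mean - unlist(loo1$mean)))  # expected to be numerically small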
# Calculation of uncorrelated CV residuals and corresponding qqplot
T <- model1@T                                     # Cholesky factor stored in the fitted km object
B <- diag(as.numeric(diag(loo1$cvcov.mat))^(-1))  # diagonal matrix of inverse LOO error variances
res <- y - loo1Mean                               # raw LOO residuals
stand <- T %*% B %*% res                          # decorrelated LOO residuals
opar <- par(mfrow = c(1, 2))
qqnorm(stand,
       main = "Normal Q-Q Plot of uncorrelated LOO Residuals")
abline(a = 0, b = 1)
# Comparison to "usual" standardized LOO residuals
usual_stand <- diag(as.numeric(diag(loo1$cvcov.mat))^(-1/2)) %*% res
qqnorm(usual_stand,
       main = "Normal Q-Q Plot of Standardized LOO Residuals")
abline(a = 0, b = 1)
par(opar)
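# Optional (sketch): a formal normality check of the decorrelated LOO residuals,
# complementing the Q-Q plots above
shapiro.test(as.numeric(stand))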
# Calculation and plot of correlations between most left
# and other cross-validation residuals
cvcov.mat <- loo1$cvcov.mat
coco <- cov2cor(cvcov.mat)
par(mfrow = c(1, 1))
plot(coco[1, ], type = "h", ylim = c(-1, 1), lwd = 2,
     main = "Correlation between first and other LOO residuals",
     ylab = "Correlation")
points(coco[1, ])
abline(h = 0, lty = "dotted")
par(mfrow = c(1, 1), mar = c(5.1, 4.1, 4.1, 2.1))
# ------------------------------------------------
# Same example with multiple-fold cross validation
# under various settings
# ------------------------------------------------
# First with successive two-element folds
myfolds <- list(c(1, 2), c(3, 4), c(5, 6), c(7, 8), c(9, 10))
cv_2fold <- cv(model = model1, folds = myfolds, type = "SK",
               trend.reestim = FALSE, fast = TRUE, light = FALSE)
cv_2fold
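# Optional (sketch): fold-wise mean squared CV errors, assembled from the
# returned list of CV means and the design responses
sapply(seq_along(myfolds),
       function(i) mean((y[myfolds[[i]]] - cv_2fold$mean[[i]])^2))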
opar <- par(mfrow = c(2,1))
plotCVmean(cv_2fold)
plotCVsd(cv_2fold, ylim = ylim)
# With overlapping two-element folds
myfolds <- list(c(1, 3), c(2, 4), c(3, 5), c(4, 6),
                c(5, 7), c(6, 8), c(7, 9), c(8, 10))
cv_2fold_overlap <- cv(model = model1, folds = myfolds, type = "UK",
                       trend.reestim = TRUE, fast = TRUE, light = FALSE)
cv_2fold_overlap
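# With overlapping folds, cvcov.mat stacks all fold points, so its dimension
# should be ntot x ntot, with ntot = 16 here (8 folds of 2 points, repeats included)
dim(cv_2fold_overlap$cvcov.mat)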
# With a three-fold partition
myfolds <- list(c(1, 2, 3), c(4, 5, 6, 7), c(8, 9, 10))
cv_3fold <- cv(model = model1, folds = myfolds, type = "UK",
               trend.reestim = TRUE, fast = TRUE, light = FALSE)
cv_3fold
plotCVmean(cv_3fold)
plotCVsd(cv_3fold, ylim = ylim)
par(opar)
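# Optional (sketch): timing the analytical multiple-fold formulae against the
# baseline setting; with only 10 design points the differences stay tiny, so this
# merely illustrates how the 'fast' and 'light' options are meant to be used
system.time(cv(model = model1, folds = myfolds, type = "UK",
               trend.reestim = TRUE, fast = TRUE, light = TRUE))
system.time(cv(model = model1, folds = myfolds, type = "UK",
               trend.reestim = TRUE, fast = FALSE, light = TRUE))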