validate.predictions: Summary Statistics of (Cross-)Validation Prediction Errors


Description

Functions to compute and plot summary statistics of prediction errors to (cross-)validate fitted spatial linear models by the criteria proposed by Gneiting et al. (2007) for assessing probabilistic forecasts.

Usage

validate.predictions(data, pred, se.pred, 
    statistic = c("pit", "mc", "bs", "st"), ncutoff = NULL)
    
## S3 method for class 'cv.georob'
plot(x, type = c("sc", "lgn.sc", "ta", "qq", "pit", "mc", "bs"), 
    ncutoff = NULL, add = FALSE, col, pch, lty, main, xlab, ylab, ...)
 
## S3 method for class 'cv.georob'
print(x, digits = max(3, getOption("digits") - 3), ...)   
 
## S3 method for class 'cv.georob'
rstudent(model, ...)   
 
## S3 method for class 'cv.georob'
summary(object, se = FALSE, ...)   

Arguments

data

a numeric vector with observations of a response.

pred

a numeric vector with predictions for the response.

se.pred

a numeric vector with prediction standard errors.

statistic

character keyword defining what statistic of the prediction errors should be computed. Possible values are (see Details):

  • "pit": probability integral transform (default),

  • "mc": average predictive distribution (marginal calibration),

  • "bs": Brier score,

  • "st": mean and dispersion statistics of (standardized) prediction errors.

ncutoff

positive integer (N) giving the number of quantiles for which CDFs are evaluated (statistic or type = "mc"), or the number of thresholds for which the Brier score is computed (statistic or type = "bs"), see Details (default: min(500, length(data))).

x, model, object

objects of class cv.georob.

digits

positive integer indicating the number of decimal digits to print.

type

character keyword defining what type of plot is created by plot.cv.georob. Possible values are:

  • "sc": a scatterplot of the (possibly log-transformed) response vs. the respective predictions (default).

  • "lgn.sc": a scatterplot of the untransformed response against back-transformed predictions of the log-transformed response.

  • "ta": Tukey-Anscombe plot (plot of standardized prediction errors vs. predictions).

  • "qq": normal QQ plot of standardized prediction errors.

  • "pit": histogram of probability integral transform, see Details.

  • "mc": a marginal calibration plot, see Details.

  • "bs": a plot of Brier score vs. threshold, see Details.

se

logical controlling if the standard errors of the averaged continuous ranked probability score and of the mean and dispersion statistics of the prediction errors (see Details) are computed from the respective values computed for the K cross-validation subsets (default: FALSE).

add

logical controlling whether the current high-level plot is added to an existing plot without clearing the frame first (default: FALSE).

main, xlab, ylab

title and axes labels of plot.

col, pch, lty

color, symbol and line type.

...

additional arguments passed to the methods.

Details

validate.predictions computes the items required to evaluate (and plot) the diagnostic criteria proposed by Gneiting et al. (2007) for assessing the calibration and the sharpness of probabilistic predictions. To this end, validate.predictions assumes that the prediction errors Y(s) - Ŷ(s) are normally distributed with zero mean and standard deviations equal to the kriging standard errors. This assumption is used to compute the probability integral transform ("pit"), the average predictive distribution ("mc") and the Brier score ("bs") selected by the argument statistic.
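As a rough illustration of the normality assumption, the probability integral transform can be sketched in base R as the predictive normal CDF evaluated at each observation. This is a sketch on simulated data, not the georob implementation; the simulated vectors and the names pit, pit.narrow and outer.share are hypothetical.

```r
# Illustrative sketch only; not the georob implementation.
set.seed(2)
n       <- 500
pred    <- rnorm(n, mean = 5)                  # hypothetical predictions
se.pred <- runif(n, 0.3, 0.8)                  # hypothetical prediction standard errors
data    <- rnorm(n, mean = pred, sd = se.pred) # observations consistent with the assumption

# PIT: predictive normal CDF evaluated at the observed value.
pit <- pnorm(data, mean = pred, sd = se.pred)

# Same observations, but with standard errors that are too small by half.
pit.narrow <- pnorm(data, mean = pred, sd = se.pred / 2)

# Bin the PIT values without drawing; plot(h) would draw the histogram.
breaks   <- seq(0, 1, by = 0.1)
h        <- hist(pit, breaks = breaks, plot = FALSE)
h.narrow <- hist(pit.narrow, breaks = breaks, plot = FALSE)

# Share of PIT values falling into the two outermost bins.
outer.share <- c(
  calibrated = sum(h$counts[c(1, 10)]) / n,
  too.narrow = sum(h.narrow$counts[c(1, 10)]) / n
)
```

For calibrated predictions the PIT values are roughly uniform on [0, 1]; understated standard errors push mass into the outer bins, which gives the U-shaped histogram that signals overconfident predictions.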

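Gneiting et al. (2007) proposed the following plots to validate probabilistic predictions: a histogram of the probability integral transform, a marginal calibration plot, and a plot of the Brier score against the threshold.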

The plot method for class cv.georob creates these plots, along with scatterplots of observations and predictions, and Tukey-Anscombe and normal QQ plots of the standardized prediction errors.
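The quantities behind a marginal calibration plot and a Brier score plot can be sketched in base R as follows. This is a minimal sketch under the normality assumption stated above, not the georob code. Taking the thresholds as empirical quantiles of the observations is an assumption made here for illustration, and the column names ghat and fbar are hypothetical.

```r
# Illustrative sketch only; not the georob implementation.
set.seed(1)
n       <- 200
pred    <- rnorm(n)                       # hypothetical predictions
se.pred <- rep(0.5, n)                    # hypothetical prediction standard errors
data    <- pred + rnorm(n, sd = se.pred)  # observations consistent with the assumption

ncutoff <- min(500, length(data))
# Assumption: thresholds are empirical quantiles of the observations.
y <- unique(quantile(data, probs = seq(0, 1, length.out = ncutoff), names = FALSE))

# Predictive CDFs F_i(y): observations in rows, thresholds in columns.
Fmat <- sapply(y, function(yk) pnorm(yk, mean = pred, sd = se.pred))
# Indicators 1{data_i <= y} on the same layout.
Imat <- sapply(y, function(yk) as.numeric(data <= yk))

# Marginal calibration: empirical CDF of the observations vs. average predictive CDF.
mc <- data.frame(y = y, ghat = colMeans(Imat), fbar = colMeans(Fmat))

# Brier score at each threshold: mean squared difference of CDF and indicator.
bs <- data.frame(y = y, bs = colMeans((Fmat - Imat)^2))
```

Plotting mc$ghat and mc$fbar against mc$y shows whether the two curves agree, and plotting bs$bs against bs$y lets two sets of predictions be compared threshold by threshold (smaller is better).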

summary.cv.georob computes the mean and dispersion statistics of the (standardized) prediction errors (by a call to validate.predictions with argument statistic = "st", see Value) and the averaged continuous ranked probability score (crps). If present in the cv.georob object, the error statistics are also computed for the errors of the unbiasedly back-transformed predictions of a log-transformed response. If se is TRUE, these statistics are evaluated separately for the K cross-validation subsets and the standard errors of the means of these statistics are returned as well.
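The averaged continuous ranked probability score and a standard error derived from the K subsets can be sketched in base R. The closed-form CRPS of a normal predictive distribution is standard; the helper crps.normal, the fold labels in subset, and the choice of sd(fold means) / sqrt(K) as the standard error are assumptions made for illustration and may differ from what georob does internally.

```r
# Illustrative sketch only; not the georob implementation.
set.seed(3)
n       <- 300
K       <- 10
pred    <- rnorm(n)                             # hypothetical predictions
se.pred <- rep(0.4, n)                          # hypothetical prediction standard errors
data    <- rnorm(n, mean = pred, sd = se.pred)  # observations consistent with the assumption
subset  <- sample(rep(seq_len(K), length.out = n))  # hypothetical fold labels

# Closed-form CRPS of a normal predictive distribution N(mu, sigma^2) at y.
crps.normal <- function(y, mu, sigma) {
  z <- (y - mu) / sigma
  sigma * (z * (2 * pnorm(z) - 1) + 2 * dnorm(z) - 1 / sqrt(pi))
}

crps      <- crps.normal(data, pred, se.pred)
crps.mean <- mean(crps)                 # averaged CRPS over all observations

# Assumption: the standard error is taken from the K per-subset means.
fold.means <- tapply(crps, subset, mean)
crps.se    <- sd(fold.means) / sqrt(K)
```

For calibrated normal predictions with a common standard deviation sigma, the expected CRPS is sigma / sqrt(pi), which gives a quick plausibility check on crps.mean.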

The print method for class cv.georob returns the mean and dispersion statistics of the (standardized) prediction errors.

The rstudent method for class cv.georob returns the standardized prediction errors.
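What "standardized prediction errors" means can be sketched in base R: each error is divided by its prediction standard error. This is a sketch on simulated data, not the method's code; the labels me, rmse and msse are hypothetical names for three simple summaries and do not reproduce the full set of statistics the package reports.

```r
# Illustrative sketch only; not the georob implementation.
set.seed(4)
n       <- 200
pred    <- runif(n, 4, 8)                       # hypothetical predictions
se.pred <- runif(n, 0.2, 0.6)                   # hypothetical prediction standard errors
data    <- rnorm(n, mean = pred, sd = se.pred)  # observations consistent with the assumption

err     <- data - pred      # prediction errors, observation minus prediction
std.err <- err / se.pred    # standardized prediction errors

# A few simple summaries (hypothetical labels).
st <- c(
  me   = mean(err),            # mean error (bias)
  rmse = sqrt(mean(err^2)),    # root mean squared error
  msse = mean(std.err^2)       # mean squared standardized error
)
```

If the prediction standard errors are realistic, msse is close to 1; values well above 1 suggest that the standard errors understate the actual prediction uncertainty.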

Value

Depending on the argument statistic, the function validate.predictions returns

The function rstudent.cv.georob returns a numeric vector with the standardized cross-validation prediction errors.

Author(s)

Andreas Papritz andreas.papritz@env.ethz.ch

References

Gneiting, T., Balabdaoui, F. and Raftery, A. E. (2007) Probabilistic forecasts, calibration and sharpness. Journal of the Royal Statistical Society Series B 69, 243–268.

See Also

georob for (robust) fitting of spatial linear models; cv.georob for assessing the goodness of a fit by georob.

Examples

## Not run: 
library(georob)  # attach the package when running outside of example()
data(meuse)

r.logzn <- georob(log(zinc) ~ sqrt(dist), data = meuse, locations = ~ x + y,
    variogram.model = "exponential",
    param = c( variance = 0.15, nugget = 0.05, scale = 200 ),
    tuning.psi = 1)

r.logzn.cv.1 <- cv(r.logzn, seed = 1, lgn = TRUE )
r.logzn.cv.2 <- cv(r.logzn, formula = .~. + ffreq, seed = 1, lgn = TRUE )

summary(r.logzn.cv.1, se = TRUE)
summary(r.logzn.cv.2, se = TRUE)

op <- par(mfrow = c(2,2))
plot(r.logzn.cv.1, type = "lgn.sc")
plot(r.logzn.cv.2, type = "lgn.sc", add = TRUE, col = "red")
abline(0, 1, lty= "dotted")
plot(r.logzn.cv.1, type = "ta")
plot(r.logzn.cv.2, type = "ta", add = TRUE, col = "red")
abline(h=0, lty= "dotted")
plot(r.logzn.cv.1, type = "mc")  # open the marginal calibration panel before adding to it
plot(r.logzn.cv.2, type = "mc", add = TRUE, col = "red")
plot(r.logzn.cv.1, type = "bs")
plot(r.logzn.cv.2, type = "bs", add = TRUE, col = "red")
legend("topright", lty = 1, col = c( "black", "red"), bty = "n",
    legend = c("log(Zn) ~ sqrt(dist)", "log(Zn) ~ sqrt(dist) + ffreq"))
par(op)
## End(Not run)

georob documentation built on May 2, 2019, 6:53 p.m.