GeoScores: Computation of predictive scores

View source: R/GeoScores.R

GeoScoresR Documentation

Computation of predictive scores

Description

The function computes predictive scores for observed validation values and predicted values, optionally using predictive variances or 95 percent prediction intervals.

Usage

GeoScores(data_to_pred,
          probject = NULL,
          pred = NULL,
          mse = NULL,
          score = c("pe", "crps", "intscore", "coverage"),
          lower95 = NULL,
          upper95 = NULL,
          threshold = 0.5,
          na.rm = TRUE)

Arguments

data_to_pred

Numeric vector, matrix or array containing the observed validation values. Values are internally coerced to a numeric vector.

probject

Optional object of class GeoKrig or GeoKrigloc. If supplied, point predictions and prediction variances are extracted from probject$pred and probject$mse, respectively.

pred

Numeric vector, matrix or array of point predictions. This argument is required when probject is not supplied.

mse

Optional numeric vector, matrix or array of prediction variances. Predictive standard errors are computed as sqrt(mse). This argument is required for Gaussian-distribution scores unless both lower95 and upper95 are supplied.

score

Character vector specifying which predictive scores should be computed. Possible values are "brie", "crps", "lscore", "pit", "pe", "intscore", and "coverage". Several values can be supplied.

lower95

Optional numeric vector, matrix or array containing the lower bounds of the 95 percent prediction intervals. If supplied together with upper95, these bounds are used directly for "intscore" and "coverage". If mse is not supplied, they are also used to approximate predictive standard errors for Gaussian-distribution scores.

upper95

Optional numeric vector, matrix or array containing the upper bounds of the 95 percent prediction intervals. See lower95.

threshold

Numeric threshold used to compute the Brier score "brie". The binary event is Y > threshold.

na.rm

Logical. If TRUE, non-finite observations, predictions, invalid intervals and non-positive predictive variances or standard errors are removed before computing the scores. If FALSE, such values produce an error.

Details

GeoScores computes point prediction scores and probabilistic predictive scores. Predictions and prediction variances can be supplied directly through pred and mse, or indirectly through a GeoKrig or GeoKrigloc object supplied to probject.

The point prediction option "pe" returns the mean absolute error and the root mean squared prediction error.

The scores "brie", "lscore", "pit" and "crps" are computed under the assumption that the predictive distribution is Gaussian, with mean equal to the point prediction and standard deviation equal to the predictive standard error. If mse is supplied, the predictive standard error is sqrt(mse). If mse is absent and both lower95 and upper95 are supplied, the predictive standard error is approximated as

\frac{U - L}{2 \Phi^{-1}(0.975)},

where L and U are the lower and upper bounds of the 95 percent prediction interval.

The Brier score is computed for the binary event Y > threshold, using the Gaussian predictive probability 1 - \Phi((threshold - \hat{Y}) / s), where \hat{Y} is the prediction and s is the predictive standard error.

The logarithmic score is returned as the negative log predictive density, averaged over validation observations:

\frac{1}{2}\left\{\log(2\pi\sigma^2) + \left(\frac{y - \hat{y}}{\sigma}\right)^2\right\}.

The CRPS is computed using the closed-form expression for a Gaussian predictive distribution.

For "pit", if probject is a GeoKrig or GeoKrigloc object and GeoPit is available, the PIT values are obtained through GeoPit(probject, type = "Uniform")$data; otherwise they are computed as pnorm(data_to_pred, mean = pred, sd = sqrt(mse)).

The 95 percent interval score is computed as

(u - l) + \frac{2}{\alpha}(l - y) I(y < l) + \frac{2}{\alpha}(y - u) I(y > u),

with \alpha = 0.05. If lower95 and upper95 are supplied, they are used directly as l and u; otherwise the interval is computed from the Gaussian predictive distribution using pred and mse. The empirical coverage is the proportion of validation values falling inside the corresponding 95 percent intervals.

Value

A list containing only the components requested through score. Possible components are:

MAE

Mean absolute error. Returned when "pe" is requested.

RMSPE

Root mean squared prediction error. Returned when "pe" is requested.

Brier

Brier score for the binary event Y > threshold. Returned when "brie" is requested.

LogScore

Mean negative log predictive density under a Gaussian predictive distribution. Returned when "lscore" is requested.

PIT

Probability integral transform values. Returned when "pit" is requested.

CRPS

Mean continuous ranked probability score under a Gaussian predictive distribution. Returned when "crps" is requested.

IS95

Mean interval score for the 95 percent prediction intervals. Returned when "intscore" is requested.

Cvg95

Empirical coverage of the 95 percent prediction intervals. Returned when "coverage" is requested.

Author(s)

Moreno Bevilacqua, moreno.bevilacqua89@gmail.com, https://sites.google.com/view/moreno-bevilacqua/home,

Víctor Morales Oñate, victor.morales@uv.cl, https://sites.google.com/site/moralesonatevictor/,

Christian Caamaño-Carrillo, chcaaman@ubiobio.cl, https://www.researchgate.net/profile/Christian-Caamano

References

Gneiting, T. and Raftery, A. E. (2007). Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association, 102, 359–378.

Heaton, M. J., Datta, A., Finley, A. O., Furrer, R., Guinness, J., Guhaniyogi, R., Gerber, F., Gramacy, R. B., Hammerling, D., Katzfuss, M., Lindgren, F., Nychka, D. W., Sun, F., and Zammit-Mangion, A. (2019). A Case Study Competition Among Methods for Analyzing Large Spatial Data. Journal of Agricultural, Biological, and Environmental Statistics, 24, 398–425.

Examples

library(GeoModels)

################################################################
######### Example of predictive score computation  #############
################################################################

model <- "Gaussian"
set.seed(79)
N <- 1000

x <- runif(N, 0, 1)
y <- runif(N, 0, 1)
coords <- cbind(x, y)

# Set covariance parameters
corrmodel <- "GenWend"
mean <- 0
sill <- 5
nugget <- 0
scale <- 0.2
smooth <- 0
power2 <- 4

param <- list(mean = mean, sill = sill, nugget = nugget,
              scale = scale, smooth = smooth, power2 = power2)

# Simulation of the spatial Gaussian random field
data <- GeoSim(coordx = coords, corrmodel = corrmodel,
               param = param)$data

# Training and validation split
sel <- sample(1:N, N * 0.8)

coords_est <- coords[sel, ]
coords_to_pred <- coords[-sel, ]

data_est <- data[sel]
data_to_pred <- data[-sel]

# Pairwise likelihood fitting
fixed <- list(nugget = nugget, smooth = smooth, power2 = power2)

start <- list(mean = 0, scale = scale, sill = 1)

I <- Inf
lower <- list(mean = -I, scale = 0, sill = 0)
upper <- list(mean =  I, scale = I, sill = I)

fit <- GeoFit(data_est, coordx = coords_est,
              corrmodel = corrmodel, model = model,
              likelihood = "Marginal", type = "Pairwise",
              neighb = 3, optimizer = "nlminb",
              lower = lower, upper = upper,
              start = start, fixed = fixed)

# Prediction at validation locations
pr <- GeoKrig(fit,loc = coords_to_pred,  data = data_est, mse = TRUE)

# Predictive scores from pred and mse
Pr_scores <- GeoScores(data_to_pred, pred = pr$pred, mse = pr$mse,
                       score = c("pe", "brie", "crps", "lscore",
                                 "pit", "intscore", "coverage"),
                       threshold = 0)

Pr_scores$MAE
Pr_scores$RMSPE
Pr_scores$Brier
Pr_scores$CRPS
Pr_scores$LogScore
Pr_scores$IS95
Pr_scores$Cvg95


GeoModels documentation built on June 15, 2026, 9:07 a.m.