bme_cv R Documentation

Leave-one-out cross validation (LOOCV) at hard data locations.

Description

bme_cv performs LOOCV to evaluate the prediction performance of the Bayesian Maximum Entropy (BME) spatial interpolation method using both hard and soft (interval) data.

For each hard data location in turn, the function withholds the observed value and predicts it from the remaining hard data and the soft data (limited to the nearest nhmax hard and nsmax soft points). The prediction is either the posterior mean or the posterior mode, depending on the type argument.
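The leave-one-out procedure described above can be sketched in a few lines of R. This is only an illustration of the loop structure: the predictor used here is a simple inverse-distance-weighted average, not BME, the soft data are left out entirely, and the toy coordinates, values, and the helper idw_predict() are hypothetical.

```r
# Toy data (hypothetical): five hard-data locations and their observed values.
ch <- matrix(c(0,   0,
               1,   0,
               0,   1,
               1,   1,
               0.5, 0.5), ncol = 2, byrow = TRUE)
zh <- c(1.0, 2.0, 1.5, 2.5, 1.8)

# Stand-in predictor (NOT BME): inverse-distance-weighted average of the
# values observed at the remaining locations.
idw_predict <- function(target, coords, values, power = 2) {
  d <- sqrt(rowSums(sweep(coords, 2, target)^2))  # Euclidean distances
  w <- 1 / d^power
  sum(w * values) / sum(w)
}

# Leave-one-out loop: withhold location i, then predict it from the rest.
pred <- vapply(seq_along(zh), function(i) {
  idw_predict(ch[i, ], ch[-i, , drop = FALSE], zh[-i])
}, numeric(1))

loo <- data.frame(observed = zh, predicted = pred, residual = zh - pred)
print(loo)
```

In bme_cv the stand-in predictor would be replaced by the BME posterior mean or mode, with the soft interval data also contributing to each prediction.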

The function returns prediction results at each location, including the residuals (differences between observed and predicted values), and computes three performance metrics:

  • ME (Mean Error) – measures prediction bias.

  • MAE (Mean Absolute Error) – measures average magnitude of prediction error.

  • RMSE (Root Mean Squared Error) – emphasizes larger errors and reflects prediction accuracy.
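As a minimal sketch of how these three metrics follow from the residuals, the snippet below computes them by hand. The observed and predicted values are made up for illustration, and the sign convention (observed minus predicted) is an assumption based on the description above.

```r
# Hypothetical observed and predicted values at four locations.
observed  <- c(1.2, 0.8, 1.5, 2.0)
predicted <- c(1.0, 1.0, 1.4, 2.3)

# Residuals, assuming the convention observed minus predicted.
residual <- observed - predicted

me   <- mean(residual)          # mean error: signed, so it reveals bias
mae  <- mean(abs(residual))     # mean absolute error: average error size
rmse <- sqrt(mean(residual^2))  # root mean squared error: penalises large errors

metrics <- data.frame(ME = me, MAE = mae, RMSE = rmse)
print(metrics)
```

An ME close to zero together with a large MAE or RMSE would indicate errors that cancel on average but are individually large.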

This function is useful for validating the BME interpolation method and tuning variogram parameters.
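To make the role of the variogram parameters more concrete, here is a rough sketch of an exponential variogram evaluated at a few lag distances. The parameterisation is an assumption and is not confirmed by this page: the sill is treated as the total variance (nugget included) and the range as the effective range, hence the factor 3. The function exp_variogram() is hypothetical and is not part of the package.

```r
# Assumed parameterisation (not confirmed by the package documentation):
#   sill  = total variance, including the nugget
#   range = effective range, where the variogram reaches about 95% of the
#           partial sill (hence the factor 3 in the exponent)
exp_variogram <- function(h, nugget, sill, range) {
  ifelse(h == 0, 0, nugget + (sill - nugget) * (1 - exp(-3 * h / range)))
}

# Lag distances at which to evaluate the variogram.
h <- seq(0, 3, by = 0.5)

# Parameter values taken from the example call on this page.
gamma_h <- exp_variogram(h, nugget = 0.0953, sill = 0.3639, range = 1.0787)
print(data.frame(h = h, gamma = gamma_h))
```

When tuning, one would typically rerun bme_cv over a handful of candidate nugget, sill, and range values and compare the resulting MAE or RMSE.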

Usage

bme_cv(ch, cs, zh, a, b,
       model, nugget, sill, range, nsmax = 5,
       nhmax = 5, n = 50, zk_range = extended_range(zh, a, b),
       type)

Arguments

ch

A matrix of spatial coordinates for hard data locations (each row is a location).

cs

A matrix of spatial coordinates for soft (interval) data locations.

zh

A numeric vector of observed values at the hard data locations.

a

A numeric vector of lower bounds for the soft interval data.

b

A numeric vector of upper bounds for the soft interval data.

model

A string specifying the variogram or covariance model to use (e.g., "exp" or "sph").

nugget

A non-negative numeric value for the nugget effect in the variogram model.

sill

A numeric value representing the sill (total variance) in the variogram model.

range

A positive numeric value for the range (or effective range) parameter of the variogram model.

nsmax

An integer specifying the maximum number of nearby soft data points to include for estimation (default is 5).

nhmax

An integer specifying the maximum number of nearby hard data points to include for estimation (default is 5).

n

An integer indicating the number of points at which to evaluate the posterior density over zk_range (default is 50).

zk_range

A numeric vector specifying the range over which to evaluate the unobserved value at the estimation location (zk). Although zk is unknown, it is assumed to lie within a range similar to the observed data (zh, a, and b). It is advisable to explore the posterior distribution at a few locations using prob_zk() before finalizing this range. The default is extended_range(zh, a, b).

type

A string indicating the type of BME prediction to compute: either "mean" for the posterior mean or "mode" for the posterior mode.
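The interplay of n, zk_range, and type can be illustrated with a small sketch that computes a posterior mean and mode from a density evaluated on a grid. Everything here is an assumption for illustration: zk_range is taken to be a length-two vector (minimum, maximum), the density is a made-up right-skewed curve rather than an actual BME posterior, and the helper trapz() is hypothetical.

```r
# Hypothetical evaluation grid: n equally spaced points spanning zk_range.
zk_range <- c(0, 10)
n <- 50
zk <- seq(zk_range[1], zk_range[2], length.out = n)

# Made-up right-skewed density standing in for a BME posterior.
dens <- dgamma(zk, shape = 2, rate = 1)

# Trapezoidal rule for numerical integration on the grid.
trapz <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

# Normalise so the density integrates to 1 over the grid.
dens <- dens / trapz(zk, dens)

post_mode <- zk[which.max(dens)]                  # grid point of highest density
post_mean <- trapz(zk, zk * dens)                 # expected value
post_var  <- trapz(zk, (zk - post_mean)^2 * dens) # spread around the mean

c(mode = post_mode, mean = post_mean, variance = post_var)
```

For a skewed posterior the mean and mode differ noticeably, which is why the choice of type can change the cross-validation results. A grid that is too narrow truncates the density, and a small n limits how finely the mode can be located.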

Value

A list with two elements:

results

A data frame containing the coordinates, observed values, BME predictions (posterior mean or mode), posterior variance (if type = "mean"), residuals, and fold indices.

metrics

A one-row data frame reporting the mean error (ME), mean absolute error (MAE), and root mean squared error (RMSE) from the cross-validation.

Examples

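library(BMEmapping)
data("utsnowload")

# Hard data: coordinates and observed values (rows 2 to 10)
ch <- utsnowload[2:10, c("latitude", "longitude")]
zh <- utsnowload[2:10, "hard"]

# Soft (interval) data: coordinates and lower/upper bounds (rows 68 to 232)
cs <- utsnowload[68:232, c("latitude", "longitude")]
a <- utsnowload[68:232, "lower"]
b <- utsnowload[68:232, "upper"]

# LOOCV with the posterior mean as the BME prediction
cv <- bme_cv(ch, cs, zh, a, b, model = "exp", nugget = 0.0953, sill = 0.3639,
             range = 1.0787, type = "mean")
cv$results  # per-location predictions and residuals
cv$metrics  # ME, MAE, and RMSE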


BMEmapping documentation built on July 2, 2025, 9:07 a.m.