mvr.hist: Multivariate Verification Rank Histogram

mvr.hist R Documentation

Multivariate Verification Rank Histogram

Description

This function plots the Multivariate Verification Rank Histogram (MVRH) given observations of a multivariate variable and samples of a predictive distribution.

Usage

mvr.hist(
  y,
  x,
  method = "mv",
  type = "relative",
  bins = NULL,
  title = NULL,
  reliability = FALSE,
  entropy = FALSE,
  na.rm = FALSE
)

Arguments

y

matrix of observations (see details)

x

3-dimensional array of samples of a predictive distribution (depending on y; see details)

method

character; "mv", "avg", "mst", "bd"; default: "mv" (see details)

type

character; one of "relative", "absolute" or "density"; default: "relative" (see details)

bins

numeric; if NULL the number of bins is equal to nrow(x[, , 1])+1; otherwise bins must be chosen so that (nrow(x[, , 1])+1)/bins is an integer; default: NULL (see details)

title

character; title of the plot; default: "Multivariate Verification Rank Histogram"

reliability

logical; if TRUE the multivariate reliability index is calculated for the plot (see details); if FALSE the multivariate reliability index is not calculated; default: FALSE

entropy

logical; if TRUE the entropy is calculated for the plot (see details); if FALSE the entropy is not calculated; default: FALSE

na.rm

logical; if TRUE, NA values are stripped before the rank computation proceeds; if FALSE, NA values are used in the rank computation; default: FALSE

Details

The observations are given in the matrix y with n rows, where each column corresponds to a univariate observation variable. The i-th row of y corresponds to the i-th entry along the third dimension of the array x, i.e. x[, , i]. Each x[, , i] must be a matrix with the same column structure as y, whose rows hold the samples of the multivariate predictive distribution for the i-th observation (in the Examples section, dim(x) is c(m, 2, n) with m samples per observation).
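The layout can be checked before plotting. The sketch below follows the Examples section, where x has dimensions c(m, 2, n) for m samples, 2 variables and n observations. The helper check_layout is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): checks that y and x
# are aligned in the way the Examples section sets them up.
check_layout <- function(y, x) {
  stopifnot(is.matrix(y), is.array(x), length(dim(x)) == 3)
  c(
    cases_match     = nrow(y) == dim(x)[3],  # one slice x[, , i] per row of y
    variables_match = ncol(y) == dim(x)[2]   # same variables in y and in each slice
  )
}

n <- 30  # number of observations (forecast cases)
m <- 50  # number of samples per forecast case
y <- cbind(rnorm(n), rgamma(n, shape = 1))
x <- array(NA_real_, dim = c(m, 2, n))
x[, 1, ] <- rnorm(n * m)
x[, 2, ] <- rgamma(n * m, shape = 1)

layout_ok <- check_layout(y, x)
```

Both entries of layout_ok should be TRUE for data built this way.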

The parameter bins specifies the number of bars (columns) of the MVRH. For "large" nrow(x[, , 1]) it is often reasonable to reduce the resolution of the MVRH by choosing bins so that (nrow(x[, , 1])+1)/bins is an integer.
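With m samples per forecast case there are m + 1 possible ranks, so the admissible values of bins are the divisors of m + 1. The sketch below lists them and shows one way consecutive ranks could be merged into equal-sized bins. The helpers valid_bins and bin_of_rank are hypothetical, and the grouping used inside mvr.hist may differ.

```r
# Hypothetical helper: all bin counts that divide the number of ranks m + 1.
valid_bins <- function(m) {
  k <- seq_len(m + 1)
  k[(m + 1) %% k == 0]
}

# Hypothetical helper: map a rank r in 1..(m + 1) to one of `bins` equal-sized bins.
# Assumes consecutive ranks are merged; bins must divide m + 1.
bin_of_rank <- function(r, m, bins) {
  ceiling(r * bins / (m + 1))
}

m <- 50                   # as in the Examples section
vb <- valid_bins(m)       # divisors of 51
per_bin <- tabulate(bin_of_rank(seq_len(m + 1), m, bins = 17), nbins = 17)
```

For m = 50 this explains why the Examples section uses bins = 17 and bins = 3: both divide 51.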

For the calculation of the ranks, different methods are available, where "mv" stands for "multivariate ranks", "avg" stands for "average ranks", "mst" stands for "minimum-spanning-tree ranks" and "bd" stands for "band-depth ranks". These methods are implemented as described in e.g. Thorarinsdottir et al. (2016).
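As a rough illustration of the "mv" method, the sketch below computes the multivariate rank of a single observation within its sample, following the pre-rank idea described in Gneiting et al. (2008): each point is pre-ranked by the number of points it dominates componentwise (itself included), and the rank of the observation is its position among all pre-ranks, with ties broken at random. The function name mv_rank_sketch is hypothetical, and this is not necessarily how mvr.hist computes its ranks.

```r
# Sketch of a multivariate rank for one forecast case.
# y_i: numeric vector of length d (the observation)
# x_i: m x d matrix of samples from the predictive distribution
mv_rank_sketch <- function(y_i, x_i) {
  S <- rbind(y_i, x_i)  # (m + 1) x d matrix, observation in row 1
  # pre-rank of each point u: number of points v in S with v <= u in every component
  pre <- apply(S, 1, function(u) sum(apply(S, 1, function(v) all(v <= u))))
  below <- sum(pre < pre[1])   # points with a strictly smaller pre-rank
  tied  <- sum(pre == pre[1])  # points tied with the observation (itself included)
  below + sample.int(tied, 1)  # rank in 1..(m + 1), ties resolved at random
}

x_i <- matrix(c(0.1, 0.4, 0.2,
                0.3, 0.2, 0.5), ncol = 2)  # m = 3 samples, d = 2 variables
r_high <- mv_rank_sketch(c(10, 10), x_i)   # observation dominates every sample
r_low  <- mv_rank_sketch(c(-10, -10), x_i) # observation is dominated by every sample
```

Applying such a function to every pair (y[i, ], x[, , i]) would give the n ranks that the histogram summarizes.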

If type is "relative" the relative frequencies of the bins are plotted. If type is "absolute" the absolute frequencies of the bins are plotted. If type is "density" the relative densities of the bins are plotted.
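The three scalings can be illustrated on a small set of made-up ranks. The exact scaling that mvr.hist uses for "density" is not spelled out here, so the sketch assumes the common histogram convention of relative frequency divided by bin width, with equal-width bins on [0, 1].

```r
# Made-up ranks for illustration only, with m + 1 = 4 possible ranks.
ranks <- c(1, 1, 2, 3, 3, 3, 4, 4)
bins  <- 4

absolute <- tabulate(ranks, nbins = bins)  # counts per bin
relative <- absolute / sum(absolute)       # relative frequencies, sum to 1

# Assumption: bins of equal width 1 / bins on [0, 1]; density integrates to 1.
width   <- 1 / bins
density <- relative / width
```

Under this assumption a perfectly uniform histogram has density 1 in every bin, independent of the number of bins.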

A uniform MVRH indicates a calibrated predictive distribution. Depending on the chosen method, we have the following interpretation:

  • "mv" and "avg": A ∩-shape in the MVRH indicates overdispersion and a ∪-shape indicates underdispersion of the predictive distribution. A systematic bias of the predictive distribution results in a triangular-shaped MVRH.

  • "mst" and "bd": Too many low ranks indicate underdispersion or bias of the predictive distribution. Too many high ranks indicate overdispersion or bias of the predictive distribution.

The deviation of the MVRH from uniformity can be quantified by the multivariate reliability index (RI). The smaller the RI, the better the calibration of the forecast. The optimal value of the RI is 0.
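A minimal sketch of a reliability index in the spirit of Delle Monache et al. (2006) sums the absolute deviations of the relative bin frequencies from the uniform value. The function name reliability_index is hypothetical; the package may scale the index differently (for example as a percentage) or compute it on a different binning.

```r
# Sketch: reliability index from bin counts.
# counts: vector of absolute frequencies, one entry per bin
reliability_index <- function(counts) {
  f <- counts / sum(counts)         # relative frequencies
  sum(abs(f - 1 / length(counts)))  # total absolute deviation from uniformity
}

ri_uniform <- reliability_index(c(5, 5, 5, 5))   # perfectly flat histogram
ri_peaked  <- reliability_index(c(20, 0, 0, 0))  # all ranks in one bin
```

A flat histogram gives 0, and the value grows as the ranks concentrate in fewer bins.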

The entropy is a tool to assess the calibration of a forecast. The optimal value of the entropy is 1, representing a calibrated forecast.
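One common normalization, used for rank histograms in Taillardat et al. (2016), divides the Shannon entropy of the relative bin frequencies by the logarithm of the number of bins, so that a flat histogram attains the optimal value 1. The sketch below assumes this definition; the function name rank_entropy is hypothetical and the package's computation may differ in detail.

```r
# Sketch: normalized entropy of a rank histogram.
# counts: vector of absolute frequencies, one entry per bin
rank_entropy <- function(counts) {
  f <- counts / sum(counts)  # relative frequencies
  f_pos <- f[f > 0]          # treat 0 * log(0) as 0
  -sum(f_pos * log(f_pos)) / log(length(counts))
}

ent_uniform <- rank_entropy(c(5, 5, 5, 5))   # flat histogram
ent_peaked  <- rank_entropy(c(20, 0, 0, 0))  # all ranks in one bin
```

Values close to 1 point to a nearly uniform histogram, while values near 0 indicate that the ranks pile up in very few bins.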

Value

ggplot object with a plot of the Multivariate Verification Rank Histogram.

Author(s)

David Jobst

References

Delle Monache, L., Hacker, J., Zhou, Y., Deng, X. and Stull, R. (2006). Probabilistic aspects of meteorological and ozone regional ensemble forecasts. Journal of Geophysical Research: Atmospheres, 111, D24307.

Gneiting, T., Stanberry, L., Grimit, E., Held, L. and Johnson, N. (2008). Assessing probabilistic forecasts of multivariate quantities, with an application to ensemble predictions of surface winds. Test, 17, 211-264.

Smith, L. and Hansen, J. (2004). Extending the limits of ensemble forecast verification with the minimum spanning tree. Monthly Weather Review, 132, 1522-1528.

Taillardat, M., Mestre, O., Zamo, M. and Naveau, P. (2016). Calibrated Ensemble Forecasts Using Quantile Regression Forests and Ensemble Model Output Statistics. Monthly Weather Review, 144, 2375-2393.

Thorarinsdottir, T., Scheurer, M. and Heinz, C. (2016). Assessing the calibration of high-dimensional ensemble forecasts using rank histograms. Journal of Computational and Graphical Statistics, 25, 105-122.

Tribus, M. (1969). Rational Descriptions, Decisions and Designs. Pergamon Press.

Wilks, D. (2004). The minimum spanning tree histogram as verification tool for multidimensional ensemble forecasts. Monthly Weather Review, 132, 1329-1340.

Examples

# simulated data: n observations, m samples per observation, 2 variables
n <- 30
m <- 50
y <- cbind(rnorm(n), rgamma(n, shape = 1))  # n x 2 matrix of observations
x <- array(NA, dim = c(m, 2, n))            # m x 2 x n array of samples
x[, 1, ] <- rnorm(n * m)
x[, 2, ] <- rgamma(n * m, shape = 1)

# mvr.hist plots; bins must divide m + 1 = 51
mvr.hist(y = y, x = x)
mvr.hist(y = y, x = x, bins = 17, title = "MVRH",
         reliability = TRUE, entropy = FALSE)
mvr.hist(y = y, x = x, bins = 3, method = "avg", type = "absolute",
         reliability = FALSE, entropy = TRUE)
mvr.hist(y = y, x = x, bins = 3, method = "bd", type = "density",
         reliability = TRUE, entropy = TRUE)


jobstdavid/eppverification documentation built on May 13, 2024, 5:20 p.m.