miMat.VALUE: Mutual Information Matrix
In SantanderMetGroup/R_VALUE: Climate data validation in the framework of the COST action VALUE

miMat.VALUE

R Documentation

Mutual Information Matrix

Description

Computes the mutual information between stations regarding different aspects of precipitation

Usage

miMat.VALUE(stationObj, predictionObj = NULL, season = c("annual", "DJF",
  "MAM", "JJA", "SON"), aggr.type = c("after", "before"), prob = NULL,
  threshold = 1, max.na.prop = 0.25)

Arguments

`stationObj`	An R-VALUE object containing station data (as returned by `loadValueStations`).
`predictionObj`	A R-VALUE predictions object as loaded by `loadValuePredictions`. Default to NULL, meaning that the matrix of joint probabilities is done on the observations.
`season`	Character string indicating the target season. Accepted values are `c("annual", "DJF", "MAM", "JJA", "SON")`
`aggr.type`	Type of aggregation in the case of multiple realizations. Should the aggregation of multiple members be performed `"after"` (the default) or `"before"` computing the joint probabilities?. Ignored in the case of observations and deterministic predictions.
`prob`	Default to NULL (unused). Otherwise, a float number in the range (0,1) defining the quantile threshold to be calculated. If `prob` is used, the joint exceedance probabilities will be calculated (i.e., the probability of an occurrence above the `prob` percentile in location A given the same exceedance in location B...).
`threshold`	Threshold above which values are used/discarded (i.e., values greater or equal than `threshold` are considered as Wet). Default to 1 (assuming mm)
`max.na.prop`	Maximum allowed proportion of missing data (Default to 0.25). See details

Details

The typical way to analyze dependencies is comparing the joint $P(wet_i,wet_j)$ (i.e. output="jointProb") and the product of marginals $P(wet_i) * P(wet_j)$. The difference is zero only in case that $wet_i$ and $wet_j$ are independent and the larger the value, the more dependent they are. Using the joint probability alone would make comparisons a bit difficult since the final result would be a combination of the dependency between both wet series and the marginal probabilities in each of the stations (e.g. the joint for web values would be smaller in dry climates), so the analysis of dependencies will be modulated by the different climatologies. In order to avoid this, the mutual information of two random variables is a measure of the mutual dependence between the two variables. MI = 0 if the two events are independent.

MI is more general and determines how similar the joint distribution p(X,Y) is to the products of factored marginal distribution p(X)p(Y). Mutual information is nonnegative (i.e. MI(X,Y) >= 0) and symmetric (i.e. MI(X,Y) = MI(Y,X)). Note that when the joint probability is zero the returned value of MI is NaN.

Value

A list of 2D matrices. The length of the list corresponds to the periods indicated in the season argument (default to 5, annual and the four standard WMO seasons). Attributes indicate the station names (in the row/column order they appear), and their geographical coordinates.

Author(s)

J. Bedia

References

https://en.wikipedia.org/wiki/Mutual_information

Examples

## Not run: 
obs.file <- file.path(find.package("R.VALUE"), "example_datasets", "VALUE_53_ECAD_Germany_v1.zip")
stationObj <- loadValueStations(obs.file, var = "precip")
# Mutual information (By default, computes the
# joint probabilities for each pair of stations for Dry-Dry, Dry-Wet,
# Wet-Wet and Wet-Dry), and applies the MI formula (See References)

mi.matrix <- miMat.VALUE(stationObj,
                  predictionObj = NULL,
                  season = "annual",
                  threshold = 1,
                  max.na.prop = 1)
                         
# Draw matrix - requires lattice!
station.labels <- attr(mi.matrix[[1]], "station_names")
scales.list <- list(x = list(labels = station.labels, rot = 90,
                           at = seq(1,ncol(mi.matrix[[1]]),1), cex = .5),
                    y = list(labels = station.labels,
                             at = seq(1,ncol(mi.matrix[[1]]),1), cex = .5))
# requires lattice package
# lattice::levelplot(mi.matrix[[1]], ylab = "", xlab = "",
#                      main = "Mutual Information Matrix", scales = scales.list)

# If 'prob' is given, then the same as before, but considering
# threshold exceedances instead of occurrence. Example using the threshold exceedance p90:

mi.matrix.p90 <- miMat.VALUE(stationObj,
                             predictionObj = NULL,
                             season = "annual",
                             threshold = 1,
                             max.na.prop = 1,
                             prob = .9)
                             
# requires lattice package
# lattice::levelplot(mi.matrix.p90[[1]], ylab = "", xlab = "",
                     main = "Mutual Information Matrix", scales = scales.list)

## End(Not run)

SantanderMetGroup/R_VALUE documentation built on July 4, 2023, 4:27 a.m.