jointProbMat.VALUE: Joint probability analysis
In SantanderMetGroup/R_VALUE: Climate data validation in the framework of the COST action VALUE

jointProbMat.VALUE

R Documentation

Joint probability analysis

Description

Computes different joint (precipitation) probabilities between stations

Usage

jointProbMat.VALUE(stationObj, predictionObj = NULL, season = c("annual",
  "DJF", "MAM", "JJA", "SON"), aggr.type = c("after", "before"),
  prob.type = c("DD", "DW", "WW", "WD"), prob = NULL, output = c("MI",
  "jointProb"), threshold = 1, max.na.prop = 0.25)

Arguments

`stationObj`	An R-VALUE object containing station data (as returned by `loadValueStations`).
`predictionObj`	A R-VALUE predictions object as loaded by `loadValuePredictions`. Default to NULL, meaning that the matrix of joint probabilities is done on the observations.
`season`	Character string indicating the target season. Accepted values are `c("annual", "DJF", "MAM", "JJA", "SON")`
`aggr.type`	Type of aggregation in the case of multiple realizations. Should the aggregation of multiple members be performed `"after"` (the default) or `"before"` computing the joint probabilities?. Ignored in the case of observations and deterministic predictions.
`prob.type`	Character vector indicating the type of probability to be computed. Currently accepted values are: `"DD"` for dry-dry, `"DW"` for dry-wet, `"WW"` for wet-wet and `"WD"` for wet-dry.
`prob`	Default to NULL (unused). Otherwise, a float number in the range (0,1) defining the quantile threshold to be calculated. If `prob` is used, the joint exceedance probabilities will be calculated (i.e., the probability of an occurrence above the `prob` percentile in location A given the same exceedance in location B).
`output`	Character string indicating the type of measure to be retained. Default to `"MI"` for the mutual information criterion. For the joint probabilities use `"jointProb"`. See details.
`threshold`	Threshold above which values are used/discarded (i.e., values greater or equal than `threshold` are considered as Wet).
`max.na.prop`	Maximum allowed proportion of missing data (Default to 0.25). See details

Details

The typical way to analyze dependencies is comparing the joint $P(wet_i,wet_j)$ (i.e. output="jointProb") and the product of marginals $P(wet_i) * P(wet_j)$. The difference is zero only in case that $wet_i$ and $wet_j$ are independent and the larger the value, the more dependent they are. Using the joint probability alone would make comparisons a bit difficult since the final result would be a combination of the dependency between both wet series and the marginal probabilities in each of the stations (e.g. the joint for web values would be smaller in dry climates), so the analysis of dependencies will be modulated by the different climatologies. In order to avoid this, the mutual information of two random variables is a measure of the mutual dependence between the two variables. MI = 0 if the two events are independent.

MI is more general and determines how similar the joint distribution p(X,Y) is to the products of factored marginal distribution p(X)p(Y). Mutual information is nonnegative (i.e. MI(X,Y) >= 0) and symmetric (i.e. MI(X,Y) = MI(Y,X)). Note that when the joint probability is zero the returned value of MI is NaN.

Value

A list of 2D matrices. The length of the list corresponds to the periods indicated in the season argument (default to 5, annual and the four standard WMO seasons). Attributes indicate the station names (in the row/column order they appear), and their geographical coordinates. In addition, the type of joint probability, as coded in argument prob.type is indicated in the global attribute "joint_prob_type".

Author(s)

J. Bedia

References

https://en.wikipedia.org/wiki/Mutual_information

Examples

## Not run: 
obs.file <- file.path(find.package("R.VALUE"), "example_datasets", "VALUE_53_ECAD_Germany_v1.zip")
stationObj <- loadValueStations(obs.file, var = "precip")
# Wet-wet probability (precip >= 1mm)
ww <- jointProbMat.VALUE(stationObj,
                        predictionObj = NULL,
                        season = "annual",
                        threshold = 1,
                        max.na.prop = 1,
                        aggr.type = "after",
                        prob.type = "WW",
                        output = "MI")

# Dry-dry joint probability (precip < 1mm)
dd <- jointProbMat.VALUE(stationObj,
                         predictionObj = NULL,
                         season = "annual",
                         threshold = 1,
                         max.na.prop = 1,
                         aggr.type = "after",
                         prob.type = "DD",
                         output = "MI")
                         
# Draw matrix - requires lattice!
mat <- matrix(ncol = ncol(ww[[1]]), nrow = nrow(ww[[1]]))
mat[upper.tri(mat)] <- ww[[1]][upper.tri(ww[[1]])]
mat[lower.tri(mat)] <- dd[[1]][upper.tri(dd[[1]])]
station.labels <- attr(ww[[1]], "station_names")
scales.list <- list(x = list(labels = station.labels, rot = 90,
                           at = seq(1,ncol(ww[[1]]),1), cex = .5),
                    y = list(labels = station.labels,
                             at = seq(1,ncol(ww[[1]]),1), cex = .5))
# lattice::levelplot(mat, xlab = "P(DRY,DRY)", ylab = "P(WET,WET)",
# main = "Mutual Information Matrix", scales = scales.list)

## End(Not run)

SantanderMetGroup/R_VALUE documentation built on July 4, 2023, 4:27 a.m.