jointProbMat.VALUE: Joint probability analysis

View source: R/jointProbMat.VALUE.R

jointProbMat.VALUER Documentation

Joint probability analysis

Description

Computes different joint (precipitation) probabilities between stations

Usage

jointProbMat.VALUE(stationObj, predictionObj = NULL, season = c("annual",
  "DJF", "MAM", "JJA", "SON"), aggr.type = c("after", "before"),
  prob.type = c("DD", "DW", "WW", "WD"), prob = NULL, output = c("MI",
  "jointProb"), threshold = 1, max.na.prop = 0.25)

Arguments

stationObj

An R-VALUE object containing station data (as returned by loadValueStations).

predictionObj

A R-VALUE predictions object as loaded by loadValuePredictions. Default to NULL, meaning that the matrix of joint probabilities is done on the observations.

season

Character string indicating the target season. Accepted values are c("annual", "DJF", "MAM", "JJA", "SON")

aggr.type

Type of aggregation in the case of multiple realizations. Should the aggregation of multiple members be performed "after" (the default) or "before" computing the joint probabilities?. Ignored in the case of observations and deterministic predictions.

prob.type

Character vector indicating the type of probability to be computed. Currently accepted values are: "DD" for dry-dry, "DW" for dry-wet, "WW" for wet-wet and "WD" for wet-dry.

prob

Default to NULL (unused). Otherwise, a float number in the range (0,1) defining the quantile threshold to be calculated. If prob is used, the joint exceedance probabilities will be calculated (i.e., the probability of an occurrence above the prob percentile in location A given the same exceedance in location B).

output

Character string indicating the type of measure to be retained. Default to "MI" for the mutual information criterion. For the joint probabilities use "jointProb". See details.

threshold

Threshold above which values are used/discarded (i.e., values greater or equal than threshold are considered as Wet).

max.na.prop

Maximum allowed proportion of missing data (Default to 0.25). See details

Details

The typical way to analyze dependencies is comparing the joint $P(wet_i,wet_j)$ (i.e. output="jointProb") and the product of marginals $P(wet_i) * P(wet_j)$. The difference is zero only in case that $wet_i$ and $wet_j$ are independent and the larger the value, the more dependent they are. Using the joint probability alone would make comparisons a bit difficult since the final result would be a combination of the dependency between both wet series and the marginal probabilities in each of the stations (e.g. the joint for web values would be smaller in dry climates), so the analysis of dependencies will be modulated by the different climatologies. In order to avoid this, the mutual information of two random variables is a measure of the mutual dependence between the two variables. MI = 0 if the two events are independent.

MI is more general and determines how similar the joint distribution p(X,Y) is to the products of factored marginal distribution p(X)p(Y). Mutual information is nonnegative (i.e. MI(X,Y) >= 0) and symmetric (i.e. MI(X,Y) = MI(Y,X)). Note that when the joint probability is zero the returned value of MI is NaN.

Value

A list of 2D matrices. The length of the list corresponds to the periods indicated in the season argument (default to 5, annual and the four standard WMO seasons). Attributes indicate the station names (in the row/column order they appear), and their geographical coordinates. In addition, the type of joint probability, as coded in argument prob.type is indicated in the global attribute "joint_prob_type".

Author(s)

J. Bedia

References

https://en.wikipedia.org/wiki/Mutual_information

Examples

## Not run: 
obs.file <- file.path(find.package("R.VALUE"), "example_datasets", "VALUE_53_ECAD_Germany_v1.zip")
stationObj <- loadValueStations(obs.file, var = "precip")
# Wet-wet probability (precip >= 1mm)
ww <- jointProbMat.VALUE(stationObj,
                        predictionObj = NULL,
                        season = "annual",
                        threshold = 1,
                        max.na.prop = 1,
                        aggr.type = "after",
                        prob.type = "WW",
                        output = "MI")

# Dry-dry joint probability (precip < 1mm)
dd <- jointProbMat.VALUE(stationObj,
                         predictionObj = NULL,
                         season = "annual",
                         threshold = 1,
                         max.na.prop = 1,
                         aggr.type = "after",
                         prob.type = "DD",
                         output = "MI")
                         
# Draw matrix - requires lattice!
mat <- matrix(ncol = ncol(ww[[1]]), nrow = nrow(ww[[1]]))
mat[upper.tri(mat)] <- ww[[1]][upper.tri(ww[[1]])]
mat[lower.tri(mat)] <- dd[[1]][upper.tri(dd[[1]])]
station.labels <- attr(ww[[1]], "station_names")
scales.list <- list(x = list(labels = station.labels, rot = 90,
                           at = seq(1,ncol(ww[[1]]),1), cex = .5),
                    y = list(labels = station.labels,
                             at = seq(1,ncol(ww[[1]]),1), cex = .5))
# lattice::levelplot(mat, xlab = "P(DRY,DRY)", ylab = "P(WET,WET)",
# main = "Mutual Information Matrix", scales = scales.list)

## End(Not run)

SantanderMetGroup/R_VALUE documentation built on July 4, 2023, 4:27 a.m.