gmm2d: 2-d Gaussian Mixture Models Verification

View source: R/gmm.R

gmm2dR Documentation

2-d Gaussian Mixture Models Verification

Description

Use 2-d Gaussian Mixture Models (GMM) to assess forecast performance.

Usage

gmm2d(x, ...)

## Default S3 method:
gmm2d(x, ..., xhat, K = 3, gamma = 1, threshold = NULL,
    initFUN = "initGMM", verbose = FALSE)

## S3 method for class 'SpatialVx'
gmm2d(x, ..., time.point = 1, obs = 1, model = 1, K = 3, gamma = 1,
    threshold = NULL, initFUN = "initGMM", verbose = FALSE)

## S3 method for class 'gmm2d'
plot(x, ..., col = c("gray", tim.colors(64)),
    zlim = c(0, 1), horizontal = TRUE)

## S3 method for class 'gmm2d'
predict(object, ..., x)

## S3 method for class 'gmm2d'
print(x, ...)

## S3 method for class 'gmm2d'
summary(object, ...)

Arguments

x,xhat

Default: m by n numeric matrices giving the verification and forecast fields, resp.

gmm2d.SpatialVx: object of class “SpatialVx”.

plot and print: object returned by gmm2d.

predict: k by 2 matrix of lon/lat coordinates on which to predict the model.

object

output from gmm2d.

K

single numeric giving the number of mixture components to use.

gamma

Value of the gamma parameter from Eq (11) of Lakshmanan and Kain (2010). This affects the number of times a location is repeated.

threshold

numeric giving a threshold over which (and including) the GMM is to be fit (zero-valued grid points are not included in the estimation here for speed). If NULL, no thresholding is applied.

initFUN

character naming a function to provide initial estimates for the GMM. Must take an m by n matrix as input, and return a dataframe a component called ind that is a vector indicating the order of the rows for which the first K will be used, a third column giving the x-coordinates of the initial estimate of the mean for the x direction, fourth column giving the initial estimate for the mean of the y-direction, and fifth and sixth columns giving initial estimates for the standard deviations of the x- and y-directions. The default identifies all connected components using the disjointer function, then uses their centroids as the initial estimates of the means, and their axes as initial estimates for the standard deviations. The ind component gives the order of the object areas from largest to smallest so that the K largest objects are used to provide initial estimates. Note that this differs from the initial estimates in Lakshmanan and Kain (2010) where they break the field into different areas first.

verbose

logical, should progress information be printed to the screen?

time.point

numeric or character indicating which time point from the “SpatialVx” verification set to select for analysis.

obs, model

numeric indicating which observation/forecast model to select for the analysis.

col, zlim, horizontal

optional arguments to fields function(s) poly.image, image.plot.

...

In the case of gmm2d: optional arguments to initFUN. In the case of plot: not used. In the case of predict: N by 2 matrix of grid point locations on which to predict the probability from the 2-d GMM model. In the case of summary: this can include the arguments: silent, logical stating whether to print summaries to the screen (FALSE) or not (TURE), e1, e2, ..., e5, giving alternative weights in calculating the overall error (Eq 15 in Lakshmanan and Kain, 2010, but see details section below).

Details

These functions carry out the spatial verification approach described in Lakshmanan and Kain (2010), which fits a 2-d Gaussian Mixture Model (GMM) to the locations for each field in the verification set, and makes comparisons using the estimated parameters. In fitting the GMMs, first an initial estimate is provided by using the initFUN argument, which is a function. The default function is relatively fast (it might seem slow, but for what it does, it is very fast!), but is typically the slowest part of the process. Although the EM algorithm is a fairly computationally intensive procedure, acceleration algorithms are employed (via the turboem function of the turboEM package) so that once initial estimates are found, the procedure is very fast.

Because the fit is to the locations only, Lakshmanan and Kain (2010) suggest two ways to incorporate intensity information. The first is to repeat points with higher intensities, and the second is to multiply the results by the total intensities over the fields. The points are repeated M times according to the formula (Eq 11 in Lakshmanan and Kain, 2010):

M = 1 + gamma * round( CFD(I_xy)/frequency(I_MODE)),

where CFD is the cumulative *frequency* distribution (here estimated from the histogram using the ‘hist’ function), I_xy is intensity at grid point (x,y), I_MODE is the mode of intensity values, and gamma is a user-supplied parameter controlling how much to repeat points where higher numbers will result in larger repetitions of high intensity values.

The function gmm2d fits the 2-d GMM to both fields, plot.gmm2d first uses predict.gmm2d to obtain probabilities for each grid point, and then makes a plot similar to those in Lakshmanan and Kain (2010) Figs. 3, 4 and 5, but giving the probabilities instead of the probabilities times A. Note that predict.gmm2d can be very slow to compute so that plot.gmm2d can also be very slow. Less effort was put into speeding these functions up because they are not necessary for obtaining results via the parameters. However, they can give the user an idea of how good the fit is.

The 2-d GMM is given by

G(x,y) = A*sum(lambda*f(x,y))

where lambda and f(x,y) are numeric vectors of length K, lambda components describe the mixing, and f(x,y) is the bivariate normal distribution with mean (mu.x, mu.y) and covariance function. ‘A’ is the total sum of intensities over the field.

Comparisons between forecast and observed fields are carried out finally by the summary method function. In particular, the translation error

e.tr = sqrt((mu.xf - mu.xo)^2 + (mu.yf - mu.yo)^2),

where f means forecast and o verification fields, resp., and mu .x is the mean in the x- direction, and mu.y in the y- direction. The rotation error is given by

e.rot = (180/pi)*acos(theta),

where theta is the dot product between the first eigenvectors of the covariance matrices for the verification and forecast fields. The scaling error is given by

e.sc = Af*lambda.f/Ao*lambda.o,

where lambda is the mixture component and Af/Ao is the forecast/observed total intensity.

The overall error (Eq 15 of Lakshmanana and Kain, 2010) is given by

e.overall = e1 * min(e.tr/e2, 1) + e3*min(e.rot,180 - e.rot)/e4 + e5*(max(e.sc,1/e.sc)-1),

where e1 to e5 can be supplied by the user, but the defaults are those given by Lakshmanan and Kain (2010). Namely, e1 = 0.3, e2 = 100, e3=0.2, e4 = 90, and e5=0.5.

Value

For gmm2d, a list object of class “gmm2d” is returned with components:

fitX,fitY

list objects returned by the turboem function from the turboEM package that describe the EM estimates of the 2-d GMM parameters for the verification and forecast fields, resp.

initX,initY

numeric vectors giving the initial estimates used in the EM algorithm for the verification and forecast fields, resp. The first 2*K values are the initial mean estiamtes for the x- and y- directions, resp. The next 4*K values are the initial estiamtes of the covariances (note that the cross-covariance terms are zero regardless of initialization function employed (maybe this will be improved in the future). The final K values are the initial estimates for lambda.

sX,sY

N by 2 matrix giving the repeated coordinates calculated per M as described in the details section for the verification and forecast fields, resp.

k

single numeric giving the value of K

Ax,Ay

single numerics giving the value of A (the total sum of intensities over the field) for the verifiaction and forecast fields, resp.

For plot.gmm2d no value is returned. A plot is created.

For predict.gmm2d, a list is returned with components:

predX,predY

numeric vectors giving the GMM predicted values for the verification and forecast fields, resp.

For summary.gmm2d, a list is returned invisibly (if silent is FALSE, information is printed to the screen) with components:

meanX,meanY

Estimated mean vectors for each GMM component for the verification and forecast fields, resp.

covX,covY

Estimated covariances for each GMM component for the verification and forecast fields, resp.

lambdasX,lambdasY

Estimated mixture components for each GMM component for the verification and forecast fields, resp.

e.tr,e.rot,e.sc,e.overall

K by K matrices giving the errors between each GMM component in the verification field (rows) to each GMM component in the forecast field (columns). The errors are: translation (e.tr), rotation (e.rot), scaling (e.sc), and overall (e.overall).

Author(s)

Eric Gilleland

References

Lakshmanan, V. and Kain, J. S. (2010) A Gaussian Mixture Model Approach to Forecast Verification. Wea. Forecasting, 25 (3), 908–920.

See Also

turboem, disjointer, connected

Examples

## Not run: 
data( "ExampleSpatialVxSet" )

x <- ExampleSpatialVxSet$vx
xhat <- ExampleSpatialVxSet$fcst

u <- min(quantile(c(x[x > 0]), probs = 0.75),
    quantile(c(xhat[xhat > 0]), probs = 0.75))

look <- gmm2d(x, xhat=xhat, threshold=u, verbose=TRUE)
summary(look)
plot(look)

## End(Not run)
## Not run: 
# Alternative method to skin the cat.
hold <- make.SpatialVx( x, xhat, field.type = "MV Gaussian w/ Exp. Cov.",
    units = "units", data.name = "Example", obs.name = "x",
    model.name = "xhat" )

look2 <- gmm2d( hold, threshold = u, verbose = TRUE)
summary(look2)
plot(look2)


## End(Not run)

SpatialVx documentation built on Nov. 10, 2022, 5:56 p.m.