Forecast Verification with Cluster Analysis: The Variation

Description

A variation on cluster analysis for forecast verification as proposed by Marzban and Sandgathe (2008).

Usage

CSIsamples(x, ...)

## Default S3 method:
CSIsamples(x, ..., xhat, nbr.csi.samples = 100, threshold = 20, 
    k = 100, width = 25, stand = TRUE, z.mult = 0, hit.threshold = 0.1, 
    max.csi.clust = 100, diss.metric = "euclidean", linkage.method = "average", 
    verbose = FALSE)

## S3 method for class 'SpatialVx'
CSIsamples(x, ..., time.point = 1, model = 1, nbr.csi.samples = 100, 
    threshold = 20, k = 100, width = 25, stand = TRUE, z.mult = 0, 
    hit.threshold = 0.1, max.csi.clust = 100, diss.metric = "euclidean", 
    linkage.method = "average", verbose = FALSE)

## S3 method for class 'CSIsamples'
summary(object, ...)

## S3 method for class 'CSIsamples'
plot(x, ...)

## S3 method for class 'summary.CSIsamples'
plot(x, ...)

## S3 method for class 'CSIsamples'
print(x, ...)

Arguments

x,xhat

default method: matrices giving the verification and forecast fields, resp.

“SpatialVx” method: x is an object of class “SpatialVx”.

plot, print methods: list object of class “CSIsamples” or “summary.CSIsamples” (in the case of plot).

object

list object of class “CSIsamples”.

nbr.csi.samples

integer giving the number of samples to take at each level of the cluster analysis (CA).

threshold

numeric giving the value above which a grid-point value is considered an event.

k

numeric giving the value for centers in the call to kmeans.

width

numeric giving the size of the samples for each cluster sample.

stand

logical, should the data first be standardized before applying CA?

z.mult

numeric giving a value by which to multiply the z- component. If zero, then the CA is performed on locations only. Can be used to give more or less weight to the actual values at these locations.

hit.threshold

numeric between zero and one giving the threshold on the proportion of a cluster that comes from the verification field (as opposed to the forecast field). It is used to determine whether the cluster constitutes a hit, as opposed to a false alarm or a miss.

max.csi.clust

integer giving the maximum number of clusters allowed.

diss.metric

character giving which method to use in the call to dist (which dissimilarity metric should be used?).

linkage.method

character giving the name of a linkage method acceptable to the method argument from the hclust function of package fastcluster.

time.point

numeric or character indicating which time point from the “SpatialVx” verification set to select for analysis.

model

numeric indicating which forecast model to select for the analysis.

verbose

logical, should progress information be printed to the screen?

...

Not used by CSIsamples method functions.

summary method function: the argument silent may be specified, which is a logical stating whether to print the information to the screen (FALSE) or not (TRUE). If not given, the summary information will be printed to the screen.

Not used by the plot method function.
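
As a rough illustration of how threshold, stand, z.mult, diss.metric and linkage.method might interact, the sketch below builds a combined set of event points from two mock fields and clusters it with base R. It is not the CSIsamples source code. The helper event_points, the mock data, the order of standardizing and then scaling the z-component, and the choice of five clusters are all assumptions made for illustration. It also uses stats::hclust in place of the fastcluster version and omits the kmeans and sampling steps entirely.

```r
# Illustrative sketch only; NOT the CSIsamples implementation.
set.seed(1)
n    <- 20
x    <- matrix(rexp(n * n, rate = 0.1), n, n)  # mock verification field
xhat <- matrix(rexp(n * n, rate = 0.1), n, n)  # mock forecast field

threshold <- 20    # values above this count as events
z.mult    <- 0.5   # weight given to the field values (0 = locations only)
stand     <- TRUE  # standardize the columns before clustering?

# Hypothetical helper: rows of (x-coordinate, y-coordinate, value) for events.
event_points <- function(field, threshold) {
  idx <- which(field > threshold, arr.ind = TRUE)
  cbind(x = idx[, "row"], y = idx[, "col"], z = field[idx])
}

obs_pts  <- event_points(x, threshold)
fcst_pts <- event_points(xhat, threshold)

# Combine both fields, remembering which rows come from the verification field.
combined <- rbind(obs_pts, fcst_pts)
is_obs   <- rep(c(TRUE, FALSE), c(nrow(obs_pts), nrow(fcst_pts)))

# ASSUMPTION: standardize first, then scale the z-component by z.mult.
if (stand) combined <- scale(combined)
combined[, "z"] <- combined[, "z"] * z.mult

d      <- dist(combined, method = "euclidean")  # cf. diss.metric
tree   <- hclust(d, method = "average")         # cf. linkage.method
labels <- cutree(tree, k = 5)                   # one "scale": five clusters

# Composition of each cluster by source field.
table(cluster = labels, verification = is_obs)
```

With z.mult = 0 the z column is zeroed out, so the clusters depend on location alone, which matches the description of that argument above.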

Details

This function carries out the procedure described in Marzban and Sandgathe (2008) for verifying forecasts. Effectively, it combines the verification and forecast fields (keeping track of which values belong to which field) and applies CA to the combined field. Each cluster is classified as a hit, a miss or a false alarm according to whether the proportion of its values that belong to the verification field falls within a certain range (defined by the hit.threshold argument). From this information, the critical success index (CSI) is calculated at each number of clusters (scale). A sampling scheme is used to speed up the process.
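
The classification step can be sketched as follows. The function csi_from_clusters is a hypothetical helper, not part of the package, and the exact rule is an assumption: here a cluster counts as a hit when its verification proportion lies in [hit.threshold, 1 - hit.threshold], as a miss when it is almost entirely verification points, and as a false alarm when it is almost entirely forecast points. The CSI itself is the standard ratio hits / (hits + misses + false alarms).

```r
# Hypothetical helper for illustration; NOT a function from the package.
csi_from_clusters <- function(labels, is_obs, hit.threshold = 0.1) {
  # Proportion of each cluster's points that come from the verification field.
  prop_obs <- tapply(is_obs, labels, mean)

  # ASSUMED rule for turning proportions into hits, misses and false alarms.
  hits         <- sum(prop_obs >= hit.threshold & prop_obs <= 1 - hit.threshold)
  misses       <- sum(prop_obs > 1 - hit.threshold)  # nearly all verification
  false_alarms <- sum(prop_obs < hit.threshold)      # nearly all forecast

  hits / (hits + misses + false_alarms)
}

# Three mock clusters: one mixed, one all verification, one all forecast.
labels <- c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3)
is_obs <- c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

# One hit, one miss and one false alarm, so the CSI is 1/3.
csi <- csi_from_clusters(labels, is_obs, hit.threshold = 0.25)
csi
```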

The plot and summary method functions give the same information in different formats: CSI by number of clusters (scale).

Value

A list is returned by CSIsamples with components:

data.name

character vector giving the names of the verification and forecast fields analyzed, resp.

call

an object of class “call” giving the function call.

results

max.csi.clust by nbr.csi.samples matrix giving the calculated CSI for each sample and iteration of CA.

The summary method function invisibly returns the same list, but with the additional component:

csi

vector of length max.csi.clust giving the sample average CSI for each iteration of CA.

The plot method functions do not return anything. Plots are created.
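
Given the shapes described above, the relationship between results and csi can be sketched as a row-wise mean. The matrix below is mock data standing in for the results component, and dropping missing values with na.rm = TRUE is an assumption rather than documented behavior.

```r
# Mock stand-in for the results component; values are random, not real CSI.
set.seed(42)
max.csi.clust   <- 5
nbr.csi.samples <- 4
results <- matrix(runif(max.csi.clust * nbr.csi.samples),
                  nrow = max.csi.clust, ncol = nbr.csi.samples)

# One row per number of clusters, one column per sample:
# averaging across columns gives one CSI value per scale.
# ASSUMPTION: missing values, if any, are dropped.
csi <- rowMeans(results, na.rm = TRUE)
csi
```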

Note

Special thanks to Caren Marzban, marzban “at” u.washington.edu, for making the CSIsamples (originally called csi.samples) function available for use with this package.

Author(s)

Hillary Lyons, h.lyons “at” comcast.net, and modified by Eric Gilleland

References

Marzban, C., Sandgathe, S. (2008) Cluster Analysis for Object-Oriented Verification of Fields: A Variation. Mon. Wea. Rev., 136, (3), 1013–1025.

See Also

hclust (packages stats and fastcluster), kmeans, clusterer

Examples

## Not run: 
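# Simulate two random fields on a 100 x 100 grid.
grid <- list(x = seq(0, 5, length.out = 100), y = seq(0, 5, length.out = 100))
obj <- Exp.image.cov(grid = grid, theta = 0.5, setup = TRUE)
look <- sim.rf(obj)
look2 <- sim.rf(obj)

# nbr.csi.samples follows ... in the signature, so it must be named;
# an unnamed 10 would not be matched to it.
res <- CSIsamples(x = look, xhat = look2, nbr.csi.samples = 10, threshold = 0,
                  k = 100, width = 2, z.mult = 0, hit.threshold = 0.25,
                  max.csi.clust = 75)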
plot(res)
y <- summary(res)
plot(y)

## End(Not run)
## Not run: 
data(UKfcst6)
data(UKobs6)
data(UKloc)

hold <- make.SpatialVx(UKobs6, UKfcst6, thresholds=0,
    loc=UKloc, map=TRUE, field.type="Rainfall", units="mm/h",
    data.name=c("Nimrod", "obs 6", "fcst 6"))

res <- CSIsamples(hold, threshold=0, k=200, z.mult=0.3, hit.threshold=0.2,
                  max.csi.clust=150, verbose=TRUE)
plot(res)
summary(res)
y <- summary(res)
plot(y)

## End(Not run)
