Validation of Hierarchical Climate Regionalization
Description
validClimR computes indices for cluster validation and an objective tree cut for the regional linkage clustering method.
Usage
Arguments
y 
a dendrogram tree produced by 
k 

minSize 
minimum cluster size. The 
alpha 
confidence level: the default is 
verbose 
logical to print processing information if 
plot 
logical to call the plotting method if 
colPalette 
a color palette or a list of colors such as that generated
by 
pch 
Either an integer specifying a symbol or a single character to
be used as the default in plotting points. See 
cex 
A numerical value giving the amount by which plotting symbols should
be magnified relative to the 
Details
The validClimR function validates a dendrogram tree produced by HiClimR by computing detailed statistical information for each cluster: cluster means, sizes, intra- and intercluster correlations, and an overall summary. It requires the preprocessed data matrix and the tree from the HiClimR function as inputs. An optional parameter can be used to validate clustering for a selected number of clusters k. If k = NULL (the default, which supports only the regional linkage method), the tree is cut objectively to find the optimal number of clusters, based on a user-specified significance level (the alpha parameter). In the regional linkage method, noisy spatial elements are isolated in very small clusters or as individuals, since they do not correlate well with any other elements. They can be excluded from the validation indices (interCor, intraCor, diffCor, and statSum) based on the minSize minimum cluster size. The excluded clusters are identified in the output of validClimR in clustFlag, which takes a value of 1 for selected clusters or 0 for excluded clusters. The sum of the clustFlag elements is the selected number of clusters. This should be followed by a quality control step before repeating the analysis.
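The objective cut and the minSize flagging described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: it uses average linkage from hclust as a stand-in for regional linkage, assumes a two-sided t-test for the minimum significant correlation, and assumes clusters with fewer than minSize members are the ones excluded. The synthetic data and the names signals, member, tCrit, and rCrit are hypothetical.

```r
set.seed(7)
nTime <- 60

# Three synthetic regions (shared signal + noise) plus two pure-noise stations.
signals <- matrix(rnorm(3 * nTime), nrow = 3)
member <- rep(1:3, times = c(6, 5, 4))
x <- signals[member, ] +
  matrix(rnorm(length(member) * nTime, sd = 0.5), ncol = nTime)
x <- rbind(x, matrix(rnorm(2 * nTime), nrow = 2))

# Stand-in tree: average linkage on correlation distance (NOT regional linkage).
tree <- hclust(as.dist(1 - cor(t(x))), method = "average")

# Assumed significance test: two-sided t-test for a correlation coefficient.
alpha <- 0.01
tCrit <- qt(1 - alpha / 2, df = nTime - 2)
rCrit <- tCrit / sqrt(nTime - 2 + tCrit^2)

# Objective cut: stop merging once the linkage correlation drops below rCrit.
region <- cutree(tree, h = 1 - rCrit)
clustSize <- as.vector(table(region))
k <- length(clustSize)   # number of clusters, including small ones

# Flag clusters by size: 1 = selected, 0 = excluded (assumed rule).
minSize <- 3
clustFlag <- as.integer(clustSize >= minSize)
ks <- sum(clustFlag)     # number of clusters kept after exclusion
```

In this sketch the pure-noise stations would typically end up in clusters of size one or two, so they receive a flag of 0 and do not count toward ks.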
Value
An object of class HiClimR which produces indices for validating the tree produced by the clustering process. The object is a list with the following components:
cutLevel 
the minimum significant correlation used for objective tree cut together with the corresponding confidence level. 
clustMean 
the cluster means, i.e., the mean time series of each selected region. 
clustSize 
cluster sizes for all selected regions. 
clustFlag 
a flag 
interCor 
intercluster correlations for all selected regions, computed between cluster means. The maximum intercluster correlation is a measure of separation or contiguity, and it is used for the objective tree cut (to find the "optimal" number of clusters). 
intraCor 
intracluster correlations for all selected regions, computed between the mean of each cluster and its members. The average intracluster correlation is a weighted average over all clusters, and it is a measure of homogeneity. 
diffCor 
difference between intracluster correlation and maximum intercluster correlation for all selected regions. 
statSum 
overall statistical summary for i 
region 
ordered regions vector of size 
regionID 
ordered region ID vector with length equal to the selected number
of clusters, after excluding the small clusters defined by 
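To make the interCor, intraCor, and diffCor definitions concrete, here is a minimal base-R sketch on synthetic data. It follows the descriptions in the list above but is not taken from the package source: the data, the region assignment, and the helper names maxInterCor and avgIntraCor are hypothetical, and weighting the average by cluster size is an assumption.

```r
set.seed(42)
nTime <- 60

# Three synthetic regions: each member = shared regional signal + noise.
signals <- matrix(rnorm(3 * nTime), nrow = 3)
region <- rep(1:3, times = c(5, 4, 6))
x <- signals[region, ] +
  matrix(rnorm(length(region) * nTime, sd = 0.5), ncol = nTime)

# Cluster sizes and cluster means (one mean time series per region, in rows).
clustSize <- as.vector(table(region))
clustMean <- rowsum(x, region) / clustSize

# Intercluster correlations: correlations between cluster means.
interCor <- cor(t(clustMean))
diag(interCor) <- NA
maxInterCor <- apply(interCor, 1, max, na.rm = TRUE)

# Intracluster correlation: mean correlation between a cluster mean
# and the members of that cluster.
intraCor <- sapply(seq_along(clustSize), function(i) {
  members <- x[region == i, , drop = FALSE]
  mean(cor(t(members), clustMean[i, ]))
})

# Difference between intracluster and maximum intercluster correlation.
diffCor <- intraCor - maxInterCor

# Size-weighted average intracluster correlation (assumed weighting).
avgIntraCor <- sum(intraCor * clustSize) / sum(clustSize)
```

A large positive diffCor for a region indicates that its members follow their own mean much more closely than that mean follows any other region's mean.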
Author(s)
Hamada Badr <badr@jhu.edu>, Ben Zaitchik <zaitchik@jhu.edu>, and Amin Dezfuli <dez@jhu.edu>. The HiClimR function is a modification of the hclust function, which is based on Fortran code contributed to STATLIB by F. Murtagh.
References
Hamada S. Badr, Zaitchik, B. F. and Dezfuli, A. K. (2015): A Tool for Hierarchical Climate Regionalization, Earth Science Informatics, 1-10, http://dx.doi.org/10.1007/s12145-015-0221-7.
Hamada S. Badr, Zaitchik, B. F. and Dezfuli, A. K. (2014): Hierarchical Climate Regionalization, CRAN, http://cran.r-project.org/package=HiClimR.
See Also
HiClimR, validClimR, geogMask, fastCor, grid2D, and minSigCor.
Examples
require(HiClimR)

## Load test case data
x <- TestCase$x

## Generate longitude and latitude mesh vectors
xGrid <- grid2D(lon = unique(TestCase$lon), lat = unique(TestCase$lat))
lon <- c(xGrid$lon)
lat <- c(xGrid$lat)

## Hierarchical Climate Regionalization
y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = FALSE,
    continent = "Africa", meanThresh = 10, varThresh = 0, detrend = TRUE,
    standardize = TRUE, nPC = NULL, method = "regional", hybrid = FALSE,
    kH = NULL, members = NULL, validClimR = TRUE, k = NULL, minSize = 1,
    alpha = 0.01, plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE)

## Validation of Hierarchical Climate Regionalization
z <- validClimR(y, k = NULL, minSize = 1, alpha = 0.01, plot = TRUE)

## Use a specified number of clusters (k = 12)
z <- validClimR(y, k = 12, minSize = 1, alpha = 0.01, plot = TRUE)

## Apply minimum cluster size (minSize = 25)
z <- validClimR(y, k = NULL, minSize = 25, alpha = 0.01, plot = TRUE)

## The optimal number of clusters, including small clusters
k <- length(z$clustFlag)

## The selected number of clusters, after excluding small clusters (if minSize > 1)
ks <- sum(z$clustFlag)
