som_gap | R Documentation |
Produces a low-dimensional representation of the input feature space for
subsequent estimation of the "optimal" number of clusters (k) in a
multivariate dataset. The dimension reduction is based on the
self-organizing map technique (SOM) of Kohonen (1982; 1990), and implemented
in R by the function supersom
of Wehrens and
Kruisselbrink (2018). To estimate the optimal k, the partitioning
around medoids (PAM) of Kaufman and Rousseeuw (1990), coupled with the gap
statistic of Tibshirani et al. (2001), is performed on the SOM's codebook
vectors. This is achieved by internally calling pam
and clusGap
(Maechler et al., 2021). See
Details for a brief theoretical background.
som_gap(
var.rast,
xdim = 12,
ydim = 12,
topo = "hexagonal",
neighbourhood.fct = "gaussian",
rlen = 600,
dist.fcts = c("sumofsquares", "manhattan"),
mode = "pbatch",
K.max,
stand = FALSE,
B = 500,
d.power = 2,
spaceH0 = "original",
method = "globalSEmax",
SE.factor = 1,
...
)
var.rast |
SpatRaster, as in |
xdim |
Integer. Horizontal dimension of the SOM's grid. Default: 12 |
ydim |
Integer. Vertical dimension of the SOM's grid. Default: 12 |
topo |
Character. Topology of the SOM's grid. Options = "rectangular", "hexagonal". Default: "hexagonal" |
neighbourhood.fct |
Character. Neighborhood of the SOM's grid. Options = "bubble", "gaussian". Default: "gaussian" |
rlen |
Integer. Number of times the complete dataset will be presented to the SOM's network. Default: 600 |
dist.fcts |
Character. Vector of length 2 containing the distance functions to use for SOM (First element, options = "sumofsquares", "euclidean", "manhattan") and for PAM (second element, options = "euclidean", "manhattan"). Default: c("sumofsquares", "manhattan") |
mode |
Character. Type of learning algorithm. Options are “online", "batch", and "pbatch". Default: "pbatch" |
K.max |
Integer. Maximum number of clusters to consider, must be at least two (2). |
stand |
Boolean. For PAM function, does SOM's codebook vectors need to be standardized? Default: FALSE |
B |
Integer. Number of bootstrap samples for the gap statistic. Default: 500 |
d.power |
Integer. Positive Power applied to euclidean distances for the gap statistic. Default: 2 |
spaceH0 |
Character. Space of the reference distribution for the gap statistic. Options = "scaledPCA", "original" (See Details). Default: "original" |
method |
Character. Optimal k selection criterion for the gap statistic.
Options = "globalmax", "firstmax", "Tibs2001SEmax", "firstSEmax",
"globalSEmax". See |
SE.factor |
Numeric. Factor to feed into the standard error rule for the
gap statistic. Only applicable for methods based on standard error (SE).
See |
... |
Additional arguments as for |
The clustering of SOM's codebook vectors has been proposed in several works, notably in that from Vesanto and Alhoniemi (2000). These authors proposed a two-stage clustering routine as an efficient method to reduce computational load, while obtaining satisfactory correspondence between the clustered codebook vectors and the clustered original feature space.
The main purpose of this function is to allow the use of clustering and k-selection algorithms that may result prohibitive for large datasets, such as matrices derived from raster layers commonly used during geocomputational routines. Thus, the SOM's codebook vectors can be subsequently used for the calculation of distance matrices, which given the large size of their input feature space, may otherwise be impossible to create due to insufficient memory allocation capacity. Similarly, robust clustering algorithms that require full pairwise distance matrices (e.g., hierarchical clustering, PAM) and/or eigenvalues (e.g., spectral clustering) may also be performed on SOM's codebook vectors.
Note that supersom
will internally equalize the
importance (i.e., weights) of variables such that differences in scale will
not affect distance calculations. This behavior can be prevented by setting
normalizeDataLayers = FALSE in additional arguments passed to
supersom
. Moreover, custom weights can also be passed
through the additional argument user.weights. In such case, user
weights are applied on top of the internal weights.
When working with large matrices, the additional SOM argument
keep.data may be set to FALSE. However, note that by doing so, the
suggested follow-up function for raster products som_pam
will
not work since it requires both original data and winning units.
For the gap statistic, method = "scaledPCA" has resulted in errors for R sessions with BLAS/LAPACK supported by the Intel Math Kernel Library (MKL).
SOM: An object of class kohonen (see
supersom
). The components of class kohonen returned by
this function are: (1) data = original input matrix, (2)
unit.classif = winning units for all observations, (3)
distances = distance between each observation and its corresponding
winning unit, (4) grid = object of class somgrid (see
somgrid
), (5) codes = matrix of codebook
vectors, (6) changes = matrix of mean average deviations from codebook
vectors, (7) dist.fcts = selected distance function, and other
arguments passed to supersom
(e.g., radius,
distance.weights, etc.). Note that components 1, 2, and 3 will only be
returned if keep.data = TRUE, which is the default.
SOMdist: Object of class dist. Matrix of pairwise distances calculated from the SOM's codebook vectors.
SOMgap: Object of class clusGap. The main component of
class clusGap returned by this function is Tab, which is a matrix of
the gap statistic results (see clusGap
). Additional
components are the arguments passed to the function (i.e., spaceH0,
B), the PAM function, n (number of observations) and
call (the clusGap call-type object).
Kopt: Optimal k, as selected by arguments method and (possibly) SE.factor.
L. Kaufman and P. Rousseeuw. Finding groups in data: an introduction to cluster analysis. John Wiley & Sons, 1990. \Sexpr[results=rd]{tools:::Rd_expr_doi("https://doi.org/10.1002/9780470316801")}
T. Kohonen. Self-organized formation of topologically correct feature maps. Biological cybernetics, 43 (1):59–69, 1982. \Sexpr[results=rd]{tools:::Rd_expr_doi("https://doi.org/10.1007/bf00337288")}
T. Kohonen. The self-organizing map. Proceedings of the IEEE, 78(9):1464–1480, 1990. \Sexpr[results=rd]{tools:::Rd_expr_doi("https://doi.org/10.1016/s0925-2312(98)00030-7")}
M. Maechler, P. Rousseeuw, A. Struyf, M. Hubert, and K. Hornik. cluster: Cluster Analysis Basics and Extensions, 2021. https://CRAN.R-project.org/package=cluster
R. Tibshirani, G. Walther, and T. Hastie. Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(2):411–423, 2001. \Sexpr[results=rd]{tools:::Rd_expr_doi("https://doi.org/10.1111/1467-9868.00293")}
J. Vesanto and E. Alhoniemi. Clustering of the self-organizing map. IEEE Transactions on Neural Networks, 11(3):586–600, 2000. \Sexpr[results=rd]{tools:::Rd_expr_doi("https://doi.org/10.1109/72.846731")}
R. Wehrens and J. Kruisselbrink. Flexible self-organizing maps in kohonen 3.0. Journal of Statistical Software, 87(1):1–18, 2018. \Sexpr[results=rd]{tools:::Rd_expr_doi("https://doi.org/10.18637/jss.v087.i07")}
Other Functions for Landscape Stratification:
som_pam()
,
strata()
require(terra)
# Multi-layer SpatRaster with topographic variables
p <- system.file("exdat", package = "rassta")
tf <- list.files(path = p, pattern = "^height|^slope|^wetness",
full.names = TRUE
)
t <- rast(tf)
# Scale topographic variables (mean = 0, StDev = 1)
ts <- scale(t)
# Self-organizing map and gap statistic for optimum k
set.seed(963)
tsom <- som_gap(var.rast = ts, xdim = 8, ydim = 8, rlen = 150,
mode = "online", K.max = 6, B = 300, spaceH0 = "original",
method = "globalSEmax"
)
# Optimum k
tsom$Kopt
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.