clusterSP | R Documentation |
This function is the main gateway to sarp.snowprofile::snowprofile clustering.
clusterSP(
SPx = NULL,
k = 2,
type = c("hclust", "pam", "fanny", "kdba", "fast")[1],
distmat = NULL,
config = clusterSPconfig(type),
centers = "none",
keepSPx = TRUE,
keepDistmat = TRUE
)
SPx |
a sarp.snowprofile::snowprofileSet to be clustered |
k |
desired number of clusters |
type |
clustering type including |
distmat |
a precomputed distance matrix of class dist. This results in much faster clustering for |
config |
a list providing the necessary hyperparameters. Use clusterSPconfig functions for convenience! |
centers |
compute and return |
keepSPx |
append the snowprofileSet to the output? |
keepDistmat |
append the distmat to the output? |
There are several clustering approaches that can be applied to snow profiles. Most rely on computing a pairwise distance matrix between all profiles in a snowprofileSet. Current implementations with this approach rely on existing R functions:
agglomerative hierarchical clustering stats::hclust
partitioning around medoids cluster::pam
fuzzy analysis clustering cluster::fanny
Since computing a pairwise distance matrix can be slow, the recommended way of testing different numbers of clusters k is to precompute a single distance matrix with the distanceSP function and provide it as an argument to clusterSP.
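The benefit of reusing one distance matrix can be sketched with the underlying R functions alone. The example below is only an illustration on toy numeric data: `stats::dist` stands in for distanceSP, and the object names (`x`, `d`, `cl_h2`, etc.) are made up for this sketch. It does not show how clusterSP calls these functions internally.

```r
library(cluster)  # provides pam() and fanny()

set.seed(1)
# Toy stand-in for a set of profiles: 12 points in two well-separated groups
x <- rbind(matrix(rnorm(12, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(12, mean = 5, sd = 0.3), ncol = 2))

# Compute the pairwise distance matrix once (assumption: dist() replaces distanceSP here)
d <- dist(x)

# Reuse the same 'dist' object for several algorithms and values of k
tree   <- hclust(d, method = "average")
cl_h2  <- cutree(tree, k = 2)   # cut the same tree at k = 2
cl_h3  <- cutree(tree, k = 3)   # and at k = 3, without recomputing distances
cl_pam <- pam(d, k = 2)         # medoid indices are in cl_pam$id.med
cl_fan <- fanny(d, k = 2)       # fuzzy memberships are in cl_fan$membership
```

Only the call to `dist` scales with the cost of comparing items; every later call works on the stored distances.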
An alternative type of clustering, known as k-dimensional barycentric averaging (kdba), is conceptually similar to k-means but specifically adapted to snow profiles (see clusterSPkdba). An initial clustering condition (which can be random or based on a 'sophisticated guess') is iteratively refined by assigning individual profiles to the most similar cluster and, at the end of every iteration, recomputing the cluster centroids. The cluster centroids are represented by the average snow profile of each cluster (see averageSP). Note that the results of kdba are sensitive to the initial conditions, which by default are estimated with the 'fast' method below.
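The assign-and-recompute loop can be sketched generically as follows. This is not the clusterSPkdba implementation: it assumes plain numeric vectors, squared Euclidean distance in place of profile alignment, and column means in place of averageSP. The helper names `compute_centroids` and `sq_dist` are hypothetical.

```r
set.seed(42)
# Toy data: 10 items around (0, 0) followed by 10 items around (4, 4)
x <- rbind(matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(20, mean = 4, sd = 0.3), ncol = 2))
k <- 2

# Hypothetical helpers: centroid = average member, similarity = squared distance
compute_centroids <- function(x, clustering, k) {
  t(sapply(seq_len(k), function(j) colMeans(x[clustering == j, , drop = FALSE])))
}
sq_dist <- function(x, centre) rowSums(sweep(x, 2, centre)^2)

# Initial condition: a deliberately crude guess (3 items vs. the remaining 17)
clustering <- rep(1:2, times = c(3, 17))
centroids  <- compute_centroids(x, clustering, k)

for (iter in 1:20) {
  # Assign every item to its most similar (closest) centroid
  d2 <- sapply(seq_len(k), function(j) sq_dist(x, centroids[j, ]))
  new_clustering <- max.col(-d2, ties.method = "first")
  if (all(new_clustering == clustering)) break  # converged: no item changed cluster
  clustering <- new_clustering
  # End of iteration: recompute the centroids from the updated assignment
  centroids <- compute_centroids(x, clustering, k)
}
```

Because the loop only ever improves on its starting point, a different initial `clustering` can lead to a different final result, which is the sensitivity to initial conditions noted above.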
And finally, a much faster 'fast' method is available that computes a pairwise distance matrix without aligning profiles, but instead based on summary statistics such as snow height, height of new snow, presence or absence of weak layers and crusts, etc. The 'fast' clustering approach uses the partitioning around medoids clustering approach with the 'fast' distance matrix.
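The idea of clustering on summary statistics rather than aligned profiles can be illustrated as below. The columns, their values, and the equal weighting after `scale` are assumptions made for this sketch; the statistics and weights actually used by the 'fast' method are not specified on this page.

```r
library(cluster)  # provides pam()

# Hypothetical per-profile summary statistics (one row per profile)
stats_df <- data.frame(
  hs         = c(80, 85, 90, 95, 200, 210, 205, 215),  # snow height (cm)
  hn24       = c(0, 5, 0, 5, 20, 25, 30, 20),          # height of new snow (cm)
  weak_layer = c(1, 1, 1, 1, 0, 0, 0, 0),              # weak layer present (1) or absent (0)
  crust      = c(0, 1, 0, 1, 1, 0, 1, 0)               # crust present (1) or absent (0)
)

# Standardise the columns so that no single statistic dominates (an assumption),
# then compute a pairwise distance matrix without any profile alignment
d_fast <- dist(scale(stats_df))

# Partitioning around medoids on the summary-statistic distances
cl_fast <- pam(d_fast, k = 2)
cl_fast$clustering  # cluster index per profile
cl_fast$id.med      # row indices of the medoid profiles
```

Each profile is reduced to a handful of numbers before any distances are computed, which is why this kind of approach is much cheaper than aligning every pair of profiles.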
More details here...
a list of class clusterSP
containing:
clustering
: vector of integers (from 1:k) indicating the cluster to which each profile is allocated
id.med
: vector of indices for the medoid profiles of each cluster (if calculated)
centroids
: snowprofileSet containing the centroid profile for each cluster (if calculated)
tree
: object of class 'hclust' describing the tree output by hclust
...
: all other outputs provided by the clustering algorithms (e.g., a membership matrix from fanny.object, pam.object, iteration history from clusterSPkdba)
type
: type of clustering as provided by input argument
call
: a copy of the clusterSP function call
SPx
: a copy of the input snowprofileSet (if keepSPx = TRUE)
distmat
: the pairwise distance matrix of class dist (if keepDistmat = TRUE and a matrix has been provided or computed)
fherla shorton
clusterSPconfig, clusterSPcenters, clusterSPkdba, plot.clusterSP
this_example_runs_too_long <- TRUE
if (!this_example_runs_too_long) { # exclude from cran checks
## Cluster with SPgroup2, which contains deposition date and p_unstable
SPx <- SPgroup2
config <- clusterSPconfig(simType = 'wsum_scaled', ddate = TRUE, pwls = TRUE)
## Hierarchical clustering with k = 2
cl_hclust <- clusterSP(SPx, k = 2, type = 'hclust', config = config)
plot(cl_hclust)
## Precompute a distance matrix and cluster with PAM for k = 2 and 3
distmat <- do.call('distanceSP', c(list(SPx), config$args_distance))
cl_pam2 <- clusterSP(SPx, k = 2, type = 'pam', config = config, distmat = distmat)
cl_pam3 <- clusterSP(SPx, k = 3, type = 'pam', config = config, distmat = distmat)
print(cl_pam2$clustering)
print(cl_pam3$clustering)
## kdba clustering
config_kdba <- clusterSPconfig(simType = 'layerwise', type = 'kdba')
cl_kdba <- clusterSP(SPx = SPgroup2, k = 2, type = 'kdba', config = config_kdba)
plot(cl_kdba)
}