featClust | R Documentation |
The function clusters the features of the input dataset on the basis of
either their (projected) coordinates (for points; SpatialPointsDataFrame class)
or their area (for polygons; SpatialPolygonsDataFrame class). If a target feature dataset
(to.feat) is provided, the clustering is instead based
on the distance of each x feature to the nearest to.feat feature. When to.feat is specified,
the x feature (i.e., the feature that the user wants to cluster)
can be a point (SpatialPointsDataFrame class), a polyline (SpatialLinesDataFrame class),
or a polygon (SpatialPolygonsDataFrame class) feature.
Notice that if all the x features overlap with the to.feat features, all the minimum distances will
be 0, and the function will throw an error.
featClust(
  x,
  to.feat = NULL,
  aggl.meth = "ward.D2",
  part = NULL,
  showID = TRUE,
  oneplot = TRUE,
  cex.dndr.lab = 0.85,
  cex.sil.lab = 0.75,
  cex.feat.lab = 0.65,
  col.feat.lab = "black",
  export = FALSE
)
x |
Dataset whose features are to be clustered; either points (SpatialPointsDataFrame class) or polygons (SpatialPolygonsDataFrame class); if to.feat is specified, x can also be a polyline feature (SpatialLinesDataFrame class). |
to.feat |
Target dataset (NULL by default); when provided, x is clustered on the basis of its distance to the nearest feature of this dataset; either points (SpatialPointsDataFrame class), polygons (SpatialPolygonsDataFrame class), or polylines (SpatialLinesDataFrame class). |
aggl.meth |
Agglomeration method ("ward.D2" by default). |
part |
Desired number of clusters; if NULL (default), an optimal partition is calculated (see Details). |
showID |
TRUE (default) or FALSE if the user wants or does not want the ID of the clustered features to be displayed in the plot where the features are colored by cluster membership. |
oneplot |
TRUE (default) or FALSE if the user wants or does not want the plots to be visualized in a single window. |
cex.dndr.lab |
Set the size of the labels used in the dendrogram. |
cex.sil.lab |
Set the size of the labels used in the silhouette plot. |
cex.feat.lab |
Set the size of the labels used (if 'showID' is set to TRUE) to show the clustered features' IDs. |
col.feat.lab |
Set the color of the clustered features' IDs ('black' by default). |
export |
TRUE or FALSE (default) if the user wants or does not want the clustered input dataset to be exported; if TRUE, the input dataset with a new variable indicating the cluster membership will be exported as a shapefile. |
If the to.feature is not provided, the function internally calculates a distance matrix
(based on the Euclidean Distance) on the basis of the points' coordinates or polygons' area.
If the to.feature is provided, the distance matrix will be based on the distance of the x feature
to the nearest to.feature.
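The two cases described above can be illustrated with base R alone. This is a minimal sketch, not the function's internal code: the toy coordinates, the `min.dist` values, and all object names are made up for illustration, and only `dist()` and `hclust()` from the `stats` package are used.

```r
# Hypothetical toy data: five points with projected x/y coordinates.
coords <- data.frame(
  x = c(0, 1, 0, 10, 11),
  y = c(0, 0, 1, 10, 10)
)

# Case 1 (no target dataset): Euclidean distance matrix on the coordinates.
d.coords <- dist(coords, method = "euclidean")

# Case 2 (target dataset given): assume the minimum distance of each x feature
# to its nearest target feature has already been measured (made-up values).
min.dist <- c(2.5, 3.0, 2.8, 40.2, 41.0)
d.min <- dist(min.dist, method = "euclidean")

# Either distance matrix can then be passed to hierarchical clustering,
# here with the default agglomeration method named in the usage ("ward.D2").
hc <- hclust(d.coords, method = "ward.D2")
```

In the second case the input is a single variable, so the pairwise distance reduces to the absolute difference between the minimum distances of two features.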
A dendrogram is produced which depicts the hierarchical clustering based (by default) on
Ward's agglomeration method; rectangles identify the selected cluster partition.
Besides the dendrogram, a silhouette plot is produced, which helps assess how 'good' the
selected cluster solution is.
As for the cluster solution, if the parameter 'part' is left at its default (NULL), an optimal
partition is obtained.
The optimal partition is selected via an iterative procedure which locates at which cluster
solution the highest average silhouette width is achieved.
If a user-defined partition is needed, the user can input the desired number of clusters using
the parameter 'part'.
In either case, an additional plot is returned besides the cluster dendrogram and the silhouette
plot; it displays a scatterplot in which the cluster solution (x-axis)
is plotted against the average silhouette width (y-axis). A black dot represents the partition
selected either by the iterative procedure or by the user.
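The selection of the partition with the highest average silhouette width can be sketched as follows. This is an illustration under stated assumptions rather than the function's actual implementation: the toy points are simulated, the candidate range of 2 to n-1 clusters is an assumption, and the object names are hypothetical. `silhouette()` comes from the 'cluster' package; `hclust()` and `cutree()` from `stats`.

```r
library(cluster)  # provides silhouette()

set.seed(1)
# Hypothetical toy data: two well-separated groups of ten 2D points each.
pts <- rbind(
  matrix(rnorm(20, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(20, mean = 10, sd = 0.5), ncol = 2)
)

d <- dist(pts)
hc <- hclust(d, method = "ward.D2")

# Candidate partitions (assumed range): from 2 up to n-1 clusters.
k.range <- 2:(nrow(pts) - 1)

# Average silhouette width for each candidate number of clusters.
avg.sil <- sapply(k.range, function(k) {
  memb <- cutree(hc, k = k)               # cluster membership for k clusters
  mean(silhouette(memb, d)[, "sil_width"])
})

# The partition with the highest average silhouette width.
best.k <- k.range[which.max(avg.sil)]

# Scatterplot of cluster solution (x-axis) against average silhouette width
# (y-axis), with the selected partition marked by a filled black dot.
plot(k.range, avg.sil, type = "b",
     xlab = "number of clusters", ylab = "average silhouette width")
points(best.k, avg.sil[which.max(avg.sil)], pch = 19, col = "black")
```

A user-defined partition would simply replace `best.k` with the desired number of clusters before calling `cutree()`.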
Notice that in the silhouette plot, the labels on the left-hand side of the chart show the point
ID number and the cluster to which each point is closer.
Also, the function returns a plot showing the input dataset, with features colored by cluster
membership. Two new variables are added to the
shapefile's dataframe, storing a point ID number and the corresponding cluster membership.
The silhouette plot is obtained from the 'silhouette()' function of the 'cluster' package
(https://cran.r-project.org/web/packages/cluster/index.html).
For a detailed description of the silhouette plot, its rationale, and its interpretation, see:
Rousseeuw P J. 1987. "Silhouettes: A graphical aid to the interpretation and validation of
cluster analysis", Journal of Computational and Applied Mathematics 20, 53-65
(http://www.sciencedirect.com/science/article/pii/0377042787901257)
For the hierarchical clustering of features, see: Conolly, J., & Lake, M. (2006).
Geographic Information Systems in Archaeology. Cambridge: Cambridge University Press, 168-173.
The function returns a list storing the following components:
$dist.matrix: distance matrix
$avr.silh.width.by.n.of.clusters: average silhouette width by number of clusters
$partition.silh.data: silhouette data for the selected partition
$coord.or.area.or.min.dist.by.clust: coordinates, area, or distance to the nearest to.feat coupled with cluster membership
$dist.stats.by.cluster: by-cluster summary statistics of the x feature distance to the nearest to.feature
$dataset: the input dataset with two variables added ($feat_ID and $clust, the latter storing the cluster membership)
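To give a feel for what the membership variable and the by-cluster statistics might look like, here is a small base-R sketch. It does not call featClust() and does not reproduce the exact structure of the returned list: the distances are made up, the object names `dataset` and `stats.by.cluster` are hypothetical stand-ins, and the use of `summary()` for the by-cluster statistics is an assumption.

```r
# Hypothetical minimum distances of six x features to the nearest target feature.
min.dist <- c(2.5, 3.0, 2.8, 40.2, 41.0, 39.5)

# Hierarchical clustering on the single distance variable.
hc <- hclust(dist(min.dist), method = "ward.D2")

# Stand-in for the dataset component: a feature ID plus the cluster membership
# for a 2-cluster partition.
dataset <- data.frame(
  feat_ID  = seq_along(min.dist),
  min.dist = min.dist,
  clust    = cutree(hc, k = 2)
)

# Stand-in for the by-cluster statistics: summary of the distances per cluster.
stats.by.cluster <- tapply(dataset$min.dist, dataset$clust, summary)
```

With the real function, the same kind of information would be read from the returned list, e.g. `res$dataset` and `res$dist.stats.by.cluster`.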
data(springs)

#perform the analysis and automatically select an optimal partition
res <- featClust(springs)

#as above, but selecting a 3-cluster partition
res <- featClust(springs, part=3)

#cluster springs on the basis of their distance to the nearest geological fault
res <- featClust(springs, faults)

#cluster polygonal areas on the basis of their distance to the nearest spring
res <- featClust(polygons, springs)

#cluster points on the basis of their distance to the nearest polygon
res <- featClust(points, polygons)