hclustgeo: Hierarchical clustering with geographical constraints


View source: R/hclustgeo.R

Description

This function implements a Ward-like hierarchical clustering algorithm with soft contiguity constraints. The algorithm takes as input two dissimilarity matrices D0 and D1 and a mixing parameter alpha between 0 and 1. The dissimilarities can be non-Euclidean and the weights of the observations can be non-uniform. The first matrix gives the dissimilarities in the "feature" space (socio-demographic variables or grey levels, for instance). The second matrix gives the dissimilarities in the "constraint" space. For instance, D1 can be a matrix of geographical distances or a matrix built from the contiguity matrix C. The mixing parameter alpha sets the importance of the constraint in the clustering procedure.

Usage

hclustgeo(D0, D1 = NULL, alpha = 0, scale = TRUE, wt = NULL)

Arguments

D0

an object of class "dist" with the dissimilarities between the n observations. The function as.dist can be used to convert an object of class "matrix" to an object of class "dist".
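As a minimal sketch of that conversion, the snippet below turns a small symmetric matrix into a "dist" object using base R only. The matrix values and the labels a, b, c are invented for illustration and do not come from the package data.

```r
# Hypothetical 3 x 3 symmetric dissimilarity matrix (values invented for illustration)
m <- matrix(c(0, 2, 5,
              2, 0, 3,
              5, 3, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

d <- as.dist(m)   # stores only the lower triangle of m
class(d)          # "dist"
attr(d, "Size")   # number of observations, here 3
```

An object built this way can then be passed as D0 (or D1), provided both matrices describe the same n observations in the same order.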

D1

an object of class "dist" with other dissimilarities between the same n observations.

alpha

a real value between 0 and 1. This mixing parameter sets the relative importance of D1 compared to D0: the larger alpha is, the more weight the constraint dissimilarities D1 receive. By default, alpha is equal to 0 and D0 is used alone in the clustering process.

scale

if TRUE, the two dissimilarity matrices D0 and D1 are scaled, i.e. each is divided by its maximum value. If D1=NULL, this parameter is not used and D0 is not scaled.
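The effect of this scaling can be reproduced by hand, which may help when the two matrices are expressed in very different units. The sketch below uses invented one-dimensional data; the names D0s and D1s are hypothetical and only illustrate the division by the maximum described above, not the package's internal code.

```r
# Hypothetical data: feature values and "geographical" positions on very different scales
D0 <- dist(matrix(c(1, 4, 9, 16), ncol = 1))
D1 <- dist(matrix(c(0, 10, 20, 100), ncol = 1))

# Divide each dissimilarity object by its own maximum
D0s <- D0 / max(D0)
D1s <- D1 / max(D1)

# After scaling, both sets of dissimilarities lie in (0, 1] with a maximum of 1
range(D0s)
range(D1s)
```

Putting both matrices on a common scale means that alpha is not dominated by whichever matrix happens to have the larger raw values.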

wt

vector with the weights of the observations. By default, wt=NULL corresponds to the case where all observations are weighted by 1/n.

Details

The criterion minimized at each stage is a convex combination of the homogeneity criterion calculated with D0 and the homogeneity criterion calculated with D1. The parameter alpha (the weight of this convex combination) controls the weight of the constraint in the quality of the solutions. When alpha increases, the homogeneity calculated with D0 decreases whereas the homogeneity calculated with D1 increases.
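The convex combination can be written compactly as follows. The symbols Q_0, Q_1 and Q_alpha are notation introduced here for illustration, standing for the criterion of a partition P computed with D0, with D1, and with the mixture. The assignment of the weight (1 - alpha) to D0 is inferred from the default alpha = 0, for which D0 is used alone.

```latex
Q_\alpha(P) \;=\; (1-\alpha)\, Q_0(P) \;+\; \alpha\, Q_1(P),
\qquad \alpha \in [0, 1].
```

With alpha = 0 the criterion reduces to Q_0 (feature space only), and with alpha = 1 it reduces to Q_1 (constraint space only).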

Value

Returns an object of class hclust.
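Because the result has class hclust, the usual base R tools for such objects (plot, cutree, and the merge, height and order components) should apply to it. The sketch below uses stats::hclust on invented data purely as a stand-in to show that structure; it does not call hclustgeo, and the data values are hypothetical.

```r
# Stand-in tree built with base R, only to show what an "hclust" object contains
x <- matrix(c(1, 2, 6, 7, 15), ncol = 1)   # hypothetical data, 5 observations
tree <- hclust(dist(x), method = "ward.D2")

class(tree)      # "hclust"
tree$merge       # (n - 1) x 2 matrix describing the successive merges
tree$height      # n - 1 merge heights, one per merge
tree$order       # ordering of the observations used when plotting the dendrogram

part <- cutree(tree, k = 2)   # cluster membership for a 2-cluster partition
table(part)
```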

References

M. Chavent, V. Kuentz-Simonet, A. Labenne, J. Saracco. ClustGeo: an R package for hierarchical clustering with spatial constraints. arXiv:1707.03897 [stat.CO]

Examples

library(ClustGeo)
data(estuary)

# with one dissimilarity matrix
w <- estuary$map@data$POPULATION # non-uniform weights
D <- dist(estuary$dat)
tree <- hclustgeo(D, wt = w)
# the next three values can be compared with one another
sum(tree$height)
inertdiss(D, wt = w)
inert(estuary$dat, w = w)
plot(tree, labels = FALSE)
part <- cutree(tree, k = 5) # partition into 5 clusters
sp::plot(estuary$map, border = "grey", col = part) # map coloured by cluster

# with two dissimilarity matrices
D0 <- dist(estuary$dat) # the socio-demographic distances
D1 <- as.dist(estuary$D.geo) # the geographical distances
alpha <- 0.2 # the mixing parameter
tree <- hclustgeo(D0, D1, alpha = alpha, wt = w)
plot(tree, labels = FALSE)
part <- cutree(tree, k = 5) # partition into 5 clusters
sp::plot(estuary$map, border = "grey", col = part) # map coloured by cluster

ClustGeo documentation built on July 14, 2017, 5:01 p.m.
