ward | R Documentation |
Produces a hierarchical clustering of one-dimensional data via Ward's method.
ward( x, n = rep(1, length(x)), s = rep(1, length(x)), sortx = TRUE, same.var = T )
x |
a numerical vector, or a list of vectors. |
n |
if x is a vector of cluster means, n is the size of each cluster. |
s |
if x is a vector of cluster means, s is the sum of squares in each
cluster. only needed if |
sortx |
if |
same.var |
if |
Repeatedly merges clusters in order to minimize the clustering cost.
By default, it is the same as hclust(method="ward").
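For comparison, Ward clustering of one-dimensional data can be run with base R's hclust alone. The sketch below is only an illustration and does not call ward(). In R 3.1.0 and later, method="ward" is treated as "ward.D", and "ward.D2" is the variant that applies Ward's criterion to unsquared distances. Which of the two matches ward() exactly is not stated on this page, so the choice of "ward.D2" here is an assumption.

```r
# Two well-separated one-dimensional groups (simulated data).
set.seed(42)
x <- c(rnorm(20, -2, 0.5), rnorm(20, 3, 0.5))

# Base R Ward clustering on the pairwise distances of the 1-D values.
# "ward.D2" is an assumed choice; it is not confirmed to match ward().
hc <- hclust(dist(x), method = "ward.D2")

# Cut the tree into two clusters and compare with the simulated groups.
groups <- cutree(hc, k = 2)
print(table(groups, truth = rep(c("left", "right"), each = 20)))
```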
If same.var=T, the cost is the sum of squares:
sum_c sum_{i in c} (x_i - m_c)^2
The incremental cost of merging clusters c_a and c_b, with sizes n_a, n_b and means m_a, m_b, is
(n_a*n_b)/(n_a+n_b)*(m_a - m_b)^2
It prefers to merge clusters which are small and have similar means.
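The incremental cost can be checked numerically: the increase in the total sum of squares caused by a merge should equal the closed-form expression. This is a minimal sketch in base R; the helper names ss and merge_cost are hypothetical and not part of the package.

```r
# Sum of squared deviations of a vector from its own mean.
ss <- function(v) sum((v - mean(v))^2)

# Incremental merge cost computed from cluster sizes and means only.
# (Hypothetical helper, written for this illustration.)
merge_cost <- function(n_a, m_a, n_b, m_b) {
  (n_a * n_b) / (n_a + n_b) * (m_a - m_b)^2
}

a <- c(1, 2, 3)   # cluster a: n = 3, mean = 2
b <- c(10, 12)    # cluster b: n = 2, mean = 11

# Increase in the total sum of squares when a and b are merged.
direct <- ss(c(a, b)) - (ss(a) + ss(b))

# The same quantity from the closed-form expression.
closed_form <- merge_cost(length(a), mean(a), length(b), mean(b))

print(c(direct = direct, closed_form = closed_form))
```

Because the expression needs only the sizes and means, a merge can be costed without revisiting the individual data points.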
If same.var=F, the cost is the sum of log-variances:
sum_c n_c*log(1/n_c*sum_{i in c} (x_i - m_c)^2)
It prefers to merge clusters which are small, have similar means, and have similar variances.
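The preference for similar variances can be seen by evaluating the log-variance cost on two candidate partitions. This is a minimal sketch in base R; logvar_cost is a hypothetical helper that transcribes the formula above, not a function of the package.

```r
# Log-variance cost of a partition: sum over clusters of
# n_c * log(mean squared deviation within the cluster).
# (Hypothetical helper; a singleton cluster would give log(0) = -Inf.)
logvar_cost <- function(clusters) {
  sum(sapply(clusters, function(v) {
    length(v) * log(mean((v - mean(v))^2))
  }))
}

set.seed(1)
tight  <- rnorm(50, mean = 0, sd = 0.1)  # small variance
wide_a <- rnorm(50, mean = 0, sd = 2)    # large variance
wide_b <- rnorm(50, mean = 0, sd = 2)    # large variance

# All three clusters have similar means, so only the variances differ.
# Option 1: merge the two clusters with similar variances.
cost_like <- logvar_cost(list(tight, c(wide_a, wide_b)))
# Option 2: merge the tight cluster with a wide one.
cost_unlike <- logvar_cost(list(c(tight, wide_a), wide_b))

print(c(cost_like = cost_like, cost_unlike = cost_unlike))
```

Merging the two wide clusters leaves the tight cluster's small variance intact, so that partition has the lower cost.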
If x is a list of vectors, each vector is assumed to be a cluster. n and s are computed for each cluster and x is replaced by the cluster means. Thus you can say ward(split(x,f)) to cluster the groups defined by the levels of a factor f.
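The per-cluster summaries for list input can be sketched in base R as follows. This assumes that "sum of squares" means the squared deviations about each cluster's own mean; the package may compute or scale s differently, and the variable names are for illustration only.

```r
# Toy data with a grouping factor (made-up values).
x <- c(1, 2, 3, 10, 12, 20)
f <- factor(c("a", "a", "a", "b", "b", "c"))

# One vector per factor level, as in split(x, f).
groups <- split(x, f)

# Per-cluster mean, size, and sum of squared deviations about the mean.
# (Assumed definition of s; see the note above.)
m <- sapply(groups, mean)
n <- sapply(groups, length)
s <- sapply(groups, function(v) sum((v - mean(v))^2))

print(rbind(mean = m, n = n, s = s))
```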
An object of the same type as that returned by hclust.
Because of the adjacency constraint used in the implementation, the clustering that results from sortx=T and same.var=F may occasionally be suboptimal.
Tom Minka
hclust, plot_hclust_trace, hist.hclust, boxplot.hclust, break_ward, break.ts, merge_factor
x <- c(rnorm(700,-2,1.5),rnorm(300,3,0.5))
hc <- ward(x)
opar <- par(mfrow=c(2,1)) # use dev.new() in RStudio
plot_hclust_trace(hc)
hist(hc,x)
par(opar)

x <- c(rnorm(700,-2,0.5),rnorm(1000,2.5,1.5),rnorm(500,7,0.1))
hc <- ward(x)
opar <- par(mfrow=c(2,1))
plot_hclust_trace(hc)
hist(hc,x)
par(opar)

data(OrchardSprays)
x <- OrchardSprays$decrease
f <- factor(OrchardSprays$treatment)
# shuffle levels
#lev <- levels(OrchardSprays$treatment)
#f <- factor(OrchardSprays$treatment,levels=sample(lev))
hc <- ward(split(x,f))
# is equivalent to:
#n <- tapply(x,f,length)
#m <- tapply(x,f,mean)
#s <- tapply(x,f,var)*n
#hc <- ward(m,n,s)
boxplot(hc,split(x,f))