piv_sel | R Documentation |
Finds the pivotal units of a data partition, given a co-association matrix C, according to three different methods.
piv_sel(C, clusters)
C |
An $N \times N$ co-association matrix, i.e. a matrix whose elements are the proportions of times each pair of units is assigned to the same cluster across $H$ distinct partitions. |
clusters |
A vector of integers from 1:k indicating a partition of the N units into, say, k groups. |
Given a set of $N$ observations $(y_1, y_2, \ldots, y_N)$, where each $y_i$ may be a $d$-dimensional vector with $d \ge 1$, consider clustering methods that produce $H$ distinct partitions into $k$ groups. The matrix $C$ is the co-association matrix with entries $c_{i,p} = n_{i,p}/H$, where $n_{i,p}$ is the number of times the pair $(y_i, y_p)$ is assigned to the same cluster among the $H$ partitions.
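As a minimal sketch, assuming the $H$ partitions are stored row-wise in an $H \times N$ matrix of cluster labels, the co-association matrix can be built without a double loop over pairs: for each partition, `outer(labels, labels, "==")` flags the pairs that share a cluster, and averaging these indicator matrices over the $H$ partitions gives $c_{i,p} = n_{i,p}/H$. The helper `coassociation()` and the toy label matrix `parts` are hypothetical names used only for illustration. Labels are compared only within the same partition, so label switching across partitions does not matter, and the diagonal equals 1 because every unit always shares a cluster with itself.

```r
# Hypothetical helper: build the N x N co-association matrix from an
# H x N matrix of cluster labels (one partition per row).
coassociation <- function(parts) {
  H <- nrow(parts)
  N <- ncol(parts)
  counts <- matrix(0, N, N)
  for (h in seq_len(H)) {
    # TRUE where units i and p fall in the same cluster in partition h
    counts <- counts + outer(parts[h, ], parts[h, ], "==")
  }
  counts / H  # c_{i,p} = n_{i,p} / H
}

# Toy example: H = 2 partitions of N = 4 units
parts <- rbind(c(1, 1, 2, 2),
               c(1, 2, 2, 2))
C_small <- coassociation(parts)
C_small
```

Here units 3 and 4 share a cluster in both partitions, so $c_{3,4} = 1$, while units 1 and 2 share a cluster in only one of the two, so $c_{1,2} = 0.5$.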
Let $\mathcal{J}_j$ denote the set of units in group $j$, for $j = 1, \ldots, k$. The user may choose the unit $i^* \in \mathcal{J}_j$ that maximizes one of the quantities
$$\sum_{p \in \mathcal{J}_j} c_{i^* p}$$
or
$$\sum_{p \in \mathcal{J}_j} c_{i^* p} - \sum_{p \notin \mathcal{J}_j} c_{i^* p}.$$
These criteria select, respectively, the unit that maximizes the global within-group similarity ("maxsumint") and the unit that maximizes the difference between the global within-group and between-group similarities ("maxsumdiff").
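The two maximization criteria can be sketched in a few lines of R. The helper `pivots_max()` below is hypothetical: it is not part of pivmet and is not claimed to reproduce `piv_sel()` exactly (for instance in how ties are broken). For every group it returns the member with the largest within-group row sum and the member with the largest within-minus-between difference. The $6 \times 6$ matrix is made-up toy data, not the output of an actual clustering run.

```r
# Hypothetical helper (not the pivmet API): for each group, return the
# member maximizing the within-group sum ("maxsumint") and the member
# maximizing the within-minus-between difference ("maxsumdiff").
# Ties are broken by taking the first unit, as which.max() does.
pivots_max <- function(C, clusters) {
  groups <- sort(unique(clusters))
  out <- matrix(NA_integer_, nrow = length(groups), ncol = 2,
                dimnames = list(NULL, c("maxsumint", "maxsumdiff")))
  for (g in seq_along(groups)) {
    members <- which(clusters == groups[g])
    # Row sums over members of the same group and over all other units
    within  <- rowSums(C[members, members, drop = FALSE])
    between <- rowSums(C[members, -members, drop = FALSE])
    out[g, "maxsumint"]  <- members[which.max(within)]
    out[g, "maxsumdiff"] <- members[which.max(within - between)]
  }
  out
}

# Made-up symmetric toy matrix for N = 6 units in k = 2 groups
C_toy <- matrix(c(1.0, 0.9, 0.5, 0.1, 0.0, 0.0,
                  0.9, 1.0, 0.6, 0.5, 0.4, 0.1,
                  0.5, 0.6, 1.0, 0.2, 0.2, 0.1,
                  0.1, 0.5, 0.2, 1.0, 0.8, 0.3,
                  0.0, 0.4, 0.2, 0.8, 1.0, 0.7,
                  0.0, 0.1, 0.1, 0.3, 0.7, 1.0),
                nrow = 6, byrow = TRUE)
cl <- c(1, 1, 1, 2, 2, 2)
res <- pivots_max(C_toy, cl)
res  # group 1: units 2 and 1; group 2: unit 5 for both criteria
```

In the first group, unit 2 has the largest within-group sum but also the strongest ties to the second group, so the difference criterion prefers unit 1 instead.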
Alternatively, we may choose the unit $i^* \in \mathcal{J}_j$ that minimizes
$$\sum_{p \notin \mathcal{J}_j} c_{i^* p},$$
that is, the member of group $j$ that is most distant, overall, from the units of all the other groups, since its total co-association with them is smallest ("minsumnoint").
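Writing $w_i$ and $b_i$ for the within-group and between-group sums of unit $i$, and $r_i = w_i + b_i$ for its full row sum, "maxsumdiff" maximizes $w_i - b_i = 2w_i - r_i$ and "minsumnoint" minimizes $b_i = r_i - w_i$. All three criteria therefore pick the same unit whenever the row sums $r_i$ are equal within a group, and can disagree otherwise. A minimal sketch of the "minsumnoint" rule follows; `pivots_minsumnoint()` is a hypothetical helper, not the pivmet implementation, and the $6 \times 6$ matrix is made-up toy data.

```r
# Hypothetical helper (not the pivmet API): for each group, return the
# member whose total co-association with units outside the group is
# smallest ("minsumnoint"). which.min() breaks ties by taking the first unit.
pivots_minsumnoint <- function(C, clusters) {
  groups <- sort(unique(clusters))
  vapply(groups, function(g) {
    members <- which(clusters == g)
    between <- rowSums(C[members, -members, drop = FALSE])
    members[which.min(between)]
  }, integer(1))
}

# Made-up symmetric toy matrix for N = 6 units in k = 2 groups
C_toy <- matrix(c(1.0, 0.9, 0.5, 0.1, 0.0, 0.0,
                  0.9, 1.0, 0.6, 0.5, 0.4, 0.1,
                  0.5, 0.6, 1.0, 0.2, 0.2, 0.1,
                  0.1, 0.5, 0.2, 1.0, 0.8, 0.3,
                  0.0, 0.4, 0.2, 0.8, 1.0, 0.7,
                  0.0, 0.1, 0.1, 0.3, 0.7, 1.0),
                nrow = 6, byrow = TRUE)
cl <- c(1, 1, 1, 2, 2, 2)
pivots_minsumnoint(C_toy, cl)  # one pivot per group: units 1 and 6
```

In the second group this rule selects unit 6, which has the weakest ties to the first group, whereas the within-group sum alone would favor unit 5.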
See the vignette for further details.
A list whose component pivots is a matrix with $k$ rows and three columns containing the indices of the pivotal units selected by each method.
Leonardo Egidi legidi@units.it
Egidi, L., Pappadà, R., Pauli, F. and Torelli, N. (2018). Relabelling in Bayesian Mixture Models by Pivotal Units. Statistics and Computing, 28(4), 957-969.
library(pivmet)  # piv_sel() is provided by the pivmet package
# Iris data
data(iris)
# Select the columns of variables
x <- iris[, 1:4]
N <- nrow(x)
H <- 1000
a <- matrix(NA, H, N)
# Perform H k-means partitions
for (h in 1:H) {
  a[h, ] <- kmeans(x, centers = 3)$cluster
}
# Build the co-association matrix; diagonal entries are 1 because
# every unit always shares a cluster with itself
C <- matrix(1, N, N)
for (i in 1:(N - 1)) {
  for (j in (i + 1):N) {
    C[i, j] <- sum(a[, i] == a[, j]) / H
    C[j, i] <- C[i, j]
  }
}
# Reference partition defining the k = 3 groups
km <- kmeans(x, centers = 3)
# Apply the three pivotal criteria to the co-association matrix
ris <- piv_sel(C, clusters = km$cluster)
graphics::plot(iris[, 1], iris[, 2], xlab = "Sepal.Length",
               ylab = "Sepal.Width", col = km$cluster)
# Add the pivots chosen by the maxsumdiff criterion (third column)
points(x[ris$pivots[, 3], 1:2], col = 1:3, cex = 2, pch = 8)