recluster.region: A clustering method based on continuous consensus among...

View source: R/recluster.region.R

recluster.region    R Documentation

A clustering method based on continuous consensus among clustering solutions after resampling row order.

Description

This function is specifically designed to facilitate regionalization analysis in cases where zero and tied values are particularly frequent. This often occurs when using turnover indices at small or intermediate spatial scales where large barriers are absent. The function requires a matrix as input, with areas in rows and species occurrence (1,0) in columns. It also allows for the inclusion of a phylogenetic tree to compute phylogenetic beta-diversity.

The indices used are those supported by recluster.dist, but custom indices can also be introduced (see recluster.dist). Alternatively, a dissimilarity matrix generated by any function can be provided. The function takes the number of trees to resample (tr, default 50) and a range of cluster numbers from mincl to maxcl (default 2 to 3), indicating the number of regions to be identified. Clustering methods implemented in hclust are supported, as well as Partitioning Around Medoids (PAM) and DIANA. The default method, ward.D2, typically offers the best performance, but ward.D, complete linkage clustering, PAM, and DIANA may also perform well.

The function generates n trees by randomly reordering the original row order. These trees are then cut at different nodes (from the mincl-1th to the maxcl-1th node), resulting in an increasing number of clusters. The function compares clustering solutions at the same cut levels across different resampled trees, producing a dissimilarity matrix between areas based on how often each pair of areas appears in different clusters across the different tree solutions at the same cut level. This dissimilarity is standardized by the number of resampled trees, yielding values from 0 (for pairs of areas always in the same cluster) to 1 (for pairs never in the same cluster).
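The pairwise counting described above can be illustrated with a minimal base R sketch. It is not the package's internal code: the membership matrix memb and the helper consensus_diss are hypothetical names introduced here, and a single cut level is assumed.

```r
# Hypothetical memberships at one cut level:
# rows = areas, columns = resampled trees, values = cluster labels.
memb <- cbind(
  tree1 = c(1, 1, 2, 2),
  tree2 = c(1, 1, 1, 2),
  tree3 = c(2, 2, 1, 1),
  tree4 = c(1, 2, 2, 2)
)
rownames(memb) <- c("A", "B", "C", "D")

# Hypothetical helper: proportion of trees in which two areas
# fall in different clusters (0 = always together, 1 = never together).
consensus_diss <- function(memb) {
  n_trees <- ncol(memb)
  diss <- matrix(0, nrow(memb), nrow(memb),
                 dimnames = list(rownames(memb), rownames(memb)))
  for (t in seq_len(n_trees)) {
    # TRUE where the two areas have different labels in tree t
    diss <- diss + outer(memb[, t], memb[, t], FUN = "!=")
  }
  diss / n_trees
}

d <- consensus_diss(memb)
print(d)
```

Only label equality within each tree is compared, so the arbitrary numbering of clusters in different trees does not affect the result.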

A final hierarchical clustering is then applied to obtain one solution for each number of clusters k between mincl and maxcl. Since the user-defined number of clusters may not exactly match the mean number of clusters obtained from the tree cuts, the solution for each k is computed from the dissimilarity matrix whose mean number of clusters is closest to k.
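One possible reading of this selection step is sketched below in base R. The values in nclust are invented for illustration, and the way recluster.region performs the choice internally is not confirmed here; the sketch only shows "pick the cut whose mean number of clusters is closest to k".

```r
# Hypothetical mean numbers of clusters, one per cut level
nclust <- c(2.4, 2.9, 4.1)
mincl <- 2
maxcl <- 4

# For each requested k, index of the cut level whose mean
# number of clusters is closest to k
chosen <- sapply(mincl:maxcl, function(k) which.min(abs(nclust - k)))
names(chosen) <- mincl:maxcl
print(chosen)
```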

Usage

recluster.region (mat,tr=50,dist="simpson",method="ward.D2", members=NULL, phylo=NULL, mincl=2,maxcl=3,
rettree=FALSE,retmat=FALSE,retmemb=FALSE)

Arguments

mat

A binary presence-absence community matrix or any dissimilarity matrix.

tr

The number of trees to be included in the consensus.

dist

One of the beta-diversity indices allowed by recluster.dist, or a custom binary dissimilarity specified according to the syntax of the designdist function of the vegan package. Not required when the input is a dissimilarity matrix.

method

Any clustering method allowed by hclust but also "pam" and "diana".

members

For hclust methods, a vector to be used as the members argument of hclust (see hclust). Default NULL.

phylo

An ultrametric and rooted phylogenetic tree for species having the same labels as in mat columns. Only required for phylogenetic beta-diversity indices.

mincl

The minimum number of regions requested.

maxcl

The maximum number of regions requested.

rettree

Logical, if TRUE the final trees are returned.

retmat

Logical, if TRUE the new dissimilarity matrices are returned.

retmemb

Logical, if TRUE the memberships of areas in the different random trees are returned.

Details

Like other evaluators of the goodness of clustering solutions, the function provides silhouette values and the explained dissimilarity. The explained dissimilarity (sensu Holt et al. 2013) is the ratio between the sum of mean dissimilarities among members of different clusters and the sum of all dissimilarities in the matrix. This value necessarily tends to 1 as each area approaches being its own group. Silhouette width measures the strength of a partition of objects from a dissimilarity matrix by comparing, for each cell, the mean distance to the members of the nearest other cluster with the mean distance to the other members of its own cluster (see the silhouette function in the cluster package). Silhouette values range between -1 and +1; a negative value suggests that most cells are probably located in an incorrect cluster.
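Both measures can be illustrated on a toy example in base R. The matrix d and the partition cl are invented, and the explained dissimilarity is computed in a simplified form (between-cluster dissimilarities over all dissimilarities), which may differ in detail from what recluster.region computes. The silhouette is written out by hand; in practice the silhouette function of the cluster package would normally be used.

```r
# Hypothetical dissimilarities among five areas and a two-cluster partition
d <- matrix(c(0.0, 0.1, 0.2, 0.8, 0.9,
              0.1, 0.0, 0.1, 0.7, 0.8,
              0.2, 0.1, 0.0, 0.9, 0.7,
              0.8, 0.7, 0.9, 0.0, 0.1,
              0.9, 0.8, 0.7, 0.1, 0.0), nrow = 5, byrow = TRUE)
cl <- c(1, 1, 1, 2, 2)

# Simplified explained dissimilarity: share of the total dissimilarity
# that lies between areas assigned to different clusters
lower <- lower.tri(d)
between <- outer(cl, cl, FUN = "!=")
ex_diss <- sum(d[lower & between]) / sum(d[lower])

# Silhouette width of each area (assumes no single-member clusters)
sil_width <- sapply(seq_along(cl), function(i) {
  own <- setdiff(which(cl == cl[i]), i)
  a <- mean(d[i, own])                      # mean distance within own cluster
  b <- min(sapply(setdiff(unique(cl), cl[i]),
                  function(g) mean(d[i, cl == g])))  # nearest other cluster
  (b - a) / max(a, b)
})
mean_sil <- mean(sil_width)

print(ex_diss)
print(sil_width)
```

With well-separated groups such as these, both the explained dissimilarity and the mean silhouette are high; merging or splitting the groups badly would lower the silhouette even though the explained dissimilarity keeps growing with the number of clusters.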

Value

memb

An array with different matrices indicating for each area (rows) the membership in each random tree (columns) in each cut (matrix).

matrices

The new dissimilarity matrices. Upper-right cells are given as NA.

nclust

Mean number of clusters among random trees obtained by different cuts.

solutions

A matrix providing number of clusters for each solution (k), the associated mean number of clusters obtained by cuts (clust), the silhouette (silh) value and the explained dissimilarity (ex.diss).

grouping

A matrix indicating cluster membership of each site in each solution for different numbers of clusters.

Author(s)

Leonardo Dapporto

References

Dapporto L. et al. A new procedure for extrapolating turnover regionalization at mid-small spatial scales, tested on British butterflies. Methods Ecol Evol (2015), 6, 1287-1297

Examples

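library(recluster)
data(dataisl)
# compute the Simpson dissimilarity among areas
simpson<-recluster.dist(dataisl)
# build the consensus from 10 resampled trees and keep the final trees
turn_cl<-recluster.region(simpson,tr=10,rettree=TRUE)
# plot the tree for three clusters
plot(turn_cl$tree[[2]])
# inspect cluster membership
turn_cl$grouping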

leondap/recluster documentation built on Nov. 11, 2024, 7:11 a.m.