find_clusters: Detection of clusters in 2D spaces
In biosurvey: Tools for Biological Survey Planning

Finds clusters of data in two dimensions using either hierarchical or k-means clustering.

find_clusters(data, x_column, y_column, space,
              cluster_method = "hierarchical", n_k_means = NULL,
              split_distance = NULL)

`data`	matrix or data.frame that contains at least two columns.
`x_column`	(character) name of the column in `data` that holds the x-axis values.
`y_column`	(character) name of the column in `data` that holds the y-axis values.
`space`	(character) space in which the clustering will be performed. There are two options available: "G", for geographic space, and "E", for environmental space.
`cluster_method`	(character) name of the method to be used for detecting clusters. Options are "hierarchical" and "k-means"; default = "hierarchical".
`n_k_means`	(numeric) number of clusters to be identified when `cluster_method` = "k-means".
`split_distance`	(numeric) distance used to identify clusters when `cluster_method` = "hierarchical", given in meters (if `space` = "G") or as Euclidean distance (if `space` = "E").

Clustering methods make distinct assumptions, so either one may work well on some data sets but fail on others, depending on the pattern of the data.

The k-means method tends to perform better when data form compact, roughly spherical groups of similar size. The hierarchical clustering algorithm usually takes more time than the k-means method.
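
To make the difference between the two approaches concrete, the following is a minimal sketch using only base R (`stats`) on synthetic points. It is not taken from the biosurvey source, and `find_clusters` may differ in details such as the linkage method. The names `pts`, `hc`, `by_distance`, and `km`, and the choice of complete linkage, are assumptions made for illustration.

```r
# Illustrative sketch only: base R clustering on synthetic 2D points.
# Object names (pts, hc, by_distance, km) are hypothetical.
set.seed(42)
pts <- rbind(
  cbind(x = rnorm(30, mean = 0,  sd = 0.3), y = rnorm(30, mean = 0,  sd = 0.3)),
  cbind(x = rnorm(30, mean = 10, sd = 0.3), y = rnorm(30, mean = 10, sd = 0.3))
)

# Hierarchical: build a tree from Euclidean distances, then cut it at a
# distance threshold. The number of clusters follows from the threshold.
# Complete linkage is an assumption made for this sketch.
hc <- hclust(dist(pts), method = "complete")
by_distance <- cutree(hc, h = 4)

# k-means: the number of clusters is fixed in advance.
km <- kmeans(pts, centers = 2, nstart = 5)

# Compare the two assignments
table(hierarchical = by_distance, kmeans = km$cluster)
```

In this sketch the hierarchical approach is driven by a distance threshold, much like `split_distance`, whereas k-means is driven by a requested number of groups, much like `n_k_means`.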

A data frame containing data and an additional column defining clusters.

# Data
data("m_matrix", package = "biosurvey")

# Cluster detection
clusters <- find_clusters(m_matrix$data_matrix, x_column = "PC1",
                          y_column = "PC2", space = "E",
                          cluster_method = "hierarchical", n_k_means = NULL,
                          split_distance = 4)
head(clusters)