cluster_split: Split data based on clusters

View source: R/helpers.R

cluster_split R Documentation

Split data based on clusters

Description

Splits occurrence records into groups by clustering their coordinates with either hierarchical or k-means clustering.

Usage

cluster_split(
  data,
  cluster_method = "hierarchical",
  split_distance = NULL,
  n_kmeans = NULL
)

Arguments

data

(data.frame) occurrence records containing at least longitude and latitude columns.

cluster_method

(character) name of the method to be used for clustering the occurrences. Options are "hierarchical" and "k-means"; default = "hierarchical".

split_distance

(numeric) distance, in km, used as the limit of connectivity among the polygons created from clusters of occurrences. Used only when cluster_method = "hierarchical". Default = NULL.

n_kmeans

(numeric) number of clusters into which the species occurrences will be grouped. Used only when cluster_method = "k-means". Default = NULL.
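The way the two options use their arguments can be illustrated with base R's stats functions. The sketch below is a hypothetical stand-in, not the ellipsenm source: the names split_sketch and haversine_km, the column names longitude and latitude, the great-circle (haversine) distance on a sphere of radius 6371 km, and the choice of single linkage (picked because it mirrors the "limit of connectivity" idea) are all assumptions. For "hierarchical", the tree is cut at split_distance km; for "k-means", the raw coordinates are grouped into n_kmeans clusters.

```r
# Hypothetical helper (assumption): pairwise great-circle distances in km,
# using the haversine formula on a sphere of radius 6371 km.
haversine_km <- function(lon, lat, radius = 6371) {
  lon <- lon * pi / 180
  lat <- lat * pi / 180
  dlon <- outer(lon, lon, "-")
  dlat <- outer(lat, lat, "-")
  a <- sin(dlat / 2)^2 + outer(cos(lat), cos(lat)) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))  # pmin guards against rounding above 1
}

# Hypothetical sketch of cluster_split's two options; not the package's code.
split_sketch <- function(data, cluster_method = "hierarchical",
                         split_distance = NULL, n_kmeans = NULL) {
  if (cluster_method == "hierarchical") {
    stopifnot(!is.null(split_distance))
    d <- as.dist(haversine_km(data$longitude, data$latitude))
    # Single linkage (assumption): records joined by chains of neighbours
    # closer than split_distance km fall in the same cluster.
    tree <- hclust(d, method = "single")
    cluster <- cutree(tree, h = split_distance)
  } else if (cluster_method == "k-means") {
    stopifnot(!is.null(n_kmeans))
    # k-means on raw lon/lat (assumption), with n_kmeans random starts' worth
    # of centers chosen by kmeans() itself.
    cluster <- kmeans(data[, c("longitude", "latitude")],
                      centers = n_kmeans)$cluster
  } else {
    stop("cluster_method must be \"hierarchical\" or \"k-means\"")
  }
  split(data, cluster)  # one data.frame per cluster
}

# Two groups of records roughly 2,400 km apart.
occ <- data.frame(
  longitude = c(-100.0, -100.1, -99.9, -80.0, -80.1),
  latitude  = c(20.0, 20.1, 19.9, 10.0, 10.1)
)
groups <- split_sketch(occ, split_distance = 100)
sapply(groups, nrow)  # cluster "1" has 3 records, cluster "2" has 2
```

With split_distance = 100, the records within each group are only a few tens of km apart, so the cut produces two groups; a split_distance larger than the gap between the groups would merge them into one.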

Details

The cluster_method must be chosen based on the spatial configuration of the species occurrences. Both methods make distinct assumptions and one of them may perform better than the other depending on the spatial pattern of the data.

The k-means method, for example, performs better when two assumptions are met: clusters are spatially compact, or “spherical”, and clusters are of similar size. Hierarchical clustering may take more time than k-means, because it computes the distances between every pair of records before building the cluster tree.

Another important consideration is that the k-means method always starts from a random choice of cluster centers, so it may produce different results on different runs, which can make an analysis hard to replicate. Hierarchical clustering has no random step, so repeating the process on the same data will most likely return the same clusters.
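A minimal sketch of the replication issue, using base R's kmeans() and hclust() on made-up coordinates (the points and the seed value 42 are arbitrary assumptions). Fixing the random seed with set.seed() before each k-means run makes that run repeatable; hierarchical clustering returns the same assignment on every call without any seed.

```r
# Three well-separated, made-up point groups (arbitrary example data).
pts <- cbind(
  x = c(0, 0.2, 0.1, 5, 5.2, 5.1, 10, 10.1, 9.9),
  y = c(0, 0.1, 0.2, 5, 5.1, 4.9, 0, 0.2, 0.1)
)

# k-means picks random starting centers, so the seed must be fixed
# before each run for the output to be reproducible.
set.seed(42)
run1 <- kmeans(pts, centers = 3)$cluster
set.seed(42)
run2 <- kmeans(pts, centers = 3)$cluster
identical(run1, run2)  # TRUE

# hclust has no random component: the same distances and linkage
# give the same tree, and cutting it gives the same clusters.
h1 <- cutree(hclust(dist(pts), method = "complete"), k = 3)
h2 <- cutree(hclust(dist(pts), method = "complete"), k = 3)
identical(h1, h2)  # TRUE
```

Note that without a fixed seed, different k-means runs may also number the same partition differently, so comparing cluster labels across runs is unreliable even when the groups themselves agree.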

For more information on these clustering methods see Aggarwal and Reddy (2014) https://goo.gl/RQ2ebd.


marlonecobos/ellipsenm documentation built on Oct. 18, 2023, 8:09 a.m.