View source: R/find_optimal_n.R
find_optimal_n | R Documentation |
This function aims to optimize one or several criteria on a set of ordered bioregionalizations. It is typically used to find one or more optimal numbers of clusters, either for cutting a hierarchical tree or across a range of bioregionalizations produced by k-means or PAM. Users should exercise caution in other cases (e.g., unordered or unrelated bioregionalizations).
find_optimal_n(
bioregionalizations,
metrics_to_use = "all",
criterion = "elbow",
step_quantile = 0.99,
step_levels = NULL,
step_round_above = TRUE,
metric_cutoffs = c(0.5, 0.75, 0.9, 0.95, 0.99, 0.999),
n_breakpoints = 1,
plot = TRUE
)
bioregionalizations |
A |
metrics_to_use |
A |
criterion |
A |
step_quantile |
For |
step_levels |
For |
step_round_above |
A |
metric_cutoffs |
For |
n_breakpoints |
Specifies the number of breakpoints to find in the curve. Defaults to 1. |
plot |
A |
This function explores the relationship between evaluation metrics and the number of clusters, and applies a criterion to that relationship to identify optimal cluster counts.
Note on criteria: Several criteria can return multiple optimal cluster counts, emphasizing hierarchical or nested bioregionalizations. This approach aligns with modern recommendations for biological datasets, as seen in Ficetola et al. (2017)'s reanalysis of Holt et al. (2013).
Criteria for optimal clusters:
elbow: Identifies the "elbow" point in the evaluation metric curve, where incremental improvements diminish. Based on a method to find the maximum distance from a straight line linking curve endpoints.
increasing_step or decreasing_step: Highlights significant increases or decreases in metrics by analyzing pairwise differences between bioregionalizations. Users specify step_quantile or step_levels.
cutoffs: Derives clusters from specified metric cutoffs, e.g., as in Holt et al. (2013). Adjust cutoffs based on spatial scale.
breakpoints: Uses segmented regression to find breakpoints. Requires specifying n_breakpoints.
min & max: Selects clusters at minimum or maximum metric values.
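The elbow and step criteria can be illustrated with a minimal base-R sketch on a made-up metric curve. This is not the package's implementation: elbow_point and step_points are hypothetical helpers, the rescaling of both axes to [0, 1] is an assumption, and so is the reading of a "significant" step as a consecutive difference at or above the step_quantile quantile of all consecutive differences. Reporting the cluster count reached after the step is also an assumption.

```r
# Toy evaluation curve (invented values): metric for 2 to 10 clusters
k <- 2:10
m <- c(0.20, 0.50, 0.70, 0.74, 0.76, 0.77, 0.78, 0.785, 0.79)

# Hypothetical helper, not part of bioregion: cluster count whose point lies
# farthest from the straight line joining the first and last curve points.
elbow_point <- function(n_clusters, metric) {
  stopifnot(length(n_clusters) == length(metric),
            length(metric) >= 3,
            diff(range(metric)) > 0)
  # Assumption: rescale both axes to [0, 1] so neither dominates the distance
  x <- (n_clusters - min(n_clusters)) / diff(range(n_clusters))
  y <- (metric - min(metric)) / diff(range(metric))
  n <- length(x)
  # Perpendicular distance of every point to the line through the endpoints
  num <- abs((y[n] - y[1]) * x - (x[n] - x[1]) * y + x[n] * y[1] - y[n] * x[1])
  den <- sqrt((y[n] - y[1])^2 + (x[n] - x[1])^2)
  n_clusters[which.max(num / den)]
}

# Hypothetical helper, not part of bioregion: cluster counts reached by an
# increase at or above the chosen quantile of all consecutive increases.
step_points <- function(n_clusters, metric, step_quantile = 0.99) {
  steps <- diff(metric)  # change between consecutive bioregionalizations
  threshold <- quantile(steps, probs = step_quantile, names = FALSE)
  n_clusters[-1][steps >= threshold]
}

elbow <- elbow_point(k, m)
big_steps <- step_points(k, m, step_quantile = 0.75)
print(elbow)
print(big_steps)
```

Lowering step_quantile in this sketch returns more candidate cluster counts, which mirrors the idea that several criteria can return multiple optimal values.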
A list of class bioregion.optimal.n with these elements:
args: Input arguments.
evaluation_df: The input evaluation data.frame, appended with boolean columns for optimal cluster counts.
optimal_nb_clusters: A list with optimal cluster counts for each metric in "metrics_to_use", based on the chosen criterion.
plot: The plot (if requested).
Please note that finding the optimal number of clusters normally requires decisions from the user, and as such can hardly be fully automated. Users are strongly advised to read the references listed below for guidance on how to choose their optimal number(s) of clusters. Consider the "optimal" numbers of clusters returned by this function as a first approximation of the best numbers for your bioregionalization.
Boris Leroy (leroy.boris@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Pierre Denelle (pierre.denelle@gmail.com)
Holt BG, Lessard J, Borregaard MK, Fritz SA, Araújo MB, Dimitrov D, Fabre P, Graham CH, Graves GR, Jønsson KA, Nogués-Bravo D, Wang Z, Whittaker RJ, Fjeldså J & Rahbek C (2013) An update of Wallace's zoogeographic regions of the world. Science 339, 74-78.
Ficetola GF, Mazel F & Thuiller W (2017) Global determinants of zoogeographical boundaries. Nature Ecology & Evolution 1, 0089.
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_1_hierarchical_clustering.html#optimaln.
Associated functions: hclu_hierarclust
library(bioregion)

# Simulated site-by-species abundance matrix (20 sites, 25 species)
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
                20, 25)
rownames(comat) <- paste0("Site", 1:20)
colnames(comat) <- paste0("Species", 1:25)

dissim <- dissimilarity(comat, metric = "all")

# User-defined number of clusters
tree <- hclu_hierarclust(dissim,
                         optimal_tree_method = "best",
                         n_clust = 5:10)
tree

# Evaluate each bioregionalization with the "anosim" metric
a <- bioregionalization_metrics(tree,
                                dissimilarity = dissim,
                                species_col = "Node2",
                                site_col = "Node1",
                                eval_metric = "anosim")

find_optimal_n(a, criterion = "increasing_step", plot = FALSE)