partition_metrics: Calculate metrics for one or several partitions
In bioRgeo/bioRgeo: Bioregionalisation Methods in R

Description

This function calculates metrics for one or several partitions, usually the outputs of the netclu_, hclu_ or nhclu_ functions. Depending on the metric, users must provide either a similarity or dissimilarity matrix, or the initial species-site table.

Usage

partition_metrics(
  cluster_object,
  dissimilarity = NULL,
  dissimilarity_index = names(dissimilarity)[3],
  net = NULL,
  site_col = 1,
  species_col = 2,
  eval_metric = c("pc_distance", "anosim", "avg_endemism", "tot_endemism")
)

Arguments

`cluster_object`	a `bioRgeo.hierar.tree` or a `hclust` object
`dissimilarity`	a `dist` object or a `bioRgeo.pairwise.metric` object (output from `similarity_to_dissimilarity()`). Necessary if `eval_metric` includes `"pc_distance"` and `cluster_object` is not a `bioRgeo.hierar.tree` object
`dissimilarity_index`	a character string indicating the dissimilarity (beta-diversity) index to be used when `dissimilarity` is a `data.frame` with multiple dissimilarity indices
`net`	the species-site network (i.e., bipartite network). Should be provided if `eval_metric` includes `"avg_endemism"` or `"tot_endemism"`
`site_col`	name or number for the column of site nodes (i.e. primary nodes). Should be provided if `eval_metric` includes `"avg_endemism"` or `"tot_endemism"`
`species_col`	name or number for the column of species nodes (i.e. feature nodes). Should be provided if `eval_metric` includes `"avg_endemism"` or `"tot_endemism"`
`eval_metric`	character string or vector of character strings indicating the metric(s) to be calculated to investigate the effect of different numbers of clusters. Available options: `"pc_distance"`, `"anosim"`, `"avg_endemism"` and `"tot_endemism"`

Details

Evaluation metrics:

pc_distance: this metric is the method used by Holt et al. (2013). It is the ratio of the between-cluster sum of dissimilarity (beta-diversity) to the total sum of dissimilarity for the full dissimilarity matrix. It is calculated in two steps. First, the total sum of dissimilarity is obtained by summing the entire dissimilarity matrix (`dissimilarity`). Second, the between-cluster sum of dissimilarity is obtained as follows: for a given number of clusters, the dissimilarity is summed only between clusters, not within clusters. To do that efficiently, all pairs of sites within the same cluster have their dissimilarity set to zero in the dissimilarity matrix, and the matrix is then summed. The pc_distance ratio is the between-cluster sum of dissimilarity divided by the total sum of dissimilarity.
anosim: this metric is the statistic used in Analysis of Similarities, as suggested in Castro-Insua et al. (2018) (see vegan::anosim()). It compares the between-cluster dissimilarities to the within-cluster dissimilarities. It is based on the difference of mean ranks between groups and within groups, with the following formula: $R = (r_B - r_W) / (N (N - 1) / 4)$, where $r_B$ and $r_W$ are the average ranks between and within clusters respectively, and $N$ is the total number of sites. Note that the function does not estimate significance; it only computes the statistic. For significance testing, see vegan::anosim().
avg_endemism: this metric is the average percentage of endemism in clusters, as recommended by Kreft & Jetz (2010). It is calculated as follows: $End_{mean} = \frac{\sum_{i=1}^{K} E_i / S_i}{K}$, where $E_i$ is the number of endemic species in cluster $i$, $S_i$ is the number of species in cluster $i$, and $K$ is the maximum number of clusters.
tot_endemism: this metric is the total endemism across all clusters, as recommended by Kreft & Jetz (2010). It is calculated as follows: $End_{tot} = \frac{E}{C}$, where $E$ is the total number of endemics (i.e., species found in only one cluster) and $C$ is the number of non-endemic species.
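
The four definitions above can be sketched in base R on a small toy dataset. This is only an illustration of the formulas as written in this section, not the package's internal code: the helper names (`pc_distance`, `anosim_r`, `endemism`), the toy site-by-species matrix and the cluster assignment are all hypothetical, and Jaccard dissimilarity from `dist(method = "binary")` stands in for whichever index is actually used.

```r
# Hypothetical toy data: 6 sites, 5 species, 3 clusters.
clusters <- c(s1 = 1, s2 = 1, s3 = 2, s4 = 2, s5 = 3, s6 = 3)
comm <- matrix(
  c(1, 1, 0, 0, 0,
    1, 1, 1, 0, 0,
    0, 0, 1, 1, 0,
    0, 1, 1, 1, 0,
    0, 0, 0, 0, 1,
    0, 0, 1, 0, 1),
  nrow = 6, byrow = TRUE,
  dimnames = list(names(clusters), paste0("sp", 1:5))
)

# Jaccard dissimilarity between sites (illustrative choice of index).
d <- dist(comm, method = "binary")

# pc_distance: zero out within-cluster pairs, then divide the remaining
# (between-cluster) sum by the total sum of the dissimilarity matrix.
pc_distance <- function(d, clusters) {
  m <- as.matrix(d)
  same <- outer(clusters, clusters, "==")
  between <- m
  between[same] <- 0
  sum(between) / sum(m)
}

# anosim statistic: R = (r_B - r_W) / (N (N - 1) / 4), using the ranks
# of all pairwise dissimilarities. No significance test is performed.
anosim_r <- function(d, clusters) {
  n <- attr(d, "Size")
  r <- rank(as.vector(d))
  same <- outer(clusters, clusters, "==")
  within <- same[lower.tri(same)]  # same pair ordering as a dist object
  (mean(r[!within]) - mean(r[within])) / (n * (n - 1) / 4)
}

# Endemism metrics, following the definitions in this section:
# avg_endemism = mean over clusters of E_i / S_i,
# tot_endemism = E / C, with C the number of non-endemic species.
endemism <- function(comm, clusters) {
  pres <- rowsum(comm, group = clusters) > 0  # cluster-by-species presence
  endemic <- colSums(pres) == 1               # found in exactly one cluster
  e_i <- rowSums(pres[, endemic, drop = FALSE])
  s_i <- rowSums(pres)
  c(avg_endemism = mean(e_i / s_i),
    tot_endemism = sum(endemic) / sum(!endemic))
}

pc <- pc_distance(d, clusters)
r_stat <- anosim_r(d, clusters)
ends <- endemism(comm, clusters)
print(c(pc_distance = pc, anosim = r_stat, ends))
```

Repeating these calls for each candidate number of clusters would give one row of metrics per partition, which is the kind of table described under Value.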

Value

a list of class bioRgeo.partition.metrics with two elements:

args: input arguments
evaluation_df: the data.frame containing eval_metric for all explored numbers of clusters

Author(s)

Boris Leroy (leroy.boris@gmail.com), Maxime Lenormand (maxime.lenormand@inrae.fr) and Pierre Denelle (pierre.denelle@gmail.com)

References

Castro-Insua et al. (2018)

Ficetola et al. (2017)

Holt et al. (2013)

Kreft & Jetz (2010)

Langfelder et al. (2008)

Examples

## Not run: 
# Pairwise dissimilarity between sites, for all available indices
dissim <- dissimilarity(fishmat, metric = "all")

# User-defined number of clusters
tree1 <- hclu_hierarclust(dissim, n_clust = 2:20, index = "Simpson")
tree1

# Compute all four metrics for each explored number of clusters
a <- partition_metrics(tree1, dissimilarity = dissim, net = fishdf,
                       site_col = "Site", species_col = "Species",
                       eval_metric = c("tot_endemism", "avg_endemism",
                                       "pc_distance", "anosim"))

## End(Not run)

bioRgeo/bioRgeo documentation built on March 10, 2023, 9:48 p.m.