boot_group_validation: Check the robustness of a classification by Bootstrap

View source: R/boostrap_clust_validation.R

boot_group_validation    R Documentation

Check the robustness of a classification by Bootstrap


Check that the obtained groups are stable by bootstrap


boot_group_validation(
  object,
  nsim = 1000,
  maxiter = 1000,
  tol = 0.01,
  init = "random",
  verbose = TRUE,
  seed = NULL
)



object: A FCMres object, typically obtained from the functions CMeans, GCMeans, SFCMeans or SGFCMeans.


nsim: The number of replications to perform for the bootstrap evaluation.


maxiter: An integer giving the maximum number of iterations.


tol: The tolerance criterion used in the evaluateMatrices function for convergence assessment.


init: A string indicating how the initial centres must be selected. "random" indicates that random observations are used as centres. "kpp" uses a distance-based method resulting in more dispersed centres at the beginning. Both are heuristics.


verbose: A boolean specifying whether the progress bar should be displayed.


seed: An integer used for random number generation. It ensures that the starting centres will be the same if the same value is selected.
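
The two initialisation heuristics can be illustrated with a minimal base R sketch. It assumes that the distance-based "kpp" option behaves like k-means++ seeding, which is an assumption and not a statement about the package internals. The helpers init_random and init_kpp are hypothetical names introduced here for illustration only.

```r
# Hypothetical helper: pick k observations uniformly at random as centres.
init_random <- function(data, k) {
  data[sample(nrow(data), k), , drop = FALSE]
}

# Hypothetical helper: k-means++-style seeding (assumed behaviour for a
# distance-based initialisation). Each new centre is drawn with probability
# proportional to its squared distance to the nearest centre already chosen.
init_kpp <- function(data, k) {
  n <- nrow(data)
  idx <- sample(n, 1)
  while (length(idx) < k) {
    centres <- data[idx, , drop = FALSE]
    # squared distance from each observation to its nearest chosen centre
    d2 <- apply(data, 1, function(p) min(colSums((t(centres) - p)^2)))
    idx <- c(idx, sample(n, 1, prob = d2))
  }
  data[idx, , drop = FALSE]
}

# Simulated data, used only for the illustration
set.seed(42)
X <- matrix(rnorm(200), ncol = 2)
c_random <- init_random(X, 4)
c_kpp <- init_kpp(X, 4)
```

Calling set.seed with the same value before either helper reproduces the same starting centres, which is the role the seed argument is described as playing.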


Because the classification produced by an FCM-like algorithm depends on its initial state, it is important to check whether the groups obtained are stable. This function uses a bootstrap method to do so. For a selected number of replications (at least 1000 is advised), a sample of size n is drawn with replacement from the original dataset. For each sample, the same classification algorithm is applied and the results are compared with the reference results. For each original group, the most similar group is identified by calculating the Jaccard similarity index between the columns of the two membership matrices. This index ranges from 0 (no similarity) to 1 (perfect similarity), and a value is calculated for each group at each replication. One can investigate the values obtained to determine whether the groups are stable. Values under 0.5 are a concern and indicate that the group is dissolving. Values between 0.6 and 0.75 indicate a pattern in the data, but with significant uncertainty. Values above 0.8 indicate strong groups. The values of the centres obtained at each replication are also returned; it is important to check that they approximately follow a normal distribution (or are at least unimodal).
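
The group-matching step described above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the Jaccard index is taken here in its fuzzy form (sum of element-wise minima over sum of element-wise maxima), and the rows of both membership matrices are assumed to refer to the same observations. The functions fuzzy_jaccard and match_groups are hypothetical names.

```r
# Assumed definition of the Jaccard index for two membership vectors:
# sum of element-wise minima divided by sum of element-wise maxima.
fuzzy_jaccard <- function(a, b) {
  sum(pmin(a, b)) / sum(pmax(a, b))
}

# For each reference group (column of ref), find the most similar group
# (column of boot) and report the corresponding Jaccard index.
match_groups <- function(ref, boot) {
  sims <- outer(
    seq_len(ncol(ref)), seq_len(ncol(boot)),
    Vectorize(function(i, j) fuzzy_jaccard(ref[, i], boot[, j]))
  )
  data.frame(
    group = seq_len(ncol(ref)),
    best_match = apply(sims, 1, which.max),
    jaccard = apply(sims, 1, max)
  )
}

# Toy membership matrices (each row sums to 1). The replicate contains the
# same groups as the reference, but in the opposite column order.
ref <- matrix(c(0.9, 0.1,
                0.8, 0.2,
                0.2, 0.8,
                0.1, 0.9), ncol = 2, byrow = TRUE)
boot <- ref[, c(2, 1)]
res <- match_groups(ref, boot)
print(res)
```

Because cluster labels are arbitrary from one run to the next, the matching is done on similarity and not on column position. In this toy case each reference group is recovered exactly in the other column.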


A list of two values: group_consistency: a dataframe indicating the consistency of each cluster across simulations; group_centres: a list with one dataframe per cluster. The values in the dataframes are the centres of the clusters at each simulation.
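
A possible way to inspect such a result is sketched below. The object is a mock built from simulated numbers so that the snippet runs without the package. The layout (one column per group in group_consistency, one row per simulation) and the names group1, group2, x and y are assumptions made for the illustration.

```r
set.seed(1)

# Mock result mimicking the documented structure; all values are simulated.
validation <- list(
  group_consistency = data.frame(
    group1 = runif(100, 0.7, 1.0),
    group2 = runif(100, 0.2, 0.6)
  ),
  group_centres = list(
    group1 = data.frame(x = rnorm(100, 0), y = rnorm(100, 1)),
    group2 = data.frame(x = rnorm(100, 3), y = rnorm(100, -1))
  )
)

# Mean and selected quantiles of the Jaccard index for each group
consistency_summary <- t(sapply(
  validation$group_consistency,
  function(v) c(mean = mean(v), quantile(v, c(0.05, 0.5, 0.95)))
))
print(round(consistency_summary, 2))

# Groups whose mean index falls under the 0.5 threshold mentioned in the text
unstable <- rownames(consistency_summary)[consistency_summary[, "mean"] < 0.5]

# Distribution of one centre coordinate across simulations, to check that it
# looks unimodal (drawn only in an interactive session)
h <- hist(validation$group_centres$group1$x, breaks = 20, plot = interactive())
```

With a real result, the same summary would be applied to the group_consistency dataframe returned by the function, and one histogram would be drawn per variable and per group.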


## Not run: 

#selecting the columns for the analysis
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14",

#rescaling the columns
Data <- sf::st_drop_geometry(LyonIris[AnalysisFields])
for (Col in names(Data)){
  Data[[Col]] <- as.numeric(scale(Data[[Col]]))
}

Cmean <- CMeans(Data,4,1.5,500,standardize = FALSE, seed = 456,
    tol = 0.00001, verbose = FALSE)

validation <- boot_group_validation(Cmean, nsim = 1000, maxiter = 1000,
    tol = 0.01, init = "random")

## End(Not run)

geocmeans documentation built on Oct. 16, 2022, 1:07 a.m.