select_parameters.mc: Select parameters for clustering algorithm (multicore)

View source: R/side_functions.R

select_parameters.mc    R Documentation

Select parameters for clustering algorithm (multicore)

Description

Function to select the parameters for a clustering algorithm. This version of the function can use a plan defined with the package future to reduce calculation time.

Usage

select_parameters.mc(
  algo,
  data,
  k,
  m,
  alpha = NA,
  beta = NA,
  nblistw = NULL,
  lag_method = "mean",
  window = NULL,
  spconsist = TRUE,
  classidx = TRUE,
  nrep = 30,
  indices = NULL,
  standardize = TRUE,
  robust = FALSE,
  noise_cluster = FALSE,
  delta = NA,
  maxiter = 500,
  tol = 0.01,
  chunk_size = 5,
  seed = NULL,
  init = "random",
  verbose = TRUE
)

selectParameters.mc(
  algo,
  data,
  k,
  m,
  alpha = NA,
  beta = NA,
  nblistw = NULL,
  lag_method = "mean",
  window = NULL,
  spconsist = TRUE,
  classidx = TRUE,
  nrep = 30,
  indices = NULL,
  standardize = TRUE,
  robust = FALSE,
  noise_cluster = FALSE,
  delta = NA,
  maxiter = 500,
  tol = 0.01,
  chunk_size = 5,
  seed = NULL,
  init = "random",
  verbose = TRUE
)

Arguments

algo

A string indicating which method to use (FCM, GFCM, SFCM, SGFCM)

data

A dataframe with numeric columns

k

A sequence of values for k to test (>=2)

m

A sequence of values for m to test

alpha

A sequence of values for alpha to test (leave at the default NA if not required)

beta

A sequence of values for beta to test (leave at the default NA if not required)

nblistw

A list of listw objects describing the neighbours, typically produced by the spdep package (NULL if not required)

lag_method

A string indicating whether a classical lag ("mean") or a weighted median ("median") must be used. Both can be tested by specifying a vector: c("mean","median"). When working with rasters, the string must be parsable to a function such as mean, min, max or sum; that function is applied to all the pixel values in the window designated by the parameter window, weighted according to the values of this matrix.

window

A list of windows to use to calculate neighbouring values if rasters are used.

spconsist

A boolean indicating if the spatial consistency must be calculated

classidx

A boolean indicating if the quality of classification indices must be calculated

nrep

An integer indicating the number of permutations to perform to simulate the random distribution of the spatial inconsistency. Only used if spconsist is TRUE.

indices

A character vector with the names of the indices to calculate to evaluate clustering quality. The default is c("Silhouette.index", "Partition.entropy", "Partition.coeff", "XieBeni.index", "FukuyamaSugeno.index", "Explained.inertia"). Other available indices are "DaviesBoulin.index", "CalinskiHarabasz.index", "GD43.index", "GD53.index" and "Negentropy.index".

standardize

A boolean to specify if the variables must be centered and scaled (default = TRUE)

robust

A boolean indicating if the "robust" version of the algorithm must be used (see details)

noise_cluster

A boolean indicating if a noise cluster must be added to the solution (see details)

delta

A float giving the distance of the noise cluster to each observation

maxiter

An integer for the maximum number of iterations

tol

The tolerance criterion used in the evaluateMatrices function for convergence assessment

chunk_size

The size of a chunk used for multiprocessing. Default is 5.

seed

An integer used for random number generation. It ensures that the start centers will be the same if the same integer is selected.

init

A string indicating how the initial centers must be selected. "random" indicates that random observations are used as centers. "kpp" uses a distance-based method resulting in more dispersed centers at the beginning. Both are heuristics.

verbose

A boolean indicating if a progress bar should be displayed
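
The arguments k, m and alpha each accept a sequence of values, so the number of clustering runs grows as the product of their lengths. The following base R sketch counts the runs for the ranges used in the Examples section and groups them by chunk_size. The grouping logic is an assumption made for illustration; the package may build and dispatch its chunks differently.

```r
# Candidate values, mirroring the ranges used in the Examples section
k <- 5
m <- seq(1, 2.5, 0.1)
alpha <- seq(0, 2, 0.1)

# Every combination of k, m and alpha corresponds to one clustering run
grid <- expand.grid(k = k, m = m, alpha = alpha)
n_runs <- nrow(grid)

# Assumption: combinations are handed to the workers in groups of chunk_size
chunk_size <- 5
chunk_id <- ceiling(seq_len(n_runs) / chunk_size)
chunks <- split(grid, chunk_id)

n_runs          # 1 * 16 * 21 = 336 combinations
length(chunks)  # 68 chunks: 67 of size 5 and one of size 1
```

Under this assumption, a larger chunk_size means fewer and longer tasks per worker, while a smaller one means more and shorter tasks.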

Value

A dataframe with indicators assessing the quality of classifications
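
One way to use such a table is to sort it by an index of interest. The sketch below works on a toy data frame: its values are made up, and its column names (the tested parameters plus the requested indices) are assumptions about the layout of the result, not actual package output.

```r
# Toy stand-in for the returned data frame; values are invented for
# illustration and the column names are assumed, not taken from the package
values <- data.frame(
  k = c(3, 3, 4, 4),
  m = c(1.5, 2.0, 1.5, 2.0),
  Silhouette.index = c(0.41, 0.35, 0.47, 0.38),
  XieBeni.index = c(0.90, 1.20, 0.75, 1.10)
)

# A higher silhouette index indicates better separated clusters,
# so rank the parameter combinations in decreasing order
ranked <- values[order(values$Silhouette.index, decreasing = TRUE), ]
best <- ranked[1, ]
best
```

Indices do not all point in the same direction (a lower Xie-Beni index is better, for example), so the sort order has to be chosen per index.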

Examples


library(geocmeans)
data(LyonIris)
AnalysisFields <- c("Lden", "NO2", "PM25", "VegHautPrt", "Pct0_14", "Pct_65", "Pct_Img",
                    "TxChom1564", "Pct_brevet", "NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris, queen = TRUE)
Wqueen <- spdep::nb2listw(queen, style = "W")
# run the parameter search on two background R sessions
future::plan(future::multisession, workers = 2)
# set spconsist to TRUE to calculate the spatial consistency indicator;
# FALSE here to reduce the time during package check
values <- select_parameters.mc("SFCM", dataset, k = 5, m = seq(1, 2.5, 0.1),
    alpha = seq(0, 2, 0.1), nblistw = Wqueen, spconsist = FALSE)
## make sure any open connections are closed afterward
if (!inherits(future::plan(), "sequential")) future::plan(future::sequential)
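
The lag_method argument distinguishes a mean lag from a weighted median lag. The base R sketch below contrasts the two for the neighbours of a single observation. The numbers are toy values, weighted_median is a hypothetical helper using one common definition of a weighted median, and the package's own computation may differ.

```r
# Values of one variable for the neighbours of a single observation (toy
# numbers) and their spatial weights (row-standardized, so they sum to 1)
neigh_values <- c(10, 12, 50, 11)
neigh_weights <- c(0.25, 0.25, 0.25, 0.25)

# "mean": weighted average of the neighbouring values
lag_mean <- sum(neigh_values * neigh_weights) / sum(neigh_weights)

# "median": smallest value at which the cumulative weight reaches half of
# the total weight (hypothetical helper, not a geocmeans function)
weighted_median <- function(x, w) {
  ord <- order(x)
  cum_w <- cumsum(w[ord]) / sum(w)
  x[ord][which(cum_w >= 0.5)[1]]
}
lag_median <- weighted_median(neigh_values, neigh_weights)

lag_mean    # 20.75, pulled upward by the outlying neighbour (50)
lag_median  # 11, much less affected by that neighbour
```

Testing both with lag_method = c("mean","median") lets the quality indices show which lag suits the data better.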





geocmeans documentation built on Sept. 12, 2023, 9:06 a.m.