parApplyClusterAnalysis: Apply Operations using Workers for Cluster Analysis

View source: R/parApplyClusterAnalysis.R

parApplyClusterAnalysis    R Documentation

Apply Operations using Workers for Cluster Analysis

Description

The function provides several ways to parallelize clustering algorithms using a collection of workers. It was used for benchmarking clustering algorithms in [Thrun, 2018] and [Thrun/Ultsch, 2020].

Usage

parApplyClusterAnalysis(DataOrDistances, FUN,
    NumberOfTrials = 1:100, ClusterNo = NULL,
    WorkersOrNo, SocketType = "PSOCK",
    SetSeed = TRUE, ...)

Arguments

DataOrDistances

Option 1: either a [1:N,1:d] data matrix (N cases, d dimensions, one data point per row) or a symmetric [1:N,1:N] distance matrix, depending on what FUN expects.

Option 2: a list of data matrices or distance matrices.

FUN

Clustering algorithm to apply, given as a function (see Details).

NumberOfTrials

Vector of trial numbers; FUN is applied once per trial, so the default 1:100 performs 100 trials.

ClusterNo

Number of clusters k, if required by FUN. For Option 2, a vector of cluster numbers in the same order as the list DataOrDistances.

WorkersOrNo

Either workers already initialized with makeCluster, or the number of workers (i.e., cores) to use. If not set, the number is estimated (see Details).

SocketType

Socket type used to create the workers; see makeCluster for alternatives if the default does not work.

SetSeed

TRUE: set.seed is called with 1000 + the trial number, and ComputationTime is named with the seeds. FALSE: set.seed(NULL) is called, and ComputationTime is named with the trial numbers.

...

Further arguments passed on to FUN.
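The SetSeed behavior described above can be sketched in base R. This is not the FCPS implementation: the helper name seeded_trial is hypothetical and stats::kmeans stands in for FUN. It shows why seeding trial i with 1000 + i makes each trial reproducible, whereas set.seed(NULL) leaves the random state unfixed.

```r
# Hypothetical helper illustrating the SetSeed convention (not the FCPS source);
# stats::kmeans stands in for FUN.
seeded_trial <- function(i, Data, ClusterNo, SetSeed = TRUE) {
  if (SetSeed) {
    set.seed(1000 + i)   # trial i always starts from the same random state
  } else {
    set.seed(NULL)       # reinitialize the generator without a fixed seed
  }
  as.numeric(stats::kmeans(Data, centers = ClusterNo)$cluster)
}

set.seed(42)
X <- matrix(rnorm(200), ncol = 2)   # 100 points in 2 dimensions
a <- seeded_trial(5, X, ClusterNo = 3)
b <- seeded_trial(5, X, ClusterNo = 3)
identical(a, b)   # TRUE: same trial number, same seed, same labels
```

Because the seed depends only on the trial number, a trial can be rerun in isolation, for example on a different worker, and still yield the same clustering.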

Details

If WorkersOrNo is not set, the number of workers defaults to the number of available cores minus 1.
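A minimal sketch of how the WorkersOrNo argument could be resolved, assuming the base R parallel package. The helper get_workers is hypothetical and not part of FCPS: it reuses an existing cluster, or creates one with the requested number of workers, falling back to the number of cores minus 1.

```r
library(parallel)

# Hypothetical helper (not part of FCPS): accept either an existing cluster
# created by makeCluster() or a number of workers; if nothing is given,
# use the number of available cores minus 1 (but at least one worker).
get_workers <- function(WorkersOrNo = NULL, SocketType = "PSOCK") {
  if (inherits(WorkersOrNo, "cluster")) {
    return(WorkersOrNo)               # already initialized workers
  }
  if (is.null(WorkersOrNo)) {
    cores <- detectCores()            # may be NA on some platforms
    WorkersOrNo <- if (is.na(cores)) 1L else max(1L, cores - 1L)
  }
  makeCluster(WorkersOrNo, type = SocketType)
}

cl <- get_workers(1)
length(cl)        # 1 worker
stopCluster(cl)   # release the workers when done
```

Leaving one core free keeps the main R session responsive while the workers run.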

In FCPS, the default parameters of each clustering algorithm are used automatically unless specified by the user. parApplyClusterAnalysis expects FUN to be a clustering function that returns a list with an element named Cls. If no such element exists, the whole output of FUN is returned with a warning.

Cls is a [1:N] numerical vector with values in 1:k that assigns each data point to one of the k clusters.
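Any clustering function can be adapted to this contract by returning a list with an element named Cls. A minimal sketch, assuming stats::kmeans as the underlying algorithm; kmeansCls is a hypothetical name, not an FCPS function.

```r
# Hypothetical FUN for parApplyClusterAnalysis: wraps stats::kmeans so that
# the result is a list containing Cls, a [1:N] numeric vector of labels 1:k.
kmeansCls <- function(Data, ClusterNo, ...) {
  fit <- stats::kmeans(Data, centers = ClusterNo, ...)
  list(Cls = as.numeric(fit$cluster),  # one label per row of Data
       Object = fit)                   # keep the full model for inspection
}

set.seed(1)
X <- rbind(matrix(rnorm(40), ncol = 2),             # 20 points around (0, 0)
           matrix(rnorm(40, mean = 8), ncol = 2))   # 20 points around (8, 8)
res <- kmeansCls(X, ClusterNo = 2, nstart = 5)
table(res$Cls)   # two well-separated groups of 20 points each
```

Extra arguments such as nstart reach kmeans through the wrapper's `...`, in the same way that further arguments are passed on to FUN.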

Value

If Option 1, a list with the following elements:

Cls_Matrix

[1:N,1:NumberOfTrials] numerical matrix in which each column is the Cls vector of one trial.

ComputationTime

[1:NumberOfTrials] numerical vector of the computation time of each trial in seconds.

Seeds

[1:NumberOfTrials] vector of the seeds used for each trial if SetSeed = TRUE, otherwise NULL.

If Option 2, a list containing one such list per element of DataOrDistances, named with the names of the DataOrDistances list.
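The structure of the return value can be illustrated with a simplified re-implementation built on parallel::parLapply. This is a sketch under stated assumptions, not the FCPS source: the names run_one_trial, par_apply_sketch and par_apply_list_sketch are hypothetical, the workers are assumed to be created beforehand, and the warning path for a missing Cls element is omitted.

```r
library(parallel)

# Hypothetical worker function (not the FCPS source): runs one trial of a
# clustering function `algorithm` and records its labels and elapsed time.
# Argument names avoid X/FUN/fun/cl, which parLapply() and lapply() use.
run_one_trial <- function(trial, data, algorithm, k, set_seed, ...) {
  if (set_seed) set.seed(1000 + trial) else set.seed(NULL)
  t0 <- proc.time()[["elapsed"]]
  out <- if (is.null(k)) algorithm(data, ...) else algorithm(data, k, ...)
  list(Cls = out$Cls, Time = proc.time()[["elapsed"]] - t0)
}

# Option 1: one data set -> list(Cls_Matrix, ComputationTime, Seeds)
par_apply_sketch <- function(Data, FUN, NumberOfTrials = 1:10, ClusterNo = NULL,
                             Workers, SetSeed = TRUE, ...) {
  res <- parLapply(Workers, NumberOfTrials, run_one_trial,
                   data = Data, algorithm = FUN, k = ClusterNo,
                   set_seed = SetSeed, ...)
  Cls_Matrix <- do.call(cbind, lapply(res, `[[`, "Cls"))   # [1:N, 1:trials]
  ComputationTime <- vapply(res, `[[`, numeric(1), "Time")
  Seeds <- if (SetSeed) 1000 + NumberOfTrials else NULL
  names(ComputationTime) <- if (SetSeed) Seeds else NumberOfTrials
  list(Cls_Matrix = Cls_Matrix, ComputationTime = ComputationTime, Seeds = Seeds)
}

# Option 2: a named list of data sets with one ClusterNo per element
par_apply_list_sketch <- function(DataList, FUN, ClusterNo = NULL, ...) {
  out <- lapply(seq_along(DataList), function(j)
    par_apply_sketch(DataList[[j]], FUN, ClusterNo = ClusterNo[j], ...))
  names(out) <- names(DataList)
  out
}

# Usage with two workers and a kmeans-based FUN
cl <- makeCluster(2)
set.seed(1)
X <- rbind(matrix(rnorm(40), ncol = 2), matrix(rnorm(40, mean = 8), ncol = 2))
kmFUN <- function(Data, ClusterNo) {
  list(Cls = as.numeric(stats::kmeans(Data, ClusterNo)$cluster))
}
r1 <- par_apply_sketch(X, kmFUN, NumberOfTrials = 1:3, ClusterNo = 2, Workers = cl)
dim(r1$Cls_Matrix)          # 40 3
names(r1$ComputationTime)   # "1001" "1002" "1003"
r2 <- par_apply_list_sketch(list(A = X, B = X[1:20, ]), kmFUN,
                            ClusterNo = c(2, 3), NumberOfTrials = 1:2, Workers = cl)
names(r2)                   # "A" "B"
stopCluster(cl)
```

Seeding inside each worker, rather than once in the main session, keeps every trial reproducible regardless of how the trials are distributed among the workers.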

Author(s)

Michael Thrun

References

Thrun, M. C.: Projection-Based Clustering through Self-Organization and Swarm Intelligence, Springer, Heidelberg, ISBN: 978-3658205393, 2018.

Thrun, M. C., & Ultsch, A.: Swarm Intelligence for Self-Organized Clustering, Journal of Artificial Intelligence, in press, 2020.

See Also

clusterApply


Mthrun/FCPS documentation built on June 28, 2023, 9:29 a.m.