WH_2d_Kohonen_maps: Batch Kohonen self-organizing 2d maps for histogram-valued...

View source: R/Kohonen_maps.R

WH_2d_Kohonen_maps R Documentation

Batch Kohonen self-organizing 2d maps for histogram-valued data

Description

The function implements a batch Kohonen self-organizing 2D map algorithm for histogram-valued data.

Usage

WH_2d_Kohonen_maps(
  x,
  net = list(xdim = 4, ydim = 3, topo = c("rectangular")),
  kern.param = 2,
  TMAX = 2,
  Tmin = 0.2,
  niter = 30,
  repetitions = 5,
  simplify = FALSE,
  qua = 10,
  standardize = FALSE,
  verbose = FALSE
)

Arguments

x

A MatH object (a matrix of distributionH).

net

A list describing the topology of the net: list(xdim = number of rows, ydim = number of columns, topo = 'rectangular' or 'hexagonal'); see the somgrid syntax in package class. Default: net = list(xdim = 4, ydim = 3, topo = c('rectangular')).

kern.param

The kernel parameter for the RBF kernel used in the algorithm (default = 2).

TMAX

a parameter useful for the iterations (default=2)

Tmin

a parameter useful for the iterations (default=0.2)

niter

maximum number of iterations (default=30)

repetitions

Number of repetitions of the algorithm (default = 5), because each run may converge to a local optimum.

simplify

A logical parameter for speeding up computations (default = FALSE). If TRUE, the data are recoded to allow faster computation.

qua

If simplify = TRUE, the number of equally spaced quantiles used to recode the histograms (default = 10).

standardize

A logical value (default is FALSE). If TRUE, histogram-valued data are standardized, variable by variable, using the Wasserstein-based standard deviation. Use it when each variable should have a standard deviation equal to one.

verbose

A logical parameter (default = FALSE). If TRUE, details of the computation are shown during execution.
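The descriptions of TMAX, Tmin and niter above do not say how the three interact. The following is a minimal sketch, assuming (this is an assumption, not something stated on this page) that TMAX and Tmin are the initial and final neighbourhood radius of the map and that the radius decays geometrically over niter iterations. The helpers radius_schedule and neigh_weight are hypothetical names, not HistDAWass functions, and the way kern.param enters the kernel is not modelled.

```r
# Hypothetical helper: radius shrinking geometrically from t_max to t_min
# over niter iterations (assumed schedule, not taken from the package).
radius_schedule <- function(t_max = 2, t_min = 0.2, niter = 30) {
  iter <- 0:(niter - 1)
  t_max * (t_min / t_max)^(iter / (niter - 1))
}

# Hypothetical helper: Gaussian (RBF-style) neighbourhood weight for two
# map nodes at grid distance d, given the current radius.
neigh_weight <- function(d, radius) {
  exp(-d^2 / (2 * radius^2))
}

radii <- radius_schedule(t_max = 2, t_min = 0.2, niter = 30)

# Weight between neighbouring nodes (grid distance 1) at the first and
# last iteration: large early on, close to zero at the end.
w_first <- neigh_weight(1, radii[1])
w_last  <- neigh_weight(1, radii[length(radii)])
```

Under this reading, a large TMAX lets each object influence many nodes at the start, while a small Tmin makes the final steps behave almost like a plain clustering of the objects into the map nodes.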

Details

An extension of the Batch Self-Organised Map (BSOM) is proposed here for histogram data. This kind of data has been defined in the context of symbolic data analysis. The BSOM cost function is based on a distance function: the L2 Wasserstein distance. This distance has been widely used in several analysis techniques (clustering, regression) when input data are expressed by distributions (empirical, as histograms, or theoretical, as probability distributions). The peculiarity of this distance is that it is a Euclidean distance between quantile functions, so all the properties proved for L2 distances still hold. An adaptive version of BSOM is also introduced, which uses an automatic system of weights in the cost function to take into account the different effects of the variables on the Self-Organised Map grid.
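To make the "Euclidean distance between quantile functions" concrete, here is a minimal base-R sketch of the squared L2 Wasserstein distance between two histograms. It does not use HistDAWass: hist_quantile and wass2_sq are hypothetical helper names, each histogram is assumed to have strictly positive bin probabilities and uniform density within each bin, and the integral over (0, 1) is approximated on a midpoint grid.

```r
# Quantile function of a histogram with uniform density inside each bin.
# breaks: bin limits (length k + 1); probs: bin probabilities (length k).
hist_quantile <- function(breaks, probs) {
  stopifnot(length(breaks) == length(probs) + 1,
            all(probs > 0),
            abs(sum(probs) - 1) < 1e-8)
  # Piecewise-linear interpolation of (cumulative probability, bin limit).
  approxfun(x = c(0, cumsum(probs)), y = breaks)
}

# Squared L2 Wasserstein distance: mean squared difference between the two
# quantile functions, evaluated at the midpoints of n_grid equal-width
# probability intervals.
wass2_sq <- function(qf1, qf2, n_grid = 1000) {
  t <- (seq_len(n_grid) - 0.5) / n_grid
  mean((qf1(t) - qf2(t))^2)
}

h1 <- hist_quantile(breaks = c(0, 1, 2), probs = c(0.5, 0.5))
h2 <- hist_quantile(breaks = c(1, 2, 3), probs = c(0.5, 0.5))

d2 <- wass2_sq(h1, h2)
```

Here h2 is h1 shifted by one unit, so the two quantile functions differ by 1 at every probability level and d2 equals 1 up to rounding.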

Value

a list with the results of the Batch Kohonen map

Slots

solution

A list. Returns the best solution among the repetitions, i.e. the one with the minimum sum-of-squares criterion.

solution$MAP

The map topology.

solution$IDX

A vector. The clusters to which the objects are assigned.

solution$cardinality

A vector. The cardinality of each final cluster.

solution$proto

A MatH object with the description of centers.

solution$Crit

A number. The criterion value (sum of squared deviations from the centers) at the end of the run.

quality

A number. The percentage of the sum of squared deviations explained by the model (the higher the better).
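The relation between repetitions, the criterion and the quality index can be illustrated with ordinary numeric data. The sketch below is only an analogy: it swaps in base R's kmeans for the batch Kohonen map and Euclidean distance for the L2 Wasserstein distance, and it assumes that quality is computed as one minus the ratio of the criterion to the total sum of squares, which this page does not state explicitly.

```r
set.seed(1)

# Toy data: two well-separated groups of 50 points in two dimensions.
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2))

# Several repetitions from random starts; each may stop at a local optimum.
runs <- lapply(1:5, function(i) kmeans(x, centers = 2, nstart = 1))

# Criterion of each run: sum of squared deviations from the cluster centers.
crit <- vapply(runs, function(r) r$tot.withinss, numeric(1))

# Keep the repetition with the minimum criterion.
best <- runs[[which.min(crit)]]

# Total sum of squared deviations from the overall mean.
tss <- sum(scale(x, scale = FALSE)^2)

# Share of the total sum of squares explained by the best solution.
quality <- 1 - best$tot.withinss / tss
```

A value of quality near 1 means the centers account for almost all of the variability, while a value near 0 means the partition explains very little.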

References

Irpino A, Verde R, De Carvalho FAT (2012). Batch self organizing maps for interval and histogram data. In: Proceedings of COMPSTAT 2012. p. 143-154, ISI/IASC, ISBN: 978-90-73592-32-2

Examples

## Not run: 
results <- WH_2d_Kohonen_maps(
  x = BLOOD,
  net = list(xdim = 2, ydim = 3, topo = c("rectangular")),
  repetitions = 2, simplify = TRUE,
  qua = 10, standardize = TRUE
)

## End(Not run)

Airpino/HistDAWass documentation built on Jan. 30, 2024, 7:53 p.m.