train.hdgsom: Train a Growing Self-Organizing Map


View source: R/train.hdgsom.r

Description

Computes a growing self-organizing map for mapping high-dimensional data to 2D.

Usage

  train.hdgsom(data, spreadFactor=0.8, keepdata=FALSE, 
      iterations=50, alpha=0.9, beta = 0.5, gridsize = FALSE, 
      nhood= "rect", initrad = NULL, ...)

Arguments

data

a matrix or data.frame, with each row representing an observation and each column a dimension.

spreadFactor

the spread factor determines the rate at which new units are added to the map. Values close to 0 lead to little growth and therefore fewer nodes than values close to 1. The default value is 0.8. See the sketch after this argument list for how the spread factor relates to the growth threshold.

keepdata

if set to TRUE, a copy of the training data will be stored in the hdgsom object.

iterations

number of times the data frame will be presented to the network (growing and smoothing phases combined).

alpha

discount factor for the learning rate during the growing phase of the training. Values should be between 0 and 1.

beta

propagation rate. Determines the rate at which the error of a node that cannot grow any new nodes is passed on to its neighbours. Suggested values range between 0 and 1.

gridsize

default value is FALSE. If a numeric value is entered, the grid size of the network is predetermined as a square with side length gridsize. No growth of the network takes place in this case.

nhood

defines how the grid is built and, consequently, how the neighbourhood is shaped. Allowed values are "rect" (rectangular) and "hex" (hexagonal).

initrad

if the grid size is predetermined, the initial radius can be chosen here. If left blank, the square root of gridsize is used.
A larger initrad can improve the quality of the clustering. However, training can become very slow if too large a value is chosen, since the number of necessary computations rises exponentially.

...

not used.
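For intuition, the paper the algorithm is based on (see References) derives a growth threshold from the spread factor as GT = -D * ln(SF), where D is the number of data dimensions. train.hdgsom adjusts this formula to also take the number of observations into account (see Note), so the snippet below is only an illustration of how the spread factor controls growth, not the package's exact computation.

# Illustration only: growth threshold as in the original GSOM paper.
# train.hdgsom uses a modified formula (see Note), so values will differ.
growth_threshold <- function(data, spreadFactor = 0.8) {
  D <- ncol(data)            # number of dimensions
  -D * log(spreadFactor)     # smaller spread factor -> higher threshold -> less growth
}

data(iris)
growth_threshold(iris[, 1:4], spreadFactor = 0.8)  # approx. 0.89
growth_threshold(iris[, 1:4], spreadFactor = 0.1)  # approx. 9.21 (much less growth)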

Details

Euclidean distance is used to calculate the distances between observations and node codes.
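A minimal sketch of how a best matching unit can be found under Euclidean distance is shown below. The helper function and variable names are hypothetical and do not reflect the package's internal implementation.

# Hypothetical helper: find the best matching unit for one observation,
# assuming node codes are stored as a matrix with one row per node.
best_matching_unit <- function(observation, codes) {
  dists <- sqrt(rowSums(sweep(codes, 2, observation)^2))  # Euclidean distances
  which.min(dists)                                         # index of the closest node
}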

Value

an S3 object of class "hdgsom" with components:

nodes$position

the location of the nodes on the map.

nodes$codes

codes that were established during the training for each node and dimension of the data.

nodes$distance

average distance of observations from their best matching units, per unit.

nodes$freq

how many times each node was the best matching unit during the last iteration.

training

reports on the progress of the training after each iteration over the data. "training_stage" indicates whether the algorithm was in the growing phase (1) or the smoothing phase (2), "meandist" records the average distance to the best matching unit, "num_of_nodes" stores the size of the map, and "nodegrowth" keeps track of the nodes grown during each iteration.

GT

the accumulated error per node that was required to grow a new unit during the training phase of the network.

norm_param

parameters that were used for each dimension to normalize the data. Needed for plotting on the correct scale and for mapping future data.

data

data that was used to train the model, not normalized (only stored if keepdata = TRUE).
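A minimal sketch of how the returned components can be inspected, using the component names documented above:

# Train a map and inspect the documented components
data(iris)
hdgsom_iris <- train.hdgsom(iris[, 1:4], keepdata = TRUE)
str(hdgsom_iris$nodes$position)   # node coordinates on the map
head(hdgsom_iris$nodes$codes)     # one code per node and data dimension
hdgsom_iris$training              # per-iteration training progress
hdgsom_iris$GT                    # growth threshold used during training
hdgsom_iris$norm_param            # normalization parameters per dimension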

Note

In contrast to the paper the algorithm is based on, the following adjustments have been made:

1) The formula used to calculate the growth rate from the spread factor has been changed, in order to take the number of observations into consideration.

2) During phase two, the learning rate is reduced according to the formula used for regular Kohonen networks, in order to prevent a too rapid decay.

The decrease of neighbourhood size only takes place in phase two and happens on a linear basis from the specified start value to 0.
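As an illustration of the linear neighbourhood decrease during phase two, a radius schedule of this kind can be written as follows; the start value and the number of smoothing iterations below are hypothetical.

# Illustration only: linear decrease of the neighbourhood radius
# from a chosen start value (e.g. initrad) to 0 during the smoothing phase.
start_radius <- 4
n_smoothing  <- 25
radius_schedule <- seq(from = start_radius, to = 0, length.out = n_smoothing)
round(radius_schedule, 2)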

Author(s)

Alex Hunziker

References

Damminda Alahakoon, Saman K. Halgamuge (2000): Dynamic Self-Organizing Maps with Controlled Growth for Knowledge Discovery. IEEE Transactions on Neural Networks, Vol. 11.

See Also

train_xy.hdgsom, map.hdgsom, plot.hdgsom

Examples

# load data
data(iris)
iris <- iris[,1:4]

# Train hdgsom Model
hdgsom_iris <- train.hdgsom(iris)

# Some more Parameters
hdgsom_iris <- train.hdgsom(iris, spreadFactor=0.8, keepdata=TRUE, iterations=30, 
  alpha=0.5, gridsize = FALSE, nhood= "rect")

# Fixed Grid size
hdgsom_iris <- train.hdgsom(iris, iterations=30, gridsize = 10)
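
The functions listed under See Also can be applied to the trained object. The calls below are a sketch; in particular, the argument order of map.hdgsom is an assumption and should be checked in its help page.

# Plot the trained map (see plot.hdgsom)
plot(hdgsom_iris)

# Map observations onto the trained map (see map.hdgsom);
# argument order is assumed here - check ?map.hdgsom
mapped <- map.hdgsom(hdgsom_iris, iris)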
