View source: R/clValidfunctions.R
Computes a Self-Organizing Tree Algorithm (SOTA) clustering of a dataset, returning a SOTA object.
data 
data matrix or data frame. Cannot have a profile ID as the first column. 
maxCycles 
integer value representing the maximum number of iterations allowed. The resulting number
of clusters returned by 
maxEpochs 
integer value indicating the maximum number of training epochs allowed per cycle. By default,

distance 
character string specifying the metric used to compute dissimilarities between profiles. 'euclidean' is the default; 'correlation' is also available. 
wcell 
value specifying the winning cell migration weight. The default is 0.01. 
pcell 
value specifying the parent cell migration weight. The default is 0.005. 
scell 
value specifying the sister cell migration weight. The default is 0.001. 
delta 
value specifying the minimum epoch error improvement. This value is used as a threshold for signaling the start of a new cycle. It is set to 1e-04 by default. 
neighb.level 
integer value used to indicate which cells are candidates to accept new profiles. This number specifies how many levels up the tree the algorithm moves in search of candidate cells for the redistribution of profiles. The default is 0. 
maxDiversity 
value representing the maximum Diversity allowed within a cluster. The default is 0.9. 
unrest.growth 
logical flag: if TRUE then the algorithm will run 
... 
Any other arguments. 
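The distance, wcell, pcell, scell, maxEpochs, and delta arguments all act within a single training cycle. The base R sketch below illustrates one cycle on the initial tree (a root node with two cells). It is a simplified illustration, not clValid's implementation: the helpers profile_dist, migrate, train_epoch, and train_cycle are hypothetical, 'correlation' dissimilarity is assumed to be 1 minus the Pearson correlation, and the epoch error is assumed to be the mean distance from each profile to its winning cell.

```r
# Hypothetical sketch of one SOTA training cycle; not clValid's internal code.
# Assumptions: "correlation" dissimilarity = 1 - Pearson r; epoch error =
# mean distance from each profile to its winning cell.

profile_dist <- function(x, y, distance = c("euclidean", "correlation")) {
  distance <- match.arg(distance)
  if (distance == "euclidean") sqrt(sum((x - y)^2)) else 1 - cor(x, y)
}

# Move a node's profile a fraction `weight` of the way toward `profile`.
migrate <- function(node, profile, weight) node + weight * (profile - node)

# One epoch: present every profile once and update the winning cell, its
# sister, and their parent with the wcell, scell, and pcell weights.
train_epoch <- function(data, cells, parent, wcell = 0.01, pcell = 0.005,
                        scell = 0.001, distance = "euclidean") {
  err <- 0
  for (i in seq_len(nrow(data))) {
    x <- data[i, ]
    d <- c(profile_dist(x, cells[1, ], distance),
           profile_dist(x, cells[2, ], distance))
    w <- which.min(d)  # winning cell
    s <- 3 - w         # its sister cell
    cells[w, ] <- migrate(cells[w, ], x, wcell)
    cells[s, ] <- migrate(cells[s, ], x, scell)
    parent <- migrate(parent, x, pcell)
    err <- err + d[w]
  }
  list(cells = cells, parent = parent, error = err / nrow(data))
}

# Repeat epochs until the relative error improvement falls below `delta`
# (the signal to start a new cycle) or `maxEpochs` is reached.
train_cycle <- function(data, cells, parent, maxEpochs, delta = 1e-04, ...) {
  prev <- Inf
  for (epoch in seq_len(maxEpochs)) {
    res <- train_epoch(data, cells, parent, ...)
    cells <- res$cells
    parent <- res$parent
    if (is.finite(prev) && (prev == 0 || abs(prev - res$error) / prev < delta)) break
    prev <- res$error
  }
  list(cells = cells, parent = parent, error = res$error, epochs = epoch)
}

# Usage on toy data: two well-separated groups of four-variable profiles.
set.seed(42)
toy <- rbind(matrix(rnorm(40, mean = 0), ncol = 4),
             matrix(rnorm(40, mean = 5), ncol = 4))
root <- colMeans(toy)
start <- rbind(root, root) + matrix(rnorm(8, sd = 0.1), nrow = 2)
fit <- train_cycle(toy, start, root, maxEpochs = 200, delta = 1e-04)
```

The weights are ordered wcell > pcell > scell, so the winning cell moves most toward each profile, its parent less, and its sister least.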
The Self-Organizing Tree Algorithm (SOTA) is an unsupervised neural network with a binary tree topology. It combines
the advantages of both hierarchical clustering and Self-Organizing Maps (SOM). The algorithm picks the node with
the largest Diversity and splits it into two nodes, called Cells. This process can be stopped at any level, assuring a fixed number of
hard clusters. This behavior is achieved by setting the unrest.growth
parameter to TRUE. Growth of the
tree can also be stopped based on other criteria, such as the maximum Diversity allowed within a cluster.
Further details regarding the inner workings of the algorithm can be found in the paper listed in the References section.
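The growth rule described above (split the most diverse cell, stop on maxCycles or, when growth is restricted, on maxDiversity) can be sketched in base R as follows. The helpers cell_diversity, keep_growing, and split_cell are hypothetical, and Diversity is assumed here to be the mean Euclidean distance between a cell's profiles and the cell profile, which may differ from clValid's exact definition.

```r
# Hypothetical sketch of SOTA's cycle-level growth control; not clValid's code.
# Assumption: a cell's Diversity is the mean Euclidean distance between the
# profiles assigned to it and the cell's own profile.

cell_diversity <- function(data, cells, assignment) {
  sapply(seq_len(nrow(cells)), function(k) {
    members <- data[assignment == k, , drop = FALSE]
    if (nrow(members) == 0) return(0)
    mean(sqrt(rowSums(sweep(members, 2, cells[k, ])^2)))
  })
}

# Decide whether another cycle (one more split) should run.
keep_growing <- function(cycle, maxCycles, diversity, maxDiversity = 0.9,
                         unrest.growth = TRUE) {
  if (cycle >= maxCycles) return(FALSE)
  if (unrest.growth) return(TRUE)  # ignore the Diversity criterion
  max(diversity) > maxDiversity    # stop once every cell is homogeneous enough
}

# Split the most diverse cell: it becomes an internal node, and its two
# new child cells start as copies of its profile.
split_cell <- function(cells, diversity) {
  k <- which.max(diversity)
  rbind(cells[-k, , drop = FALSE], cells[k, ], cells[k, ])
}

# Usage on a tiny two-cell example.
data <- rbind(c(0, 0), c(0, 1), c(10, 10), c(10, 12))
cells <- rbind(c(0, 0.5), c(10, 11))
assignment <- c(1, 1, 2, 2)
div <- cell_diversity(data, cells, assignment)
grown <- split_cell(cells, div)
```

In this toy case the second cell is the more diverse one, so it is the one replaced by two child cells.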
data 
data matrix used for clustering 
c.tree 
complete tree in a matrix format. Node ID, its Ancestor, and whether it's a terminal node (cell) are listed in the first three columns. Node profiles are shown in the remaining columns. 
tree 
incomplete tree in a matrix format listing only the terminal nodes (cells). Node ID, its Ancestor, and a cell indicator (always 1) are listed in the first three columns. Node profiles are shown in the remaining columns. 
clust 
integer vector, with one entry per profile in the data matrix, giving the cluster assignment of each profile in the original order. 
totals 
integer vector specifying the cluster sizes. 
dist 
character string indicating a distance function used in the clustering process. 
diversity 
vector specifying the final cluster diversities. 
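To illustrate the documented matrix layout, the sketch below builds a small hypothetical c.tree by hand (the values are invented, not sota() output) and derives a terminal-node matrix and cluster sizes from it. The column names and the root's Ancestor value of 0 are illustrative choices, and treating totals as the tabulated clust vector is an assumption about how the two components relate.

```r
# Hypothetical toy object mimicking the documented layout; values are made up.
# Columns: node ID, Ancestor, cell indicator (1 = terminal cell), then the
# node profile (two variables here).
c.tree <- rbind(
  c(1, 0, 0,  5.0,  5.5),  # root node (Ancestor 0 is an illustrative choice)
  c(2, 1, 1,  0.0,  0.5),  # terminal cell
  c(3, 1, 1, 10.0, 11.0)   # terminal cell
)
colnames(c.tree) <- c("ID", "Anc", "cell", "V1", "V2")
clust <- c(1, 1, 2, 2)  # hypothetical assignments for four profiles

# Keep only terminal nodes, mirroring what the `tree` component holds.
tree <- c.tree[c.tree[, 3] == 1, , drop = FALSE]

# Cluster sizes computed from the assignments (assumed to match `totals`).
totals <- as.vector(table(clust))
```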
Vasyl Pihur, Guy Brock, Susmita Datta, Somnath Datta
Herrero, J., Valencia, A., and Dopazo, J. (2001). A hierarchical unsupervised growing neural network for clustering gene expression patterns. Bioinformatics, 17, 126-136.