KODAMA.matrix    R Documentation

Description
KODAMA (KnOwledge Discovery by Accuracy MAximization) is an unsupervised and semi-supervised learning algorithm that performs feature extraction from noisy and high-dimensional data.
Usage

KODAMA.matrix(data, M = 100, Tcycle = 20,
              FUN_VAR = function(x) { ceiling(ncol(x)) },
              FUN_SAM = function(x) { ceiling(nrow(x) * 0.75) },
              bagging = FALSE, FUN = c("PLS-DA", "KNN"), f.par = 5,
              W = NULL, constrain = NULL, fix = NULL, epsilon = 0.05,
              dims = 2, landmarks = 1000,
              neighbors = min(c(landmarks, nrow(data))) - 1)

Arguments
data
    a matrix.

M
    number of iterative processes (steps I-III).

Tcycle
    number of iterative cycles that leads to the maximization of the cross-validated accuracy.

FUN_VAR
    function to select the number of variables to sample randomly. By default, all variables are taken.

FUN_SAM
    function to select the number of samples to sample randomly. By default, 75 per cent of all samples are taken.

bagging
    should sampling be done with replacement? By default, bagging = FALSE.

FUN
    classifier to be used. Choices are "PLS-DA" and "KNN".

f.par
    parameters of the classifier.
W
    a vector of nrow(data) elements giving the initial class of each sample. By default (W = NULL), each sample starts in its own one-element class; alternatively, W can be initialized with the output of a clustering procedure or with known class labels.

constrain
    a vector of nrow(data) elements. Samples sharing the same constrain value are linked: during the maximization of the cross-validated accuracy they are forced to stay in the same class.

fix
    a vector of nrow(data) logical elements. Samples with fix = TRUE keep the class assigned in W throughout the maximization of the cross-validated accuracy. A short sketch combining W, constrain, and fix is given after this argument list.
epsilon
    cut-off value for low proximity. High proximities are typical of intracluster relationships, whereas low proximities are expected for intercluster relationships. Very low proximities between samples are ignored by (default) setting epsilon = 0.05.

dims
    dimensions of the configurations of t-SNE based on the KODAMA dissimilarity matrix.

landmarks
    number of landmarks to use.

neighbors
    number of neighbors to include in the dissimilarity matrix to pass to the KODAMA.visualization function.
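As an illustration of how W, constrain, and fix can be combined, the following sketch (based on the iris data used in the Examples below; the kmeans-based initialization and the choice of fixed samples are illustrative assumptions, not requirements) starts KODAMA from a rough clustering while pinning the labels of a few samples:

# Illustrative sketch of the W / constrain / fix arguments (not a
# prescriptive recipe): start from a kmeans clustering and pin the
# labels of the first 10 samples.
data(iris)
u <- as.matrix(iris[, -5])

# Initial class vector: one entry per row of the data.
W_init <- kmeans(u, centers = 3)$cluster

# Keep the starting label of the first 10 samples unchanged during the
# maximization of the cross-validated accuracy.
fix_init <- rep(FALSE, nrow(u))
fix_init[1:10] <- TRUE

# One constrain value per sample: no samples are linked in this sketch.
constrain_init <- 1:nrow(u)

kk <- KODAMA.matrix(u, FUN = "KNN", f.par = 2,
                    W = W_init, constrain = constrain_init, fix = fix_init)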
Details

KODAMA consists of five steps. These can be divided into two parts: (i) the maximization of the cross-validated accuracy by an iterative process (steps I and II), resulting in the construction of a proximity matrix (step III), and (ii) the definition of a dissimilarity matrix (steps IV and V). The first part entails the core idea of KODAMA, that is, the partitioning of the data guided by the maximization of the cross-validated accuracy. At the beginning of this part, a fraction of the total samples (defined by FUN_SAM) is randomly selected from the original data. The whole iterative process (steps I-III) is repeated M times to average out the effects owing to the randomness of the iterative procedure. Each time this part is repeated, a different fraction of samples is selected. The second part aims at collecting and processing these results by constructing a dissimilarity matrix that provides a holistic view of the data while maintaining their intrinsic structure (steps IV and V). The KODAMA.visualization function is then used to visualise the KODAMA dissimilarity matrix.
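A minimal sketch of this two-part workflow, assuming a numeric matrix such as the iris measurements used in the Examples below (the value of M here is illustrative):

data(iris)
x <- as.matrix(iris[, -5])

# Part (i)-(ii): M repetitions of the accuracy-maximization procedure,
# followed by construction of the proximity and dissimilarity matrices.
kk <- KODAMA.matrix(x, M = 50, FUN = "PLS-DA", f.par = 5)

# The resulting dissimilarity matrix is then embedded for visualization.
cc <- KODAMA.visualization(kk, "t-SNE")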
Value

The function returns a list with the following items:
dissimilarity
    a dissimilarity matrix.

acc
    a vector with the cross-validated accuracies obtained in the M iterations.

proximity
    a proximity matrix.

v
    a matrix containing all the classifications obtained by maximizing the cross-validation accuracy.

res
    a matrix containing all the classification vectors obtained through maximizing the cross-validation accuracy.
f.par
    parameters of the classifier.

entropy
    Shannon's entropy of the KODAMA proximity matrix.

landpoints
    indexes of the landmarks used.

data
    original data.

knn_Armadillo
    dissimilarity matrix used as input for the KODAMA.visualization function.
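A brief sketch of how these components might be inspected, assuming kk is the object returned by KODAMA.matrix in the Examples below:

dim(kk$dissimilarity)   # dissimilarity matrix
summary(kk$acc)         # cross-validated accuracies over the M repetitions
kk$entropy              # Shannon's entropy of the proximity matrix
head(kk$landpoints)     # indexes of the landmarks used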
Author(s)

Stefano Cacciatore and Leonardo Tenori

References
Cacciatore S, Luchinat C, Tenori L
Knowledge discovery by accuracy maximization.
Proc Natl Acad Sci U S A 2014;111(14):5117-22. doi: 10.1073/pnas.1220873111.
Cacciatore S, Tenori L, Luchinat C, Bennett PR, MacIntyre DA
KODAMA: an updated R package for knowledge discovery and data mining.
Bioinformatics 2017;33(4):621-623. doi: 10.1093/bioinformatics/btw705.
L.J.P. van der Maaten and G.E. Hinton.
Visualizing High-Dimensional Data Using t-SNE.
Journal of Machine Learning Research 9 (Nov) : 2579-2605, 2008.
L.J.P. van der Maaten.
Learning a Parametric Embedding by Preserving Local Structure.
In Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS), JMLR W&CP 5:384-391, 2009.
McInnes L, Healy J, Melville J.
UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction.
arXiv preprint arXiv:1802.03426, 2018.
See Also

KODAMA.visualization
Examples

data(iris)
data <- iris[, -5]
labels <- iris[, 5]
kk <- KODAMA.matrix(data, FUN = "KNN", f.par = 2)
cc <- KODAMA.visualization(kk, "t-SNE")
plot(cc, col = as.numeric(labels), cex = 2)
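The embedding method is not limited to t-SNE. Assuming KODAMA.visualization also accepts "UMAP" as its method argument (cf. the McInnes et al. reference above), a variant of the example might be:

# Same KODAMA run, embedded with UMAP instead of t-SNE (assumes "UMAP"
# is an accepted method of KODAMA.visualization).
cc_umap <- KODAMA.visualization(kk, "UMAP")
plot(cc_umap, col = as.numeric(labels), cex = 2)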