KODAMA.matrix    R Documentation

Description
KODAMA (KnOwledge Discovery by Accuracy MAximization) is an unsupervised and semi-supervised learning algorithm that performs feature extraction from noisy and high-dimensional data.
Usage

KODAMA.matrix(data,
              spatial = NULL,
              samples = NULL,
              M = 100, Tcycle = 20,
              FUN = c("fastpls", "simpls"),
              ncomp = min(c(50, ncol(data))),
              W = NULL, metrics = "euclidean",
              constrain = NULL, fix = NULL, landmarks = 10000,
              splitting = ifelse(nrow(data) < 40000, 100, 300),
              spatial.resolution = 0.3,
              simm_dissimilarity_matrix = FALSE,
              seed = 1234)
Arguments

data
    A numeric matrix where rows are samples and columns are variables.

spatial
    An optional matrix of spatial coordinates, or NULL. Used to apply spatial constraints.

samples
    An optional vector indicating the identity of each sample. Can be used to guide the integration of prior sample-level information.

M
    Number of iterative processes. Default is 100.

Tcycle
    Number of cycles used to optimize the cross-validated accuracy. Default is 20.

FUN
    Classifier to be used. Options are "fastpls" and "simpls".

ncomp
    Number of components for the PLS classifier. Default is min(c(50, ncol(data))).

W
    An optional vector of initial class labels for each sample. Default is NULL.

metrics
    Distance metric to be used. Default is "euclidean".

constrain
    An optional vector indicating group constraints. Samples sharing the same value in this vector are forced to stay in the same cluster (see the sketch after this argument list).

fix
    A logical vector indicating whether the label of each sample in W must be kept fixed during the iterative procedure.

landmarks
    Number of landmark points used to approximate the similarity structure. Default is 10000.

splitting
    Number of random sample splits used during optimization. Default is 100 for datasets with fewer than 40000 samples and 300 otherwise.

spatial.resolution
    A numeric value controlling the resolution of the spatial constraints. Default is 0.3.

simm_dissimilarity_matrix
    Logical. If TRUE, the similarity/dissimilarity matrix is computed and returned. Default is FALSE.

seed
    Random seed for reproducibility. Default is 1234.
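The following is a minimal sketch, not taken from the package's own examples, of how the constrain, W and fix arguments can be combined. It uses the iris data as in the Examples section below; the triplicate grouping and the choice of ten pinned labels are purely illustrative assumptions.

library(KODAMA)

data(iris)
x <- as.matrix(iris[, -5])

## Constrained run: samples sharing a value in `constrain` are forced into the
## same cluster. The triplicate grouping below is hypothetical.
replicate_id <- rep(seq_len(nrow(x) / 3), each = 3)
kk_con <- KODAMA.matrix(x, constrain = replicate_id, ncomp = 2)

## Semi-supervised run: start from the species labels and pin ("fix") the first
## ten so that they cannot change during the iterative procedure.
W   <- as.numeric(iris$Species)
fix <- rep(FALSE, nrow(x))
fix[1:10] <- TRUE
kk_semi <- KODAMA.matrix(x, W = W, fix = fix, ncomp = 2)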
Details

KODAMA consists of five steps, which can be divided into two parts: (i) the maximization of the cross-validated accuracy by an iterative process (steps I and II), resulting in the construction of a proximity matrix (step III), and (ii) the definition of a dissimilarity matrix (steps IV and V). The first part entails the core idea of KODAMA, that is, the partitioning of the data guided by the maximization of the cross-validated accuracy. At the beginning of this part, a fraction of the total samples (defined by landmarks) is randomly selected from the original data. The whole iterative process (steps I-III) is repeated M times to average out the effects of the randomness of the iterative procedure; each time this part is repeated, a different fraction of samples is selected. The second part collects and processes these results by constructing a dissimilarity matrix that provides a holistic view of the data while maintaining their intrinsic structure (steps IV and V). The KODAMA.visualization function is then used to visualise the resulting KODAMA dissimilarity matrix.
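As a hedged illustration of the role of these settings (the values below are arbitrary, not recommendations): reducing M and Tcycle shortens the run at the cost of less averaging over the stochastic procedure, while seed makes repeated runs reproducible.

library(KODAMA)

data(iris)
x <- as.matrix(iris[, -5])

## Fewer iterative processes (M) and optimization cycles (Tcycle), with a fixed
## seed so that repeated runs give the same result.
kk_fast <- KODAMA.matrix(x, M = 20, Tcycle = 5, ncomp = 2, seed = 42)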
Value

The function returns a list containing the following elements (a short inspection sketch follows the list):
dissimilarity
    A dissimilarity matrix.

acc
    A vector with the cross-validated accuracies obtained over the M iterations.

proximity
    A proximity matrix.

v
    A matrix containing all classifications obtained by maximizing the cross-validation accuracy.

res
    A matrix containing all classification vectors obtained through maximizing the cross-validation accuracy.

knn_Rnanoflann
    The dissimilarity matrix used as input for the KODAMA.visualization function.

data
    The original data.

res_constrain
    The constrain vector used.
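A minimal sketch of inspecting the returned list; the exact dimensions of the elements depend on the settings used, and the call below mirrors the Examples section.

library(KODAMA)

data(iris)
kk <- KODAMA.matrix(as.matrix(iris[, -5]), ncomp = 2)

str(kk$dissimilarity)   # dissimilarity matrix
length(kk$acc)          # cross-validated accuracies from the iterative procedure
dim(kk$v)               # classifications obtained while maximizing accuracy
head(kk$data)           # the original input data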
Author(s)

Stefano Cacciatore and Leonardo Tenori
References

Abdel-Shafy EA, Kassim M, Vignol A, et al.
KODAMA enables self-guided weakly supervised learning in spatial transcriptomics.
bioRxiv 2025. doi: 10.1101/2025.05.28.656544.

Cacciatore S, Luchinat C, Tenori L.
Knowledge discovery by accuracy maximization.
Proc Natl Acad Sci U S A 2014;111(14):5117-5122. doi: 10.1073/pnas.1220873111.

Cacciatore S, Tenori L, Luchinat C, Bennett PR, MacIntyre DA.
KODAMA: an updated R package for knowledge discovery and data mining.
Bioinformatics 2017;33(4):621-623. doi: 10.1093/bioinformatics/btw705.
L.J.P. van der Maaten and G.E. Hinton.
Visualizing High-Dimensional Data Using t-SNE.
Journal of Machine Learning Research 9 (Nov): 2579-2605, 2008.
L.J.P. van der Maaten.
Learning a Parametric Embedding by Preserving Local Structure.
In Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS), JMLR W&CP 5:384-391, 2009.
McInnes L, Healy J, Melville J.
UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction.
arXiv preprint arXiv:1802.03426. 2018 Feb 9.
See Also

KODAMA.visualization
Examples

library(KODAMA)

data(iris)
data   <- iris[, -5]
labels <- iris[, 5]

kk <- KODAMA.matrix(data, ncomp = 2)
cc <- KODAMA.visualization(kk, "t-SNE")
plot(cc, col = as.numeric(labels), cex = 2)
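Continuing from the example above, a possible variation assuming KODAMA.visualization also accepts "UMAP" as projection method (suggested by the UMAP reference above; check the KODAMA.visualization help page for the methods available in your installed version):

cc_umap <- KODAMA.visualization(kk, "UMAP")
plot(cc_umap, col = as.numeric(labels), cex = 2)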