siml_mat | R Documentation |
This function returns a matrix of similarities or distances. The similarity measure is selected from "jaccard", "simpson", "smc", and "dice". If 'dat' consists of numeric values, select "tanimoto" or "cosine" instead. The measure is computed for all pairwise combinations of the list elements, or of the data frame columns, each of which is a character vector.
siml_mat(dat, method, vorder, type, mtype)
dat |
A list of vectors, data.frame or matrix |
method |
One of "jaccard", "simpson", "dice", "smc", "tanimoto", and "cosine". For continuous values, use "tanimoto" (an extended Jaccard) or "cosine". |
vorder |
logical: whether each element of 'dat' is an ordered vector. |
type |
Choose the output type: "distance" or "similarity". The default value is "distance". |
mtype |
Choose the output matrix form: "lower", "upper", or "both". The default value is "lower". |
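As a rough illustration of what the set-based options measure, the textbook definitions of three of them can be sketched in base R. The helper names `jaccard_sim`, `simpson_sim`, and `dice_sim` are hypothetical and not part of the package, and whether siml_mat computes exactly these quantities internally is an assumption.

```r
# Hypothetical helpers (not part of the package): textbook set-based measures.
jaccard_sim <- function(a, b) {
  a <- unique(a); b <- unique(b)
  # shared elements divided by all distinct elements
  length(intersect(a, b)) / length(union(a, b))
}
simpson_sim <- function(a, b) {
  a <- unique(a); b <- unique(b)
  # shared elements divided by the size of the smaller set
  length(intersect(a, b)) / min(length(a), length(b))
}
dice_sim <- function(a, b) {
  a <- unique(a); b <- unique(b)
  # twice the shared elements divided by the sum of both set sizes
  2 * length(intersect(a, b)) / (length(a) + length(b))
}

v1 <- c("a", "b", "c", "d")
v3 <- c("a", "b", "c", "e")
jaccard_sim(v1, v3)  # 3 shared / 5 distinct = 0.6
simpson_sim(v1, v3)  # 3 / min(4, 4) = 0.75
dice_sim(v1, v3)     # 2 * 3 / (4 + 4) = 0.75
```

Because the Simpson coefficient normalizes by the smaller set, it is the least sensitive of the three to inputs of different lengths.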
A matrix of similarities or distances (Dij = 1 - Sij), and a sparse matrix.
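The relation Dij = 1 - Sij can be checked by hand on a small input. The following is a minimal base R sketch, not the package's implementation: `jaccard_sim` is a hypothetical helper, and the three-element list is only illustrative.

```r
# Hypothetical sketch (not the package's code): pairwise Jaccard similarity
# and the corresponding distance matrix.
x <- list(
  v1 = c("a", "b", "c", "d"),
  v2 = c("b", "d", "a", "c"),
  v4 = c("b", "c", "e", "f")
)
jaccard_sim <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Similarity matrix S over all combinations of list elements.
S <- outer(seq_along(x), seq_along(x),
           Vectorize(function(i, j) jaccard_sim(x[[i]], x[[j]])))
dimnames(S) <- list(names(x), names(x))

D <- 1 - S       # distance from similarity: Dij = 1 - Sij
d <- as.dist(D)  # as.dist() reads only the lower triangle of D
```

Since `as.dist()` uses only the lower triangle, a lower-triangular distance matrix is enough as input for `hclust()`.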
# A list composed of character vectors
v1 = c("a", "b", "c", "d")
v2 = c("b", "d", "a", "c")
v3 = c("a", "b", "c", "e")
v4 = c("b", "c", "e", "f")
v5 = c("f", "h", "i", "j")
v6 = c("a", "e", "f", "g")
x <- list(v1,v2,v3,v4,v5,v6); x <- setNames(x, paste0("v",1:6))
siml.jc <- siml_mat(dat = x, method = "jaccard", type = "similarity")
siml.dc <- siml_mat(dat = x, method = "dice", type = "similarity")
dist.jc <- siml_mat(dat = x, method = "jaccard", type = "distance")
plot(hclust(as.dist(dist.jc$jaccard_distance)))

# A list of vectors of different lengths
y <- list(v1 = head(v1, 3), v2 = v2, v3 = head(v3, 3), v4 = v4, v5 = v5, v6 = v6)
dist2.smp <- siml_mat(dat = y, method = "simpson", vorder = FALSE, type = "distance")

# A list of ordered categorical vectors
z <- lapply(1:6, function(i) setNames(sample(c("A","B","C"), 4, replace = TRUE), 1:4))
dist2.smc <- siml_mat(dat = z, method = "jaccard", vorder = TRUE, type = "distance")

# A data frame composed of character vectors
# In this case, the data frame or list is converted to a sparse matrix.
dat1 <- as.data.frame(x, stringsAsFactors = FALSE)
names(dat1) <- c("v1", "v2", "v3", "v4", "v5", "v6")
dist.jc2 <- siml_mat(dat = dat1, method = "jaccard", vorder = FALSE, type = "distance")
identical(dist.jc, dist.jc2)

# A data frame composed of numeric vectors
dat2 <- as.data.frame(t(iris[-5]), stringsAsFactors = FALSE)
names(dat2) <- paste0(iris$Species, 1:150)
dist_tani <- siml_mat(dat = dat2, method = "tanimoto", type = "distance")

# Convert the lower triangular matrix with 'as.dist' and run hierarchical clustering
plot(hclust(as.dist(dist.jc[[1]])))