siml_mat: Measure similarity and create a similarity or...

View source: R/siml_mat.R

siml_mat    R Documentation

Measure similarity and create a similarity or distance matrix

Description

This function returns a similarity or distance matrix. For character vectors, the similarity measure is one of "jaccard", "simpson", "smc", or "dice". If 'dat' consists of numeric values, choose "tanimoto" or "cosine" instead. The measure is computed for every pairwise combination of the list elements, or of the columns of a data frame.
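
The set-based measures can be illustrated with their textbook definitions. The sketch below is not taken from the package source: the helper names `jaccard_sim`, `simpson_sim`, and `dice_sim` are hypothetical, and it assumes each vector is treated as a set of unique values.

```r
# Textbook set-similarity definitions (an assumption, not the package's code).
# All helper names are hypothetical.
jaccard_sim <- function(a, b) {
  a <- unique(a); b <- unique(b)
  length(intersect(a, b)) / length(union(a, b))
}
simpson_sim <- function(a, b) {
  a <- unique(a); b <- unique(b)
  length(intersect(a, b)) / min(length(a), length(b))
}
dice_sim <- function(a, b) {
  a <- unique(a); b <- unique(b)
  2 * length(intersect(a, b)) / (length(a) + length(b))
}

v1 <- c("a", "b", "c", "d")
v3 <- c("a", "b", "c", "e")
jaccard_sim(v1, v3)  # 3 shared / 5 distinct = 0.6
simpson_sim(v1, v3)  # 3 shared / min(4, 4) = 0.75
dice_sim(v1, v3)     # 2 * 3 / (4 + 4) = 0.75
```

Because "simpson" divides by the size of the smaller set, it is the most forgiving of the three when the vectors differ in length.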

Usage

siml_mat(dat, method, vorder, type, mtype)

Arguments

dat

A list of vectors, data.frame or matrix

method

One of "jaccard", "simpson", "dice", "smc", "tanimoto", or "cosine". For continuous values, use "tanimoto" (the extended Jaccard coefficient) or "cosine".

vorder

logical: whether each element of 'dat' is an ordered vector.

type

Choose the output type: "distance" or "similarity". The default value is "distance".

mtype

Choose the output matrix form: "lower", "upper", or "both". The default value is "lower".
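
For numeric input, the two measures named under 'method' can be sketched from their standard definitions. This is an illustration only, not the package's code: the helper names `tanimoto_sim` and `cosine_sim` are hypothetical, and the formulas are the usual extended Jaccard and cosine definitions.

```r
# Standard definitions for numeric vectors (an assumption, not the package's code).
# Helper names are hypothetical.
tanimoto_sim <- function(x, y) {
  xy <- sum(x * y)
  xy / (sum(x^2) + sum(y^2) - xy)
}
cosine_sim <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

x <- c(1, 2, 3)
y <- c(2, 4, 6)     # same direction as x, twice the magnitude
tanimoto_sim(x, y)  # 28 / (14 + 56 - 28), about 0.667
cosine_sim(x, y)    # 28 / (sqrt(14) * sqrt(56)), which is 1 up to rounding
```

The example shows the practical difference: cosine similarity ignores magnitude, while the extended Jaccard form penalizes it. For 0/1 vectors the extended Jaccard formula reduces to the set-based Jaccard coefficient.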

Value

A list containing the matrix of similarities or distances (Dij = 1 - Sij) and the sparse matrix built from 'dat'.
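
The relation Dij = 1 - Sij, and the reason a lower-triangular result is enough for clustering, can be sketched in base R. This is a hypothetical illustration rather than the package's implementation; `jaccard_sim` is an assumed helper and the three-element list is made up for the example.

```r
# Hypothetical sketch: pairwise Jaccard similarity for a named list,
# then conversion to distance with D = 1 - S. Not the package's code.
x <- list(v1 = c("a", "b", "c", "d"),
          v3 = c("a", "b", "c", "e"),
          v5 = c("f", "h", "i", "j"))

jaccard_sim <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Similarity for every pair of list elements (full symmetric matrix)
S <- sapply(x, function(a) sapply(x, function(b) jaccard_sim(a, b)))
D <- 1 - S

# as.dist() reads only the lower triangle of the matrix
d <- as.dist(D)
hc <- hclust(d)
```

Here `D["v1", "v3"]` is 0.4 and `D["v1", "v5"]` is 1, since v1 and v5 share no elements.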

Examples

# A list composed of character vectors
v1 = c("a", "b", "c", "d")
v2 = c("b", "d", "a", "c")
v3 = c("a", "b", "c", "e")
v4 = c("b", "c", "e", "f")
v5 = c("f", "h", "i", "j")
v6 = c("a", "e", "f", "g")
x <- list(v1,v2,v3,v4,v5,v6); x <- setNames(x, paste0("v",1:6))
siml.jc <- siml_mat(dat = x, method = "jaccard", type = "similarity")
siml.dc <- siml_mat(dat = x, method = "dice", type = "similarity")
dist.jc <- siml_mat(dat = x, method = "jaccard", type = "distance")
plot(hclust(as.dist(dist.jc$jaccard_distance)))

# A list of vectors of different lengths
y <- list(v1 = head(v1, 3), v2 = v2, v3 = head(v3, 3), v4 = v4, v5 = v5, v6 = v6)
dist2.smp <- siml_mat(dat = y, method = "simpson", vorder = FALSE, type = "distance")

# A list of ordered categorical vectors
z <- lapply(1:6, function(i) setNames(sample(c("A","B","C"), 4, replace = TRUE), 1:4))
dist2.smc <- siml_mat(dat = z, method = "jaccard", vorder = TRUE, type = "distance")

# A data frame composed of character vectors
# In this case, the data frame or list is converted to a sparse matrix.
dat1 <- as.data.frame(x, stringsAsFactors = FALSE)
names(dat1) <- c("v1", "v2", "v3", "v4", "v5", "v6")
dist.jc2 <- siml_mat(dat = dat1, method = "jaccard", vorder = FALSE, type = "distance")
identical(dist.jc, dist.jc2)

# A data frame composed of numeric vectors
dat2 <- as.data.frame(t(iris[-5]), stringsAsFactors = FALSE)
names(dat2) <- paste0(iris$Species, 1:150)
dist_tani <- siml_mat(dat = dat2, method = "tanimoto", type = "distance")

# Convert the lower triangular matrix with 'as.dist' and run hierarchical clustering
plot(hclust(as.dist(dist.jc[[1]])))

shkonishi/rsko documentation built on Feb. 21, 2023, 5:12 a.m.