simi: Similarity matrix (simi)
In thectar: Hermeneutic Content Analysis


simi calculates a similarity matrix for co-occurrence data.

simi(data, method = c("sort", "aggregate", "dichotomize", "as", "jaccard", "cosine", "inclusion"), single = TRUE, comments = TRUE)

`data`	Dataset; the first column must be the ID of the unit of comparison and all other columns must be categories.
`method`	Specifies the output, choose between "`sort`" (sorted version of the data), "`aggregate`" (aggregated version of the data), "`dichotomize`" (dichotomized version of the data), "`as`" (similarity matrix using Association Strength Index), "`jaccard`" (similarity matrix using Jaccard Index), "`cosine`" (similarity matrix using Cosine Index), and "`inclusion`" (similarity matrix using Inclusion Index). Default is `sort`.
`single`	If `TRUE`, single mentions (i.e. a respondent mentioning just one category) are included. Default is `TRUE`.
`comments`	If `TRUE`, comments relating to exclusion or possible exclusion of categories and respondents are displayed. Default is `TRUE`.
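The following base R sketch illustrates the input layout described for `data` and what the "sort", "aggregate", and "dichotomize" outputs could look like. It does not call simi; the toy data frame `raw` and its column names are hypothetical, and summing rows per ID followed by recoding counts above zero to 1 is an assumption about the preprocessing, not a confirmed description of the package internals.

```r
# Hypothetical toy input: first column is the ID, remaining columns are categories.
# Respondent "r2" appears on two rows.
raw <- data.frame(
  id = c("r2", "r1", "r2", "r3"),
  A  = c(1, 0, 1, 0),
  B  = c(0, 1, 0, 0),
  C  = c(0, 1, 1, 1)
)

# Sorted version: rows ordered by ID
sorted <- raw[order(raw$id), ]

# Aggregated version: rows with the same ID combined (assumed here: summed)
aggregated <- aggregate(. ~ id, data = raw, FUN = sum)

# Dichotomized version: any count above zero becomes 1 (assumed recoding)
dichotomized <- aggregated
dichotomized[-1] <- lapply(aggregated[-1], function(x) as.integer(x > 0))

print(aggregated)
print(dichotomized)
```

In this sketch, respondent "r2" ends up with a count of 2 for category A after aggregation and a value of 1 after dichotomization.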

This function applies to co-occurrence data. It calculates a similarity matrix using one of the following indices: Association Strength, Jaccard, Cosine, or Inclusion (for a detailed discussion see van Eck & Waltman, 2009, <doi:10.1002/asi.21075>). The function can also return a sorted, aggregated, or dichotomized version of the input data table. The first column of the input must contain the ID of the unit of comparison, and the following columns the categories for which the similarity is calculated. Rows belonging to the same unit of comparison (i.e. the same ID) are combined. simi is particularly suitable for datasets that have not yet been sorted, aggregated, or dichotomized. For datasets that are already sorted, aggregated, and dichotomized, the package proxy by Meyer and Buchta offers an alternative way to calculate similarity matrices. simi does not work with missing data.
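As a rough illustration of the four indices, the base R sketch below computes them from a small dichotomized occurrence matrix, using the definitions as commonly given following van Eck & Waltman (2009). The matrix `occ` and all object names are hypothetical, and simi itself may scale or format its results differently; this is a sketch of the formulas, not the package's implementation.

```r
# Hypothetical dichotomized data: 4 units (rows) x 3 categories (columns)
occ <- matrix(c(1, 1, 0,
                1, 0, 1,
                1, 1, 1,
                0, 1, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("id", 1:4), c("A", "B", "C")))

cooc <- crossprod(occ)  # c_ij: number of units in which categories i and j co-occur
s    <- diag(cooc)      # s_i: number of units in which category i occurs

as_index  <- cooc / outer(s, s)                # Association Strength: c_ij / (s_i * s_j)
jaccard   <- cooc / (outer(s, s, "+") - cooc)  # Jaccard: c_ij / (s_i + s_j - c_ij)
cosine    <- cooc / sqrt(outer(s, s))          # Cosine: c_ij / sqrt(s_i * s_j)
inclusion <- cooc / outer(s, s, pmin)          # Inclusion: c_ij / min(s_i, s_j)

print(round(jaccard, 3))
```

In this toy example, categories A and B each occur in three units and co-occur in two, so their Jaccard value is 2 / (3 + 3 - 2) = 0.5, while the Inclusion value for A and C is 1 because every unit containing C also contains A.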

Sorted, aggregated, or dichotomized dataset, or similarity matrix.

dist from the package 'proxy' for alternative ways to calculate similarity matrices; van Eck and Waltman (2009, <doi:10.1002/asi.21075>) for a detailed discussion of similarity measures.

## Calculate similarities using a dichotomized dataset
data(SDG_coocurrence)
SDG_coocurrence <- SDG_coocurrence[,-2] # Drop second column
similarity <- simi(SDG_coocurrence, method = "as", comments = FALSE)
head(similarity)