simi: Similarity matrix (simi)


Description

simi calculates a similarity matrix for co-occurrence data.

Usage

simi(data, method = c("sort", "aggregate", "dichotomize", "as",
  "jaccard", "cosine", "inclusion"), single = TRUE, comments = TRUE)

Arguments

data

Dataset; the first column must be the ID of the unit of comparison and all other columns must be categories.

method

Specifies the output. Choose between "sort" (sorted version of the data), "aggregate" (aggregated version of the data), "dichotomize" (dichotomized version of the data), "as" (similarity matrix using the Association Strength Index), "jaccard" (similarity matrix using the Jaccard Index), "cosine" (similarity matrix using the Cosine Index), and "inclusion" (similarity matrix using the Inclusion Index). The default is "sort".

single

If TRUE, single mentions (i.e. one respondent mentioning just one category) are included. Default is TRUE.

comments

If TRUE, comments relating to exclusion or possible exclusion of categories and respondents are displayed. Default is TRUE.
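The base-R sketch below illustrates one possible reading of the "aggregate" and "dichotomize" options and of single = FALSE. It is not the thectar source: the helper prep_cooc and the toy data frame are hypothetical, and the sketch assumes that every category column holds a numeric count (0 = not mentioned).

```r
# Hypothetical helper, NOT part of thectar: illustrates combining lines by ID,
# dichotomizing, and optionally dropping single mentions.
# Assumption: column 1 = ID, remaining columns = numeric counts per category.
prep_cooc <- function(data, single = TRUE) {
  ids <- data[[1]]
  counts <- as.matrix(data[, -1, drop = FALSE])
  # "Aggregate": sum all lines that share the same ID (one row per unit)
  agg <- rowsum(counts, group = ids)
  # "Dichotomize": 1 if a category was mentioned at least once, else 0
  dich <- (agg > 0) * 1
  # single = FALSE: drop units that mention exactly one category
  if (!single) dich <- dich[rowSums(dich) != 1, , drop = FALSE]
  dich
}

# Toy data: unit 1 spans two lines, unit 3 mentions only category C
df <- data.frame(id = c(1, 1, 2, 3),
                 A  = c(1, 0, 2, 0),
                 B  = c(0, 1, 0, 0),
                 C  = c(0, 1, 1, 3))
prep_cooc(df)
prep_cooc(df, single = FALSE)
```

With single = FALSE, unit 3 disappears because it mentions only one category; whether simi applies exactly this rule is an assumption.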

Details

This function applies to co-occurrence data. It calculates a similarity matrix using one of the following indices: Association Strength, Jaccard, Cosine, or Inclusion (for a detailed discussion, see van Eck & Waltman, 2009, <doi:10.1002/asi.21075>). Alternatively, depending on method, the function can generate a sorted, aggregated, or dichotomized version of the input data table. The first column of the input data should contain the ID of the unit of comparison, and the following columns the categories for which the similarity is calculated. Lines belonging to the same unit of comparison (i.e. the same ID) are combined. simi is particularly suitable for datasets that are not yet sorted, aggregated, or dichotomized. For datasets that are already sorted, aggregated, and dichotomized, the package proxy by Meyer and Buchta offers an alternative way to calculate similarity matrices. simi does not work with missing data.
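The four indices can be expressed through the co-occurrence count c_ij of categories i and j and the occurrence counts s_i and s_j, following the definitions discussed by van Eck & Waltman (2009). The base-R sketch below computes them from a 0/1 unit-by-category matrix. It is illustrative only: cooc_similarity and the toy matrix are hypothetical, association strength is shown without any normalizing constant, and simi's exact scaling and treatment of the diagonal may differ.

```r
# Illustrative only, NOT thectar's implementation.
# X: 0/1 matrix, rows = units of comparison, columns = categories.
cooc_similarity <- function(X, index = c("as", "jaccard", "cosine", "inclusion")) {
  index <- match.arg(index)
  C <- crossprod(X)   # C[i, j] = c_ij, number of units mentioning both i and j
  s <- diag(C)        # s[i]    = s_i, number of units mentioning category i
  # Note: a category with s_i = 0 yields NaN entries (division by zero)
  switch(index,
    as        = C / outer(s, s),              # c_ij / (s_i * s_j)
    jaccard   = C / (outer(s, s, "+") - C),   # c_ij / (s_i + s_j - c_ij)
    cosine    = C / sqrt(outer(s, s)),        # c_ij / sqrt(s_i * s_j)
    inclusion = C / outer(s, s, pmin)         # c_ij / min(s_i, s_j)
  )
}

# Toy dichotomized data: three units, three categories
X <- rbind(u1 = c(A = 1, B = 1, C = 1),
           u2 = c(A = 1, B = 0, C = 1),
           u3 = c(A = 0, B = 0, C = 1))
round(cooc_similarity(X, "jaccard"), 2)
```

Only the off-diagonal entries are meaningful as similarities; the diagonal is simply what each formula gives for i = j.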

Value

Depending on method, a sorted, aggregated, or dichotomized version of the dataset, or a similarity matrix.

See Also

dist from the package 'proxy' for alternative ways to calculate similarity matrices; van Eck and Waltman (2009, <doi:10.1002/asi.21075>) for a detailed discussion of similarity measures.

Examples

## Calculate similarities using a dichotomized dataset
data(SDG_coocurrence)
SDG_coocurrence <- SDG_coocurrence[,-2] # Drop second column
similarity <- simi(SDG_coocurrence, method = "as", comments = FALSE)
head(similarity)

thectar documentation built on Nov. 16, 2019, 1:07 a.m.