content.similarity.graph: Create a content similarity graph

Description

Creates a document similarity graph in the igraph format from a matrix in which rows are documents and columns are content characteristics (e.g., terms, issues, topics). Vertices (i.e., nodes) are defined as unique combinations of vertex.grouping.vars. If a vertex covers multiple documents (e.g., an author of several documents), the content characteristics of those documents are first aggregated. The (aggregated) content characteristics are then used to calculate the similarities between vertices, for which various similarity measures are available (see similarity.measure).
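The two steps described here (aggregate per vertex, then compare vertices) can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the toy matrix, the `author` grouping vector, and the use of summing as the aggregation are all hypothetical.

```r
# Toy document-by-content matrix: 4 documents, 3 content characteristics.
m <- matrix(c(2, 0, 1,
              1, 1, 0,
              0, 3, 1,
              0, 1, 2),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("doc", 1:4), c("C1", "C2", "C3")))

# Hypothetical grouping variable: the author of each document.
author <- c("ann", "ann", "bob", "cat")

# Step 1: aggregate content characteristics per vertex
# (assumption: aggregation means summing the rows of each group).
agg <- rowsum(m, group = author)

# Step 2: cosine similarity between the aggregated row vectors.
norms <- sqrt(rowSums(agg^2))
cosine <- (agg %*% t(agg)) / (norms %o% norms)

round(cosine, 3)
```

Each off-diagonal cell of `cosine` corresponds to a candidate tie between two vertices; the other similarity measures would replace only step 2.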

Usage

content.similarity.graph(m, vertex.grouping.vars,
  similarity.measure = "cosine", min.similarity = NULL,
  content.totals.as.vertexmeta = NULL, content.totals.relative = T)

Arguments

m

A (sparse) matrix where rows are documents (e.g., news articles, forum posts) and columns are content characteristics (e.g., terms, topics). Values represent the presence of content characteristics within documents. Examples are: a DocumentTermMatrix or the transposed $document_sums (topics by documents matrix) created with lda.collapsed.gibbs.sampler.

vertex.grouping.vars

A data.frame or list with named vectors representing vertex characteristics. Each unique combination of characteristics will be considered a vertex. In the graph object these characteristics are stored as vertex attributes.

similarity.measure

A character string giving the method for computing similarity. Options are: 'correlation', 'cosine', 'conditional_probability', 'overlap_count' and 'overlap_jacard'.

min.similarity

A numeric scalar representing the threshold for similarities. All ties with a value below min.similarity will be deleted. Can be used to reduce the size of large graphs with many weak ties.
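The effect of such a threshold can be sketched in base R. The similarity matrix and the vertex names below are hypothetical, and the code only illustrates the idea of dropping weak ties; it is not taken from the package.

```r
# Hypothetical symmetric similarity matrix for three vertices.
sim <- matrix(c(0,   0.8, 0.1,
                0.8, 0,   0.4,
                0.1, 0.4, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
min.similarity <- 0.3

# Ties below the threshold are removed (set to zero, i.e. no edge).
sim[sim < min.similarity] <- 0

# Remaining ties as an edge list (upper triangle only, since sim is symmetric).
idx <- which(sim > 0 & upper.tri(sim), arr.ind = TRUE)
edges <- data.frame(from   = rownames(sim)[idx[, "row"]],
                    to     = colnames(sim)[idx[, "col"]],
                    weight = sim[idx])
edges
```

With a threshold of 0.3 the weak tie between a and c (0.1) is dropped, while the ties a-b and b-c remain.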

content.totals.as.vertexmeta

Can be used to include the summed values per vertex for each content characteristic as vertex attributes. If 'all', all content characteristics are included. Can also be a numeric vector to select specific content characteristics. The attribute names for content characteristics are C followed by the number (C1, C2, etc.).

content.totals.relative

Logical. If content.totals.as.vertexmeta is used, use relative attention for each content characteristic.
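As a rough illustration of the difference between absolute and relative totals, the base R sketch below assumes that "relative attention" means each vertex's totals divided by that vertex's overall sum. That reading is an assumption, not something this page confirms, and the `totals` matrix is hypothetical.

```r
# Hypothetical per-vertex totals for three content characteristics.
totals <- matrix(c(3, 1, 1,
                   0, 3, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("ann", "bob"), c("C1", "C2", "C3")))

# Select specific content characteristics by number, as a numeric vector would.
selected <- totals[, c(1, 3), drop = FALSE]

# Assumed meaning of relative attention: proportions within each vertex (row).
relative <- prop.table(totals, margin = 1)

relative
```

Under this reading, each row of `relative` sums to 1, so vertices with many documents and vertices with few become directly comparable.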

Value

A graph object in the igraph format.


kasperwelbers/network-tools documentation built on May 20, 2019, 7:38 a.m.