View source: R/co_occurrence.R
| cooccurrence | R Documentation |
Constructs an undirected co-occurrence network from various input formats. Entities that appear together in the same transaction, document, or record are connected, with edge weights reflecting raw counts or a similarity measure. Argument names follow the citenets convention.
cooccurrence(
  data,
  field = NULL,
  by = NULL,
  sep = NULL,
  similarity = c("none", "jaccard", "cosine", "inclusion", "association", "dice",
    "equivalence", "relative"),
  threshold = 0,
  min_occur = 1L,
  diagonal = TRUE,
  top_n = NULL,
  ...
)
data |
Input data. Accepts:
|
field |
Character. The entity column — determines what the nodes are.
For delimited format, a single column whose values are split by |
by |
Character or |
sep |
Character or |
similarity |
Character. Similarity measure applied to the raw co-occurrence counts. One of:
|
threshold |
Numeric. Minimum edge weight to retain. Edges below this value are set to zero. Applied after similarity normalization. Default 0. |
min_occur |
Integer. Minimum entity frequency (number of transactions an entity must appear in). Entities below this threshold are dropped before computing co-occurrence. Default 1 (keep all). |
diagonal |
Logical. If |
top_n |
Integer or |
... |
Currently unused. |
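To make the similarity options more concrete, here is a minimal base R sketch of the textbook definitions of several of these measures, as discussed in the co-occurrence normalization literature. It is an illustration only: the count matrix C is built by hand from three small transactions, the variable names are hypothetical, and the package's own formulas (including any scaling, and the "equivalence" and "relative" options, which are not shown) may differ.

```r
# Raw co-occurrence counts for the transactions
# {A, B}, {B, C}, {A, B, C}. The diagonal holds each entity's
# occurrence count (number of transactions it appears in).
C <- matrix(c(2, 2, 1,
              2, 3, 2,
              1, 2, 2),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))
s <- diag(C)  # occurrence counts s_i

# Textbook definitions (assumption: not taken from the package source).
jaccard     <- C / (outer(s, s, "+") - C)  # c_ij / (s_i + s_j - c_ij)
cosine      <- C / sqrt(outer(s, s))       # c_ij / sqrt(s_i * s_j)
inclusion   <- C / outer(s, s, pmin)       # c_ij / min(s_i, s_j)
dice        <- 2 * C / outer(s, s, "+")    # 2 * c_ij / (s_i + s_j)
association <- C / outer(s, s)             # c_ij / (s_i * s_j)

round(jaccard, 3)
```

All five measures divide the raw count by some function of the two occurrence counts, so they differ mainly in how strongly they penalize frequent entities.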
Six input formats are supported, auto-detected from the combination of
field, by, and sep:
Delimited: field + sep (single column).
Each cell is split by sep, trimmed, and de-duplicated per row.
Multi-column delimited: field (vector) + sep.
Values from multiple columns are split, pooled, and de-duplicated per row.
Long bipartite: field + by.
Rows are grouped by the by column; the unique values of field within each group form a transaction.
Binary matrix: no field/by/sep, all values 0/1.
Columns are items, rows are transactions.
Wide sequence: no field/by/sep, non-binary values.
The unique values across each row form a transaction.
List: A plain list of character vectors.
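As a rough illustration of how three of the formats above reduce to the same list-of-transactions shape, the following base R sketch converts a delimited column, a long table, and a binary matrix by hand. The object names (delimited_tx, long_tx, bin_tx) are hypothetical, and the package's internal conversion code may look different.

```r
# Delimited: split each cell by sep, trim whitespace, de-duplicate per row.
keywords <- c("network; graph", "graph; matrix; network")
delimited_tx <- lapply(strsplit(keywords, ";", fixed = TRUE),
                       function(x) unique(trimws(x)))

# Long bipartite: unique values of the field column within each by group.
long_df <- data.frame(
  paper   = c(1, 1, 1, 2, 2, 3, 3),
  keyword = c("network", "graph", "matrix", "graph", "algebra",
              "network", "algebra"),
  stringsAsFactors = FALSE
)
long_tx <- lapply(split(long_df$keyword, long_df$paper), unique)

# Binary matrix: each row's transaction is the set of columns equal to 1.
bin <- matrix(c(1, 0, 1,  1, 1, 0,  0, 1, 1), nrow = 3, byrow = TRUE,
              dimnames = list(NULL, c("X", "Y", "Z")))
bin_tx <- lapply(seq_len(nrow(bin)),
                 function(i) colnames(bin)[bin[i, ] == 1])
```

Each result is a list of character vectors, which is the common representation the rest of the pipeline works on.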
The pipeline converts all formats into a list of character vectors
(transactions), optionally filters by min_occur, builds a binary
transaction matrix, computes crossprod(B) for the raw co-occurrence
counts, normalizes via the chosen similarity, then applies
threshold and top_n filtering.
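The pipeline just described can be sketched in a few lines of base R. Only the crossprod(B) step is taken from the description above; the remaining details (the use of table() for frequencies, the Jaccard formula, and the variable names) are assumptions made for illustration, and top_n filtering is omitted.

```r
transactions <- list(c("A", "B"), c("B", "C"), c("A", "B", "C"))
min_occur <- 1L
threshold <- 0.5

# Entity frequencies: number of transactions each entity appears in
# (transactions are assumed to be de-duplicated already).
freq  <- table(unlist(transactions))
items <- names(freq)[freq >= min_occur]

# Binary transaction matrix B: rows are transactions, columns are entities.
B <- t(vapply(transactions,
              function(tx) as.integer(items %in% tx),
              integer(length(items))))
colnames(B) <- items

# Raw co-occurrence counts; the diagonal holds entity frequencies.
counts <- crossprod(B)

# Jaccard normalization (textbook form), then thresholding.
s <- diag(counts)
W <- counts / (outer(s, s, "+") - counts)
W[W < threshold] <- 0
```

Here the A-C edge has Jaccard weight 1/3 and is zeroed by the 0.5 threshold, while A-B and B-C (weight 2/3 each) are kept.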
A netobject (undirected) with method = "co_occurrence_fn".
The $weights matrix contains similarity (or raw) co-occurrence values.
The $params list stores the similarity method, threshold, and
the number of transactions.
van Eck, N. J., & Waltman, L. (2009). How to normalize co-occurrence data? An analysis of some well-known similarity measures. Journal of the American Society for Information Science and Technology, 60(8), 1635–1651.
build_cna for sequence-positional co-occurrence via
build_network().
# Delimited field (e.g., keyword co-occurrence)
df <- data.frame(
  id = 1:4,
  keywords = c("network; graph", "graph; matrix; network",
               "matrix; algebra", "network; algebra; graph")
)
net <- cooccurrence(df, field = "keywords", sep = ";")
# Long/bipartite
long_df <- data.frame(
  paper = c(1, 1, 1, 2, 2, 3, 3),
  keyword = c("network", "graph", "matrix", "graph", "algebra",
              "network", "algebra")
)
net <- cooccurrence(long_df, field = "keyword", by = "paper")
# List of transactions
transactions <- list(c("A", "B"), c("B", "C"), c("A", "B", "C"))
net <- cooccurrence(transactions, similarity = "jaccard")
# Binary matrix
bin <- matrix(c(1, 0, 1,  1, 1, 0,  0, 1, 1), nrow = 3, byrow = TRUE,
              dimnames = list(NULL, c("X", "Y", "Z")))
net <- cooccurrence(bin)