topic_cooccurrences: Get topic cooccurrences.


Description

Topic models describe documents as composed of different topics. This property can be used to obtain co-occurrence statistics of topics.

Usage

## S4 method for signature 'TopicModel'
cooccurrences(
  .Object,
  k,
  regex = NULL,
  docs = NULL,
  renumber = NULL,
  method = "chisquare",
  progress = TRUE,
  verbose = TRUE
)

## S4 method for signature 'matrix'
cooccurrences(
  .Object,
  regex = NULL,
  docs = NULL,
  renumber = NULL,
  method = "chisquare",
  progress = TRUE,
  verbose = TRUE
)

Arguments

.Object

Either an object inheriting from the TopicModel class (such as LDA_Gibbs), or a matrix with the topics present in documents. If .Object is a matrix, each column is expected to represent the top k topics present in a document. The matrix returned by the topics-method from the topicmodels package was used to develop the method, but document-topic matrices derived with other tools may work as well.
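To make the expected matrix layout concrete, here is a small hand-built sketch. The document names and topic numbers are invented for illustration; only the layout (one column per document, one row per rank) follows the description above.

```r
# Toy input: the top 3 topics (rows) for 4 documents (columns).
# All values are invented; matrix() fills column by column.
top_topics <- matrix(
  c(2L, 5L, 1L,
    2L, 3L, 5L,
    4L, 1L, 3L,
    5L, 2L, 4L),
  nrow = 3L,
  dimnames = list(NULL, c("doc_1", "doc_2", "doc_3", "doc_4"))
)

top_topics[, "doc_2"]  # the top 3 topics of the second document
dim(top_topics)        # k rows, one column per document
```

A matrix of this shape could presumably be passed as .Object to the matrix method, in which case the argument k is not needed.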

k

An integer value, the number of top-ranked topics (the first k) to consider when deriving the document-topic matrix from a trained topic model.

regex

If not NULL (NULL is the default), the procedure will be limited to documents whose names match the regular expression given by regex.

docs

If not NULL, the procedure will be limited to documents matching the character string given by docs.

renumber

If not NULL (NULL is the default), topics in the document-topic matrix will be renumbered according to the argument renumber. If renumber is an integer vector, its length is required to match the number of topics in the topic model. Each topic i present in the document-topic matrix will be mapped to the value at position i of the vector. If renumber is a list of integer vectors, each vector is treated as a group of topics that represents a single implicit "super-topic". For each integer vector in the list, the topic numbers of the group that are present in the document-topic matrix will be replaced by the first value of the vector.
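The two forms of renumber can be illustrated with a few lines of base R. This is a sketch of the mapping as described above, applied to an invented toy matrix; it is not the package's internal code, and the names top_topics, renumber_vec and renumber_list are made up for the example.

```r
# Toy document-topic matrix: rows are ranks, columns are documents (invented data).
top_topics <- matrix(
  c(2L, 5L, 1L,
    2L, 3L, 5L,
    4L, 1L, 3L),
  nrow = 3L
)

# Case 1: integer vector with one entry per topic (5 topics assumed here).
# Topic i is replaced by the value at position i, so topics 3 and 5 become 1.
renumber_vec <- c(1L, 2L, 1L, 4L, 1L)
by_vector <- matrix(renumber_vec[top_topics], nrow = nrow(top_topics))

# Case 2: list of groups; every member of a group becomes the group's first value.
renumber_list <- list(c(1L, 3L, 5L))
by_list <- top_topics
for (group in renumber_list) {
  by_list[by_list %in% group] <- group[1]
}

# Both forms express the same "super-topic" for this toy input.
identical(by_vector, by_list)
```

The list form is usually the more convenient one when only a few topics are merged, because topics not mentioned in any group keep their numbers.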

method

The statistic used to assess co-occurrences, "chisquare" by default.

progress

A logical value, whether to show a progress bar.

verbose

A logical value, whether to output messages on the state of affairs.

Value

A data.table with co-occurrence statistics with at least the following columns:

a

number of the topic of interest

b

number of the co-occurring topic

b_total

total number of occurrences of topic b; if the document-topic matrix has been renumbered, the number of documents in which at least one of the topics in the group occurs

a_total

total number of occurrences of topic a; if the document-topic matrix has been renumbered, the number of documents in which at least one of the topics in the group occurs

count_coi

number of joint occurrences of topics a and b

count_ref

number of occurrences of b in documents where a does not occur

If argument method is not NULL, additional columns will be included in the topic co-occurrence table. For example, if method is "chisquare", a column "exp_coi" will report the expected number of occurrences of b together with a, a column "chisquare" will report the value of the chi-squared test, and a column "rank_chisquare" will report the rank of the statistical significance of the co-occurrence of a and b according to the chi-squared test.
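The following base R sketch shows how counts of this kind can be derived for a single pair of topics from a toy matrix. The data are invented, the variable names mirror the columns described above (with a_total assumed as the name for the total of topic a), and the expected value and test statistic are computed with a plain 2x2 contingency table. That is an assumption made for illustration and not necessarily the calculation used by the package.

```r
# Toy matrix: the top 2 topics (rows) for 6 documents (columns); invented data.
top_topics <- matrix(
  c(1L, 2L,
    1L, 2L,
    1L, 2L,
    2L, 3L,
    3L, 4L,
    4L, 2L),
  nrow = 2L
)
a <- 1L  # topic of interest
b <- 2L  # candidate co-occurring topic

# For every document (column), check whether the topic is among its top topics.
has_a <- apply(top_topics, 2, function(doc) a %in% doc)
has_b <- apply(top_topics, 2, function(doc) b %in% doc)

a_total   <- sum(has_a)           # documents featuring topic a
b_total   <- sum(has_b)           # documents featuring topic b
count_coi <- sum(has_a & has_b)   # documents featuring both a and b
count_ref <- sum(!has_a & has_b)  # documents featuring b but not a

# Expected joint count if a and b were independent (assumed formula).
exp_coi <- a_total * b_total / ncol(top_topics)

# Pearson chi-squared statistic of the 2x2 table, without continuity correction.
# Warnings about small expected counts are suppressed for this tiny example.
chi <- unname(suppressWarnings(
  chisq.test(table(has_a, has_b), correct = FALSE)$statistic
))
```

With so few documents the statistic is not meaningful; the point is only to show which documents enter which count.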

Examples

data(BE_lda, BE_labels)
dt <- cooccurrences(BE_lda, k = 3L)
topics_to_drop <- grep("^\\(.*?\\)$", BE_labels)
dt_min <- dt[chisquare >= 10.83][!a %in% topics_to_drop][!b %in% topics_to_drop]
dt_min[, "a_label" := BE_labels[ dt_min[["a"]] ] ]
dt_min[, "b_label" := BE_labels[ dt_min[["b"]] ] ]

# Using the cooccurrence data for generating a network visualisation
if (requireNamespace("igraph")){
  g <- igraph::graph_from_data_frame(
    d = data.frame(
      from = dt_min[["a_label"]],
      to = dt_min[["b_label"]],
      n = dt_min[["count_coi"]],
      stringsAsFactors = FALSE
    ),
    directed = TRUE
  )
  g <- igraph::as.undirected(g, mode = "collapse")
  if (interactive()){
    igraph::plot.igraph(
      g, shape = "square", vertex.color = "steelblue",
      label = igraph::V(g)$name, label.family = 11, label.cex = 0.5
    )
  }
}

# Example of how to use the argument 'renumber' if a concept is represented by
# several topics
renumber_li <- list(
  school = grep("Grundschule", BE_labels),
  community = grep("Gemeindeentwicklung", BE_labels),
  traffic = grep("Verkehrsmittel", BE_labels)
)
dt <- cooccurrences(BE_lda, k = 3L, renumber = renumber_li)
dt[a == grep("Grundschule", BE_labels)[1]][chisquare > 10.83]
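
The threshold 10.83 used for filtering in the examples matches the critical value of the chi-squared distribution with one degree of freedom at p = 0.001. The examples do not state this, so reading the cut-off this way is an inference; the variable names below are invented. Other significance levels can be looked up the same way:

```r
# Critical values of the chi-squared distribution with 1 degree of freedom.
# Assumption: the "chisquare" column can be compared against these values.
threshold_001 <- qchisq(1 - 0.001, df = 1)  # approx. 10.83
threshold_05  <- qchisq(1 - 0.05, df = 1)   # approx. 3.84
round(c(p_001 = threshold_001, p_05 = threshold_05), 2)
```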

PolMine/polmineR.topics documentation built on March 6, 2020, 6:03 p.m.