corp_cooccurrence    R Documentation
Calculates surface co-occurrence counts. For each node-collocate pair, the maximum possible number of co-occurrences is also calculated.
corp_surface(text, span, nodes = NULL, collocates = NULL)
is.corp_cooccurrence(obj)
is.corp_surface(obj)
# deprecated
surface(x, span, nodes = NULL, collocates = NULL)
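text: A corp_text object.
span: A character string defining the co-occurrence span. See Details.
nodes: A character vector of node types to keep, or NULL (the default) to keep all node types.
collocates: A character vector of collocate types to keep, or NULL (the default) to keep all collocate types.
obj: An R object to test.
x: In the deprecated surface() function, the argument that takes the place of text.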
‘surface’ co-occurrence is easiest to describe with an example.
The following shows a span of '2LR', that is, 2 tokens to the left and 2 tokens to the right of the node.
("a", "man", "a", "plan", "a", "cat", "a", "canal", "panama")
      |___________|____|___________|
In this example, the node “plan” co-occurs once each with the collocates “man” and “cat”, and twice with the collocate “a”.
Other examples of span:
span = '1L2R'
("a", "man", "a", "plan", "a", "cat", "a", "canal", "panama")
             |____|____|___________|
span = '2R'
("a", "man", "a", "plan", "a", "cat", "a", "canal", "panama")
                  |____|___________|
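The span notation can be sketched in base R to make the windows in the diagrams concrete. parse_span() and span_window() are hypothetical helpers, not part of this package, and they assume a span string is either '<n>LR' or an optional '<n>L' part followed by an optional '<n>R' part, as in the examples above. span_window() returns the tokens inside the window around the node at position i, excluding the node itself.

```r
# Hypothetical helpers for illustration only; not part of the package.
# Assumed span grammar: "<n>LR", or an optional "<n>L" followed by an optional "<n>R".
parse_span <- function(span) {
  stopifnot(nchar(span) > 0, grepl("^([0-9]+LR|([0-9]+L)?([0-9]+R)?)$", span))
  if (grepl("LR$", span)) {
    # "2LR": the same number of tokens on both sides
    n <- as.integer(sub("LR$", "", span))
    return(c(left = n, right = n))
  }
  # "1L2R", "2L", "2R": read each side separately, defaulting to 0
  left  <- if (grepl("L", span)) as.integer(sub("L.*$", "", span)) else 0L
  right <- if (grepl("R", span)) as.integer(sub("R$", "", sub("^.*L", "", span))) else 0L
  c(left = left, right = right)
}

# Tokens inside the span around the node at position i (node excluded),
# clipped at the start and end of the token vector.
span_window <- function(tokens, i, span) {
  s <- parse_span(span)
  left_idx  <- if (s[["left"]] > 0)  seq(i - s[["left"]], i - 1) else integer(0)
  right_idx <- if (s[["right"]] > 0) seq(i + 1, i + s[["right"]]) else integer(0)
  idx <- c(left_idx, right_idx)
  tokens[idx[idx >= 1 & idx <= length(tokens)]]
}

tokens <- c("a", "man", "a", "plan", "a", "cat", "a", "canal", "panama")
span_window(tokens, 4, "2LR")          # "man" "a" "a" "cat"
span_window(tokens, 4, "1L2R")         # "a" "a" "cat"
span_window(tokens, 4, "2R")           # "a" "cat"
table(span_window(tokens, 4, "2LR"))   # a: 2, cat: 1, man: 1
```

The last line reproduces the counts stated for the node “plan” with span '2LR'.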
For a detailed description of ‘surface’ co-occurrence, see Evert (2008).
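NAs can be used to implement co-occurrence barriers. For example, if two NA tokens are inserted at each sentence boundary, then with a span of 2 (such as span = '2R' or span = '2LR') co-occurrences will not cross sentence boundaries, because any window of at most 2 tokens that reaches past a boundary lands only on the NAs.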
See Evert (2008) for a detailed description of co-occurrence barriers.
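Inserting the barriers can be sketched in base R for a token table with the type, start and end columns used in the barrier example below. insert_barriers() is a hypothetical helper, not part of the package; its `after` argument (row numbers of the last token before each boundary) and its default of two NA rows are assumptions chosen to match a span of 2.

```r
# Hypothetical helper for illustration only; not part of the package.
# Inserts n all-NA rows after each row number in `after` (e.g. the last
# token of each sentence), so a span of up to n tokens cannot reach across.
insert_barriers <- function(tokens, after, n = 2L) {
  idx <- unlist(lapply(seq_len(nrow(tokens)), function(i) {
    if (i %in% after) c(i, rep(NA_integer_, n)) else i
  }))
  # Indexing a data frame with NA row numbers yields rows of NA values,
  # keeping each column's type (character NA for type, integer NA for start/end).
  out <- tokens[idx, , drop = FALSE]
  rownames(out) <- NULL
  out
}

tokens <- data.frame(
  type  = c("a", "man", "a", "plan", "a", "canal", "panama"),
  start = as.integer(c(1, 3, 8, 10, 16, 18, 27)),
  end   = as.integer(c(1, 5, 8, 13, 16, 22, 32)),
  stringsAsFactors = FALSE
)
# Two NA rows after "plan" (row 4) give the same table as tokens_with_barrier
insert_barriers(tokens, after = 4)
```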
Returns a corp_surface object, which can be interrogated using the corp_get_* accessor functions (for example corp_get_counts(), as in the examples). corp_surface objects are used as arguments to the corp_coco() function.
Evert, S. (2008). Corpora and collocations. In Corpus Linguistics: An International Handbook, pp. 1212–1248.
See also corp_coco() and corp_concordance().
# =====================
# surface co-occurrence
# =====================
x <- corp_text("A man, a plan, a canal -- Panama!")
y <- corp_surface(x, span = "2R")
corp_get_counts(y)
## x y H M
## 1: a a 2 4
## 2: a canal 1 5
## 3: a man 1 5
## 4: a panama 1 5
## 5: a plan 1 5
## 6: canal panama 1 0
## 7: man a 1 1
## 8: man plan 1 1
## 9: plan a 1 1
## 10: plan canal 1 1
# filter on nodes
y <- corp_surface(x, span = '2R', nodes = c("canal", "man", "plan"))
corp_get_counts(y)
## x y H M
## 1: canal panama 1 0
## 2: man a 1 1
## 3: man plan 1 1
## 4: plan a 1 1
## 5: plan canal 1 1
# filter on nodes and collocates
y <- corp_surface(x, span = '2R', nodes = c("canal", "man", "plan"),
collocates = c("panama", "a"))
corp_get_counts(y)
## x y H M
## 1: canal panama 1 0
## 2: man a 1 1
## 3: plan a 1 1
# co-occurrence barrier: two NA tokens after "plan" stop a '2R' span from crossing
tokens_with_barrier <- data.frame(
type = c("a", "man", "a", "plan", NA, NA, "a", "canal", "panama"),
start = as.integer(c( 1, 3, 8, 10, NA, NA, 16, 18, 27)),
end = as.integer(c( 1, 5, 8, 13, NA, NA, 16, 22, 32)),
stringsAsFactors = FALSE
)
x <- corp_text("A man, a plan, a canal -- Panama!", tokens = tokens_with_barrier)
y <- corp_surface(x, span = '2R')
corp_get_counts(y)
## x y H M
## 1: a a 1 4
## 2: a canal 1 4
## 3: a man 1 4
## 4: a panama 1 4
## 5: a plan 1 4
## 6: canal panama 1 0
## 7: man a 1 1
## 8: man plan 1 1
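The H and M columns in the example outputs above can be reproduced with a small base-R sketch for right-only spans. count_surface_right() is hypothetical and is not the package's implementation. It treats NA tokens as occupying window positions without ever being counted, and it computes M as the node's total number of non-NA window positions minus H; that definition of M is an assumption inferred from the outputs shown above, not a documented formula.

```r
# Hypothetical sketch for illustration only; not the package's implementation.
# Counts surface co-occurrences for a right-only span of n_right tokens.
# NA tokens take up window positions but are never counted, so a run of
# n_right NAs works as a co-occurrence barrier.
count_surface_right <- function(tokens, n_right) {
  pairs <- data.frame(x = character(0), y = character(0), stringsAsFactors = FALSE)
  slots <- integer(0)  # non-NA window positions seen for each node type
  for (i in seq_along(tokens)) {
    node <- tokens[i]
    if (is.na(node)) next
    idx  <- i + seq_len(n_right)
    coll <- tokens[idx[idx <= length(tokens)]]
    coll <- coll[!is.na(coll)]
    if (length(coll) == 0) next
    pairs <- rbind(pairs, data.frame(x = node, y = coll, stringsAsFactors = FALSE))
    slots[node] <- sum(slots[node], length(coll), na.rm = TRUE)
  }
  # H: number of times each node-collocate pair co-occurs
  counts <- aggregate(list(H = rep(1L, nrow(pairs))),
                      by = list(x = pairs$x, y = pairs$y), FUN = sum)
  # M (assumed): the node's non-NA window positions minus H
  counts$M <- as.integer(slots[as.character(counts$x)]) - counts$H
  counts <- counts[order(counts$x, counts$y), ]
  rownames(counts) <- NULL
  counts
}

# Same tokens as the first example and as the barrier example
count_surface_right(c("a", "man", "a", "plan", "a", "canal", "panama"), 2)
count_surface_right(c("a", "man", "a", "plan", NA, NA, "a", "canal", "panama"), 2)
```

With the barrier, “plan” has only NAs in its '2R' window, so it never appears as a node, matching the barrier example's output.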