corp_cooccurrence | R Documentation |
Calculates co-occurrence counts. For each co-occurrence the maximum possible number of co-occurrences is also calculated.
corp_surface(text, span, nodes = NULL, collocates = NULL)

is.corp_cooccurrence(obj)

is.corp_surface(obj)

# deprecated
surface(x, span, nodes = NULL, collocates = NULL)
text |
A |
span |
A character string defining the co-occurrence span. See Details. |
nodes |
A |
collocates |
A |
obj |
A |
x |
In the deprecated |
‘Surface’ co-occurrence is easiest to describe with an example. The following is a span of '2LR', that is, 2 to the left and 2 to the right.
("a", "man", "a", "plan", "a", "cat", "a", "canal", "panama")
      |___________|____|___________|
In this example the node “plan” would co-occur once each with the collocates “man” and “cat”, and twice with the collocate “a”.
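The counts in this example can be checked with a small base-R sketch. It is an illustration only, not the package's implementation; `span_collocates` is a hypothetical helper introduced here, and the span is passed as two plain numbers rather than as a span string.

```r
# Illustrative sketch only; not how corp_surface() is implemented.
tokens <- c("a", "man", "a", "plan", "a", "cat", "a", "canal", "panama")

# Hypothetical helper: collect the tokens found within `left` positions before
# and `right` positions after every occurrence of `node`.
span_collocates <- function(tokens, node, left, right) {
  hits <- which(tokens == node)
  offsets <- setdiff(seq(-left, right), 0L)  # the node position itself is excluded
  unlist(lapply(hits, function(i) {
    idx <- i + offsets
    idx <- idx[idx >= 1L & idx <= length(tokens)]  # stay inside the text
    tokens[idx]
  }))
}

# Span '2LR' around the node "plan": 2 to the left and 2 to the right.
counts <- table(span_collocates(tokens, "plan", left = 2, right = 2))
print(counts)  # a: 2, cat: 1, man: 1
```

The node “plan” occurs once, so its four span positions hold “man”, “a”, “a” and “cat”, which gives the counts described above.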
Other examples of span:
span = '1L2R'
("a", "man", "a", "plan", "a", "cat", "a", "canal", "panama")
             |____|____|___________|
span = '2R'
("a", "man", "a", "plan", "a", "cat", "a", "canal", "panama")
                  |____|___________|
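One way to read these span strings is as a left width and a right width. The sketch below assumes that the three forms shown here ('2LR', '1L2R', '2R') are representative of the format; `parse_span` is a hypothetical helper and not part of the package, which may accept or reject other forms.

```r
# Hypothetical helper, written only to illustrate the span notation.
# Assumes the forms 'nLR', 'nL', 'nR' and 'nLmR'.
parse_span <- function(span) {
  # 'nLR' means n to the left and n to the right
  if (grepl("^[0-9]+LR$", span)) {
    n <- as.integer(sub("LR$", "", span))
    return(c(left = n, right = n))
  }
  stopifnot(nzchar(span), grepl("^([0-9]+L)?([0-9]+R)?$", span))
  left <- if (grepl("L", span, fixed = TRUE)) {
    as.integer(sub("L.*$", "", span))            # digits before the L
  } else 0L
  right <- if (grepl("R", span, fixed = TRUE)) {
    as.integer(gsub("^[0-9]+L|R$", "", span))    # digits before the R
  } else 0L
  c(left = left, right = right)
}

parse_span("2LR")   # left = 2, right = 2
parse_span("1L2R")  # left = 1, right = 2
parse_span("2R")    # left = 0, right = 2
```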
For a detailed description of ‘surface’ co-occurrence see Evert (2008).
NAs can be used to implement co-occurrence barriers. For example, if two NAs are inserted into x at each sentence boundary, then with a span of 2 (e.g. span = '2R') co-occurrences will not happen across sentences.
See Evert (2008) for a detailed description of co-occurrence barriers.
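The barrier effect can be seen in a minimal base-R sketch. It is an illustration under stated assumptions, not the package's implementation: `pairs_right` is a hypothetical helper that handles only right-hand spans, and it treats NA as a position that fills a span slot but is never counted as a node or collocate.

```r
# Illustrative sketch only; `pairs_right` is a hypothetical helper.
# It lists (node, collocate) pairs for a span of `right` positions to the right.
pairs_right <- function(tokens, right) {
  n <- length(tokens)
  out <- list()
  for (i in seq_len(n)) {
    if (is.na(tokens[i])) next            # a barrier is never a node
    idx <- i + seq_len(right)
    idx <- idx[idx <= n]                  # stay inside the text
    coll <- tokens[idx]
    coll <- coll[!is.na(coll)]            # a barrier is never a collocate
    if (length(coll) > 0) {
      out[[length(out) + 1]] <- data.frame(x = tokens[i], y = coll,
                                           stringsAsFactors = FALSE)
    }
  }
  do.call(rbind, out)
}

# Two "sentences": ("a", "man") and ("a", "plan")
without_barrier <- pairs_right(c("a", "man", "a", "plan"), right = 2)
with_barrier    <- pairs_right(c("a", "man", NA, NA, "a", "plan"), right = 2)

print(without_barrier)  # includes pairs that cross the boundary, e.g. man / a
print(with_barrier)     # only a / man and a / plan remain
```

Because the two NAs occupy both span positions after “man”, nothing from the second sentence can fall inside its span.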
Returns a corp_surface object.
The corp_surface object can be interrogated using the corp_get_* accessor functions.
corp_surface objects are used as arguments to the corp_coco() function.
S. Evert (2008) Corpora and collocations. Corpus Linguistics: An International Handbook 1212–1248.
corp_coco() and corp_concordance().
# =====================
# surface co-occurrence
# =====================

x <- corp_text("A man, a plan, a canal -- Panama!")

y <- corp_surface(x, span = "2R")
corp_get_counts(y)
##         x      y H M
##  1:     a      a 2 4
##  2:     a  canal 1 5
##  3:     a    man 1 5
##  4:     a panama 1 5
##  5:     a   plan 1 5
##  6: canal panama 1 0
##  7:   man      a 1 1
##  8:   man   plan 1 1
##  9:  plan      a 1 1
## 10:  plan  canal 1 1

# filter on nodes
y <- corp_surface(x, span = '2R', nodes = c("canal", "man", "plan"))
corp_get_counts(y)
##        x      y H M
## 1: canal panama 1 0
## 2:   man      a 1 1
## 3:   man   plan 1 1
## 4:  plan      a 1 1
## 5:  plan  canal 1 1

# filter on nodes and collocates
y <- corp_surface(x, span = '2R', nodes = c("canal", "man", "plan"),
                  collocates = c("panama", "a"))
corp_get_counts(y)
##        x      y H M
## 1: canal panama 1 0
## 2:   man      a 1 1
## 3:  plan      a 1 1

# co-occurrence barrier
tokens_with_barrier <- data.frame(
    type = c("a", "man", "a", "plan", NA, NA, "a", "canal", "panama"),
    start = as.integer(c(1, 3, 8, 10, NA, NA, 16, 18, 27)),
    end = as.integer(c(1, 5, 8, 13, NA, NA, 16, 22, 32)),
    stringsAsFactors = FALSE
)
x <- corp_text("A man, a plan, a canal -- Panama!", tokens = tokens_with_barrier)
y <- corp_surface(x, span = '2R')
corp_get_counts(y)
##        x      y H M
## 1:     a      a 1 4
## 2:     a  canal 1 4
## 3:     a    man 1 4
## 4:     a panama 1 4
## 5:     a   plan 1 4
## 6: canal panama 1 0
## 7:   man      a 1 1
## 8:   man   plan 1 1