coll_analysis | R Documentation |
Calculates common association measures used to perform collocation or collostruction analysis for typical count data.
coll_analysis(.x, ...)

## S3 method for class 'data.frame'
coll_analysis(
  .x,
  o11 = NULL,
  f1 = NULL,
  f2 = NULL,
  n = NULL,
  fun = "ll",
  flip = NULL,
  ...
)

## S3 method for class 'matrix'
coll_analysis(.x, f2 = NULL, n = NULL, fun = "ll", flip = NULL, ...)

## Default S3 method:
coll_analysis(.x, o11, f1, f2 = NULL, n = NULL, fun = "ll", flip = NULL, ...)
.x | data.frame or list containing data
... | further arguments to be passed to or from other methods
o11 | numeric: joint frequencies
f1 | numeric: corpus frequencies of the word
f2 | numeric of length 1 or equal to o11: corpus frequencies of co-occurring structure; if omitted, sum of o11 is used
n | numeric of length 1 or equal to o11: corpus or sample size; if omitted,
fun | character vector or named list containing character, function or expression elements: for built-in measures (see Details).
flip | character: names of measures for which to flip the sign for cases with negative association, intended for two-sided measures
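To illustrate what flipping the sign of a two-sided measure means, here is a minimal base R sketch. It does not call coll_analysis and is not the package's implementation: the counts are made up, `score` is a hypothetical chi-squared-style term, and the sketch assumes that "negative association" means the observed joint frequency lies below the expected frequency `e11 = f1 * f2 / n`.

```r
# Hypothetical counts: rows 1-2 co-occur more often than expected, row 3 less often
o11 <- c(30, 12, 1)
f1  <- c(200, 150, 400)
f2  <- 500
n   <- 1e5

# Expected joint frequency under independence
e11 <- f1 * f2 / n

# Hypothetical two-sided score: never negative, so it hides the direction
score <- (o11 - e11)^2 / e11

# Assumed flipping rule: negate the score where observed < expected
flipped <- ifelse(o11 < e11, -score, score)
flipped
```

After flipping, sorting by the measure separates attracted from repelled items instead of mixing both at the top of the ranking.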
For collocation analysis, f1 and f2 typically represent the corpus
frequencies of the word and the collocate, respectively, with their
co-occurrences included. For collostruction analysis, f1 represents the corpus
frequency of the word, and f2 the construction frequency. In a contingency
table, f1 and f2 are the marginal sums.
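The relation between o11, f1, f2, n and the 2x2 contingency table can be sketched in base R as follows. The counts are hypothetical, and the log-likelihood computation is the textbook G2 formula, shown only for orientation; it is not claimed to be the exact code behind the built-in "ll" measure.

```r
# Hypothetical counts for a single word/construction pair
o11 <- 30      # joint frequency
f1  <- 200     # corpus frequency of the word (row marginal)
f2  <- 500     # frequency of the collocate or construction (column marginal)
n   <- 100000  # corpus size

# Observed 2x2 table reconstructed from the joint frequency and the marginals
observed <- matrix(
  c(o11,      f1 - o11,
    f2 - o11, n - f1 - f2 + o11),
  nrow = 2, byrow = TRUE
)

# Expected frequencies under independence: row sum * column sum / n
expected <- outer(rowSums(observed), colSums(observed)) / n
e11 <- expected[1, 1]  # same value as f1 * f2 / n

# Textbook log-likelihood ratio (G2) over all four cells
g2 <- 2 * sum(observed * log(observed / expected))
```

The row sums of `observed` recover f1 and n - f1, and the column sums recover f2 and n - f2, which is what "marginal sums" refers to.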
Both the construction frequency f2 and the corpus size n can be provided
as vectors, which allows for efficient calculations over data from multiple
constructions/corpora.
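As a rough illustration of why vector-valued f2 and n are convenient, the base R sketch below computes expected frequencies row by row for data pooled from two hypothetical corpora. The counts are invented, and the code only demonstrates element-wise arithmetic and recycling; it does not show how coll_analysis handles these arguments internally.

```r
# Hypothetical rows: the first two come from one corpus, the third from another
o11 <- c(30, 12, 5)
f1  <- c(200, 150, 80)
f2  <- c(500, 500, 900)   # construction frequency per row
n   <- c(1e5, 1e5, 2e5)   # corpus size per row

# Element-wise expected frequencies, one per row
e11 <- f1 * f2 / n

# Base-2 mutual information, as in the custom function from the examples
mi_base2 <- \(o11, e11) log2(o11 / e11)
mi <- mi_base2(o11, e11)

# A length-1 value is recycled, so it gives the same result as a repeated vector
e11_scalar   <- f1 * 500 / 1e5
e11_repeated <- f1 * rep(500, 3) / rep(1e5, 3)
```

With per-row f2 and n, a single call can cover several constructions or corpora instead of looping over subsets.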
For data.frame input, the values for "o11", "f1", "f2", "n" can be provided either explicitly, as an expression or character argument, or implicitly by column name. It is recommended to pass the columns explicitly.
Matrix input currently requires the column names "o11", "f1", "f2", "n".
An object similar to .x with one result per column for the
association measures specified in fun; row names in matrices and character
or factor columns in data.frames are preserved.
data(adjective_cooccurrence)
.x <- subset(adjective_cooccurrence, word != collocate)
n <- attr(adjective_cooccurrence, "corpus_size")

res <- coll_analysis(.x, o11, f1, f2, n, fun = "ll")
res[order(res$ll, decreasing = TRUE), ] |> head()

# if arguments match column names, they can be used implicitly
c("o11", "f1", "f2") %in% names(.x) # TRUE
coll_analysis(.x, n = n, fun = "ll") |> head()

# control names of output columns by using a named list
coll_analysis(.x, o11, f1, f2, n, fun = list(logl = "ll")) |> head()

# using custom function
mi_base2 <- \(o11, e11) log2(o11 / e11)
coll_analysis(.x, o11, f1, f2, n, fun = mi_base2) |> head()

# mix built-in measures with custom functions
coll_analysis(.x, n = n, fun = list(builtin = "ll", custom = mi_base2)) |> head()