This data set lists 5102 frequent combinations of verbs and prepositional phrases (PPs) extracted from a German newspaper corpus. The collocational status of each PP-verb combination was manually annotated by Brigitte Krenn (2000). In addition, pre-computed scores of several standard association measures are provided.
The KrennPPV candidate set forms part of the data used in the evaluation study of Evert & Krenn (2005).
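A common way to use such annotations is to rank all candidates by an association score and then measure the proportion of true collocations among the n highest-ranked candidates (n-best precision). The Python sketch below assumes the scores and the manual lexical-collocation labels are available as two parallel lists. The function name `n_best_precision` and the toy values are hypothetical and are not taken from the data set or from Evert & Krenn (2005).

```python
from typing import Sequence


def n_best_precision(scores: Sequence[float], labels: Sequence[bool], n: int) -> float:
    """Fraction of annotated collocations among the n highest-scoring candidates.

    Hypothetical helper for illustration. Ties in the score keep the original
    candidate order, because Python's sort is stable even with reverse=True.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0 < n <= len(scores):
        raise ValueError("n must be between 1 and the number of candidates")
    # Rank candidate indices by descending association score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Count true collocations in the n-best list.
    hits = sum(1 for i in ranked[:n] if labels[i])
    return hits / n


# Hypothetical toy data: five candidates with scores and manual annotations.
scores = [12.5, 3.1, 8.7, 0.4, 15.2]
labels = [True, False, True, False, False]
print(n_best_precision(scores, labels, 3))  # 0.6666666666666666
```

Ties between equal scores are simply broken by the original order of the candidates here; a careful evaluation would need an explicit tie-handling policy.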
A data frame with 5102 rows and the following columns:
the prepositional phrase, represented by the preposition and the lemma of the nominal head, separated by a colon (character). Preposition-article fusion is indicated by a + sign. For example, the prepositional phrase im letzten Jahr (with im = in + dem) would appear as in+:Jahr in the data set.
the verb lemma (character). Particle verbs whose particle was separated from the verb in the corpus text have been recombined into a single lemma.
whether the PP-verb combination is a lexical collocation (logical)
whether a PP-verb collocation is a support verb construction (logical)
whether a PP-verb collocation is a figurative expression (logical)
co-occurrence frequency of the PP-verb combination within clauses (integer)
Mutual Information association measure
Dice coefficient association measure
z-score association measure
t-score association measure
chi-squared association measure (without Yates' continuity correction)
chi-squared association measure (with Yates' continuity correction)
log-likelihood association measure
Fisher's exact test as an association measure (negative logarithm of the one-sided p-value)
See Evert (2008) and http://www.collocations.de/AM/ for details on these association measures.
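All of these measures can be computed from a 2x2 contingency table built from the co-occurrence frequency f = O11, the marginal frequencies f1 (of the PP) and f2 (of the verb), and the sample size N, using the expected frequencies E_ij = R_i C_j / N under the independence hypothesis. The marginal frequencies and N are not among the columns of this data set. The following minimal Python sketch uses the standard formulas; the function name, argument names and dictionary keys are hypothetical, and the choices of log base 2 for MI, natural logarithms inside log-likelihood, base 10 for Fisher, and an unsigned chi-squared statistic are assumptions that may not match the conventions behind the pre-computed scores.

```python
import math


def association_scores(f: int, f1: int, f2: int, N: int) -> dict:
    """Standard association measures for one candidate pair (hypothetical helper).

    f = co-occurrence frequency (O11), f1 = PP marginal (R1),
    f2 = verb marginal (C1), N = sample size.
    """
    if not (0 < f <= min(f1, f2) and f1 < N and f2 < N and f1 + f2 - f <= N):
        raise ValueError("inconsistent contingency table")
    # Observed frequencies and marginals of the 2x2 contingency table.
    O11, O12, O21, O22 = f, f1 - f, f2 - f, N - f1 - f2 + f
    R1, R2, C1, C2 = f1, N - f1, f2, N - f2
    observed = [O11, O12, O21, O22]
    # Expected frequencies under independence: E_ij = R_i * C_j / N.
    expected = [R1 * C1 / N, R1 * C2 / N, R2 * C1 / N, R2 * C2 / N]
    E11 = expected[0]

    diff = O11 * O22 - O12 * O21
    denom = R1 * R2 * C1 * C2
    # Log-likelihood: 2 * sum O_ij * ln(O_ij / E_ij), treating 0 * ln 0 as 0.
    ll = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    # One-sided Fisher test: P(X >= O11) for a hypergeometric X with margins R1, C1, N.
    p = sum(
        math.comb(C1, k) * math.comb(C2, R1 - k) for k in range(O11, min(R1, C1) + 1)
    ) / math.comb(N, R1)
    return {
        "MI": math.log2(O11 / E11),
        "Dice": 2 * O11 / (R1 + C1),
        "z_score": (O11 - E11) / math.sqrt(E11),
        "t_score": (O11 - E11) / math.sqrt(O11),
        "chi_squared": N * diff ** 2 / denom,
        # Yates' continuity correction subtracts N/2 from |O11*O22 - O12*O21|.
        "chi_squared_corr": N * (abs(diff) - N / 2) ** 2 / denom,
        "log_likelihood": ll,
        "fisher": -math.log10(p),
    }


# Hypothetical frequencies, not taken from the data set.
for name, value in association_scores(f=30, f1=100, f2=200, N=10_000).items():
    print(f"{name}: {value:.3f}")
```

The exact hypergeometric sum relies on Python's arbitrary-precision integers, which is fine for small examples but slow for corpus-sized N, and the p-value can underflow to 0 for very strongly associated pairs; a log-space computation would avoid both issues.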
Stefan Evert <email@example.com>
Evert, Stefan (2008). Corpora and collocations. In A. Lüdeling and M. Kytö (eds.), Corpus Linguistics. An International Handbook, chapter 58, pages 1212–1248. Mouton de Gruyter, Berlin, New York.
Evert, Stefan and Krenn, Brigitte (2005). Using small random samples for the manual evaluation of statistical association measures. Computer Speech and Language, 19(4), 450–466.
Krenn, Brigitte (2000). The Usual Suspects: Data-Oriented Models for the Identification and Representation of Lexical Collocations, volume 7 of Saarbrücken Dissertations in Computational Linguistics and Language Technology. DFKI & Universität des Saarlandes, Saarbrücken, Germany.