README.md


CollocateR

CollocateR is a package for the statistical programming language R. The package increasingly, if still imperfectly, uses functions and workflows from the tidyverse and tidytext packages.

Purpose

CollocateR serves a simple purpose. It finds the collocates of keywords in context in text files and calculates their significance, using the tests set out in Barnbrook et al.'s Collocation: Applications and Implications (Palgrave, 2013) and the formulae explained on the British National Corpus home page.
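To illustrate what "collocates for keywords in context" means, here is a minimal base R sketch that collects the words falling within a fixed span of a node word and counts them. It does not call CollocateR itself: the toy text and the names `node`, `span`, `in_context` and `in_document` are assumptions made for the example, and the package's own tokenisation and windowing may differ.

```r
# Toy input; every name below is illustrative and not part of the CollocateR API.
text <- "the cat sat on the mat and the cat ate the fish"
node <- "cat"
span <- 2  # number of words considered on each side of the node

# Lower-case the text and split on anything that is not a letter.
tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
tokens <- tokens[tokens != ""]

# Positions at which the node occurs.
node_positions <- which(tokens == node)

# Positions inside each window, clipped to the document and excluding the node.
# Overlapping windows are counted once per window.
window_positions <- unlist(lapply(node_positions, function(i) {
  setdiff(seq(max(1, i - span), min(length(tokens), i + span)), i)
}))

# Frequency of each collocate near the node and in the document as a whole.
in_context <- table(tokens[window_positions])
in_document <- table(tokens)[names(in_context)]

print(data.frame(
  collocate = names(in_context),
  context_freq = as.integer(in_context),
  document_freq = as.integer(in_document)
))
```

The two frequency columns are the raw ingredients that a significance test needs: how often a word appears near the node, and how often it appears anywhere in the text.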

Functions

- ~~save_collocates: Return a list containing a tokenised version of the original document, a record of the node in original and hashed format, lists of left and right collocate locations, and document word_length.~~
- get_freqs: a frequency count for collocates, both in context and in the document in general.
- pmi: a 'pointwise mutual information' significance test based on the probability of nodes and collocates occurring together compared to the probability of their occurring independently.
- npmi: as above, but normalised so that all results fall between 1 (perfect collocation) and -1 (the terms never collocate).
- z-score: a probability test comparing the probability of a collocate occurring near the node with its probability of occurring across the text as a whole.
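For reference, the tests listed above are usually written as follows. These are the standard textbook forms, given here as a guide rather than as a statement of what the package computes internally. The symbols are assumptions introduced for this note: $p(n)$ and $p(c)$ are the probabilities of the node and the collocate, $p(n,c)$ is the probability of their occurring together, $F_n$ and $F_c$ are their frequencies in the text, $F_{n,c}$ is the observed number of co-occurrences, $N$ is the number of tokens in the text, and $S$ is the total span (words to the left plus words to the right of the node).

$$
\mathrm{PMI}(n,c) = \log_2 \frac{p(n,c)}{p(n)\,p(c)}
$$

$$
\mathrm{NPMI}(n,c) = \frac{\mathrm{PMI}(n,c)}{-\log_2 p(n,c)}
$$

For the z-score, assuming the British National Corpus style of calculation:

$$
p = \frac{F_c}{N - F_n}, \qquad E = p \, F_n \, S, \qquad z = \frac{F_{n,c} - E}{\sqrt{E\,(1 - p)}}
$$

NPMI equals 1 when the two terms only ever occur together, 0 when they are independent, and approaches -1 as $p(n,c)$ tends to zero.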

TODO

Acknowledgement

README generated with readme2tex.



cokelly/collocateR documentation built on May 13, 2019, 8:49 p.m.