This function examines whether the distribution of word frequencies in a text document follows Zipf's distribution (Zipf 1936). Zipf's distribution is considered the ideal distribution of a perfect natural language text.
word_distrib(textdoc)
textdoc: an n x 1 text document, for example a one-column data frame of text records.
Zipf's distribution is most easily observed by plotting the data on a log-log graph, with the axes being log(word rank order) and log(word frequency). For a perfect natural language text, all points should fall on a straight line with a negative slope. Any deviation from the straight line can be considered an imperfection attributable to the texts within the document.
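The log-log check described above can be sketched in base R. This is only an illustration under stated assumptions, not the implementation of word_distrib: the character vector `docs` is made-up toy data, and the tokenisation (lowercasing, then splitting on non-letters) is an assumed choice.

```r
# Hypothetical toy input; NOT the data or the tokeniser used by word_distrib().
docs <- c("the cat sat on the mat",
          "the dog sat on the log",
          "the cat and the dog")

# Split into lowercase words and count how often each word occurs.
words <- unlist(strsplit(tolower(docs), "[^a-z]+"))
words <- words[nzchar(words)]
freq <- sort(table(words), decreasing = TRUE)

# Rank 1 is the most frequent word.
zipf <- data.frame(word = names(freq),
                   rank = seq_along(freq),
                   frequency = as.integer(freq))

# Fit a straight line on the log-log scale; a Zipf-like text gives a
# negative slope with points close to the fitted line.
fit <- lm(log(frequency) ~ log(rank), data = zipf)
slope <- unname(coef(fit)[2])

plot(log(zipf$rank), log(zipf$frequency),
     xlab = "log(word rank order)", ylab = "log(word frequency)")
abline(fit)
```

With real text, the size of the residuals from `fit` gives a rough sense of how far the document departs from the straight line.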
A list of word ranks and their respective frequencies, and a plot showing the relationship between the two variables.
Zipf G (1936). The Psychobiology of Language. London: Routledge.
# Get an n x 1 text document
tweets_dat <- data.frame(text = tweets[, 1])
plt <- word_distrib(textdoc = tweets_dat)
plt