ShakeWords    R Documentation
This data set, from Efron and Thisted (1976), gives the number of distinct
word types (Freq) that appeared exactly once, twice, etc., up to 100 times
(count) in the complete works of Shakespeare. In these works, Shakespeare
used 31,534 distinct words (types), comprising 884,647 words (tokens) in
total.
A data frame with 100 observations on the following 2 variables.
count: the number of times a word type appeared in Shakespeare's written works
Freq: the number of different words (types) appearing with this count
Efron & Thisted used these data to ask the question, "How many words did Shakespeare know?" Put another way, suppose a new corpus of Shakespeare's works were discovered, also containing 884,647 words. How many new word types would appear? Answering the main question involves contemplating an infinite number of such new corpora.
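One classical way to attack the "how many new word types" question is the Good-Toulmin alternating series, which Efron & Thisted build on: if n_x word types were seen exactly x times, the expected number of new types in a further sample of t times the original size is estimated by n_1 t - n_2 t^2 + n_3 t^3 - .... The sketch below is a minimal illustration, not the authors' full method. The helper name `good_toulmin` is hypothetical; the series is only well behaved for 0 < t <= 1; and because counts above 100 are not listed here, the sum is truncated at 100 (an assumption that matters most at t = 1). It also assumes ShakeWords is available from the vcdExtra package.

```r
# Hypothetical helper: truncated Good-Toulmin estimate of the number of
# new word types expected in a new corpus t times the original size.
# Assumes 0 < t <= 1; the alternating series is unstable for t > 1.
good_toulmin <- function(count, freq, t = 1) {
  stopifnot(length(count) == length(freq), t > 0, t <= 1)
  # n1*t - n2*t^2 + n3*t^3 - ... over the listed counts only
  sum((-1)^(count + 1) * freq * t^count)
}

# Toy table: 10 types seen once, 4 seen twice, 2 seen three times
good_toulmin(1:3, c(10, 4, 2), t = 1)    # returns 8 (10 - 4 + 2)
good_toulmin(1:3, c(10, 4, 2), t = 0.5)  # returns 4.25 (5 - 1 + 0.25)

# Assumption: ShakeWords is shipped with the vcdExtra package
if (requireNamespace("vcdExtra", quietly = TRUE)) {
  data("ShakeWords", package = "vcdExtra")
  # Truncated at count = 100; types seen more than 100 times are omitted
  good_toulmin(ShakeWords$count, ShakeWords$Freq, t = 1)
}
```

Smaller values of t (for example 0.5) ask how many new types would appear in a newly found text half as long, where the truncation at 100 has negligible effect.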
In addition to the word types that appear 1 to 100 times, there are 846
word types that appear more than 100 times; these are not listed in this data set.
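The type and token totals quoted above can be checked against the table with a little bookkeeping: summing Freq counts word types, and summing count * Freq counts the running words those types contribute. The sketch below uses a hypothetical helper `fof_totals` and assumes ShakeWords comes from the vcdExtra package. Adding the 846 unlisted types to the listed ones would be expected to recover the 31,534 total, while the token sum covers only the listed types and so falls well short of 884,647, since frequent words account for most of the text.

```r
# Hypothetical helper: totals from a frequency-of-frequencies table
fof_totals <- function(count, freq) {
  c(types  = sum(freq),          # number of distinct word types listed
    tokens = sum(count * freq))  # running words contributed by those types
}

fof_totals(1:3, c(10, 4, 2))  # types 16, tokens 24

# Assumption: ShakeWords is shipped with the vcdExtra package
if (requireNamespace("vcdExtra", quietly = TRUE)) {
  data("ShakeWords", package = "vcdExtra")
  tot <- fof_totals(ShakeWords$count, ShakeWords$Freq)
  print(tot)
  # Listed types plus the 846 types seen more than 100 times
  print(tot[["types"]] + 846)
}
```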
Bradley Efron and Ronald Thisted (1976). Estimating the Number of Unseen Species: How Many Words Did Shakespeare Know? Biometrika, Vol. 63, No. 3, pp. 435-447. http://www.jstor.org/stable/2335721
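library(vcdExtra)  # assumed source package for ShakeWords
data(ShakeWords)
str(ShakeWords)
# sqrt() compresses the very large frequencies of rarely used words
plot(sqrt(Freq) ~ count, data = ShakeWords)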