Comparisons of lexical richness between two texts are carried out on the basis of the vocabulary size (number of types) and on the basis of the vocabulary growth rate. Variances of the number of types and of the number of hapax legomena required for the tests are estimated with the help of LNRE models.
compare.richness.fnc(text1, text2, digits = 5)
text1: First text in the comparison.
text2: Second text in the comparison.
digits: Number of decimal digits required for the growth rate.
The comparison for the vocabulary size is carried out with the test statistic
Z = (E[V_1] - E[V_2])/sqrt(VAR[V_1] + VAR[V_2])
and the comparison of the growth rates with the test statistic
Z = (E[V_1(1)]/N_1 - E[V_2(1)]/N_2)/sqrt(VAR[V_1(1)]/N_1^2 + VAR[V_2(1)]/N_2^2)
where N denotes the sample size in tokens, V the vocabulary size, and V(1) the number of hapax legomena.
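The two statistics can be written out directly in R. This is a minimal sketch, not the code of compare.richness.fnc itself: the helper names z_vocab_size, z_growth_rate and two_tailed_p are hypothetical, the expectations and variances are assumed to be supplied by an LNRE model fitted elsewhere, and the p-value assumes a standard normal reference distribution.

```r
# Hypothetical helpers (not part of languageR): the two Z statistics,
# given expectations and variances that an LNRE model would supply.

# Z for the difference in vocabulary size (number of types).
z_vocab_size <- function(EV1, EV2, varV1, varV2) {
  (EV1 - EV2) / sqrt(varV1 + varV2)
}

# Z for the difference in growth rate, E[V(1)] / N, where V(1) is the
# number of hapax legomena and N the sample size in tokens.
z_growth_rate <- function(EH1, EH2, varH1, varH2, N1, N2) {
  (EH1 / N1 - EH2 / N2) / sqrt(varH1 / N1^2 + varH2 / N2^2)
}

# Two-tailed p-value, assuming Z is approximately standard normal.
two_tailed_p <- function(z) 2 * pnorm(-abs(z))

# Arbitrary illustrative inputs, chosen only to make the arithmetic easy.
z1 <- z_vocab_size(EV1 = 110, EV2 = 100, varV1 = 9, varV2 = 16)
z2 <- z_growth_rate(EH1 = 50, EH2 = 40, varH1 = 16, varH2 = 9,
                    N1 = 1000, N2 = 1000)
print(c(z1, z2))
print(two_tailed_p(c(z1, z2)))
```

With these made-up inputs the first statistic is 10 / sqrt(25) and the second is 0.01 / 0.005, so both equal 2.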
A summary listing the Chi-Squared measure of goodness of fit for the LNRE models (available in the zipfR package) used to estimate variances, a table listing tokens, types, hapax legomena and the vocabulary growth rate, and two-tailed tests for differences in the vocabulary sizes and growth rates with Z-score and p-value.
It is probably unwise to attempt to apply this function to texts comprising more than 500,000 words.
R. Harald Baayen, Radboud University Nijmegen and Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands. email@example.com
Baayen, R. H. (2001) Word Frequency Distributions, Kluwer Academic Publishers, Dordrecht.
comparison of lexical richness for tolower(alice) and tolower(through[1:length(alice)])
with approximations of variances based on the LNRE models
gigp (X2 = 61.72) and gigp (X2 = 53.29)

                                  Tokens Types HapaxLegomena GrowthRate
tolower(alice)                     27269  2615          1166    0.04276
tolower(through[1:length(alice)])  27269  2727          1208    0.04430

two-tailed tests:

                             Z      p
Vocabulary Size        -2.7041 0.0068
Vocabulary Growth Rate -1.0113 0.3119

comparison of lexical richness for tolower(alice) and tolower(oz[1:25942])
with approximations of variances based on the LNRE models
gigp (X2 = 61.72) and gigp (X2 = 58.8)

                     Tokens Types HapaxLegomena GrowthRate
tolower(alice)        27269  2615          1166    0.04276
tolower(oz[1:25942])  25942  2333          1005    0.03874

two-tailed tests:

                            Z      p
Vocabulary Size        7.0938 0.0000
Vocabulary Growth Rate 2.6850 0.0073
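The printed figures can be cross-checked with base R alone. The sketch below recomputes the GrowthRate column as hapax legomena divided by tokens, and converts the reported Z-scores to two-tailed p-values under the assumption of a standard normal reference distribution. The short vector names are labels chosen for this illustration; the counts and Z-scores are copied from the output above.

```r
# The header lines of the output suggest calls of the form
#   compare.richness.fnc(tolower(alice), tolower(through[1:length(alice)]))
#   compare.richness.fnc(tolower(alice), tolower(oz[1:25942]))
# Those calls need the languageR data sets and are not run here;
# only the arithmetic of the printed tables is checked.

# Counts copied from the output above.
tokens <- c(alice = 27269, through = 27269, oz = 25942)
hapax  <- c(alice = 1166,  through = 1208,  oz = 1005)

# Growth rate = number of hapax legomena / number of tokens,
# rounded to the default digits = 5.
growth_rate <- round(hapax / tokens, 5)
print(growth_rate)

# Z-scores copied from the two "two-tailed tests" tables.
z <- c(size_through = -2.7041, growth_through = -1.0113,
       size_oz = 7.0938, growth_oz = 2.6850)

# Two-tailed p-values, assuming a standard normal reference distribution.
p <- 2 * pnorm(-abs(z))
print(round(p, 4))
```

The recomputed growth rates match the GrowthRate column, and the p-values should agree with the printed ones up to rounding of the Z-scores.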