Zipf_n_Heaps: Explore Corpus Term Frequency Characteristics
In tm: Text Mining Package

Description Usage Arguments Details Value Examples

Description

Explore Zipf's law and Heaps' law, two empirical laws in linguistics describing commonly observed characteristics of term frequency distributions in corpora.

Usage

Zipf_plot(x, type = "l", ...)
Heaps_plot(x, type = "l", ...)

Arguments

`x`	a document-term matrix or term-document matrix with unweighted term frequencies.
`type`	a character string indicating the type of plot to be drawn, see `plot`.
`...`	further graphical parameters to be used for plotting.

Details

Zipf's law (e.g., https://en.wikipedia.org/wiki/Zipf%27s_law) states that, given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. More generally, it states that the probability mass function (pmf) of the term frequencies is of the form c k^{-β}, where k is the rank of the term (ranked from the most to the least frequent). We can conveniently explore the degree to which the law holds by plotting the logarithm of the frequency against the logarithm of the rank, and inspecting the goodness of fit of a linear model.
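The log-log fit described above can be sketched in base R. This is a minimal illustration, not the code of `Zipf_plot` itself: the frequency vector `freq` is hypothetical idealised data (c = 1000, β = 1), and all variable names are assumptions made for the example.

```r
# Hypothetical idealised term frequencies (not from a real corpus):
# frequency = c / rank^beta with c = 1000 and beta = 1.
freq <- 1000 / (1:50)

# Rank the terms from the most to the least frequent.
freq_sorted <- sort(freq, decreasing = TRUE)
log_rank <- log(seq_along(freq_sorted))
log_freq <- log(freq_sorted)

# Taking logs of c * k^(-beta) gives log(c) - beta * log(k),
# so a straight-line fit recovers both parameters.
fit <- lm(log_freq ~ log_rank)
zipf_coef <- coef(fit)

beta_hat <- -unname(zipf_coef["log_rank"])      # slope is -beta
c_hat <- exp(unname(zipf_coef["(Intercept)"]))  # intercept is log(c)
```

Because the input follows the law exactly, `beta_hat` and `c_hat` come out at 1 and 1000 up to floating-point error. With real term counts the points only approximately follow a line, and `plot(log_rank, log_freq, type = "l")` followed by `abline(fit)` shows how far they deviate.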

Heaps' law (e.g., https://en.wikipedia.org/wiki/Heaps%27_law) states that the vocabulary size V (i.e., the number of different terms employed) grows polynomially with the text size T (the total number of terms in the texts), so that V = c T^β. We can conveniently explore the degree to which the law holds by plotting log(V) against log(T), and inspecting the goodness of fit of a linear model.
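As a rough illustration of the quantities involved, the sketch below accumulates T and V document by document for a tiny hypothetical corpus and fits the log-log line in base R. The corpus `docs` and all variable names are made up for the example, and accumulating per document is an assumption about how the (T, V) pairs are formed, not a statement of how `Heaps_plot` is implemented.

```r
# Hypothetical toy corpus: each element is one already tokenised document.
docs <- list(
  c("oil", "price", "rise"),
  c("oil", "company", "buy", "company"),
  c("price", "fall", "market", "oil"),
  c("market", "share", "buy", "stake", "bank")
)

# T: running total of tokens after each document.
text_size <- cumsum(lengths(docs))    # 3 7 11 16

# V: running number of distinct terms after each document.
vocab_size <- sapply(seq_along(docs), function(i) {
  length(unique(unlist(docs[seq_len(i)])))
})                                    # 3 5 7 10

# Taking logs of V = c * T^beta gives log(V) = log(c) + beta * log(T).
fit <- lm(log(vocab_size) ~ log(text_size))
heaps_coef <- coef(fit)

beta_hat <- unname(heaps_coef[2])     # slope is beta
c_hat <- exp(unname(heaps_coef[1]))   # intercept is log(c)
```

Four documents are far too few for a meaningful estimate; the point is only to show which two series are regressed on each other.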

Value

The coefficients of the fitted linear model. As a side effect, the corresponding plot is produced.

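Examples

library("tm")                  # also loads the required NLP package
data("acq")                    # example corpus shipped with tm
m <- DocumentTermMatrix(acq)   # unweighted term frequencies
Zipf_plot(m)                   # log(frequency) against log(rank)
Heaps_plot(m)                  # log(V) against log(T)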

Example output

Loading required package: NLP
(Intercept)           x 
  5.5934763  -0.7707745 
(Intercept)           x 
  0.9186489   0.7758549
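The printed coefficients can be mapped back to the parameters c and β of the two laws. The sketch below does this for the values shown above, assuming that the first pair comes from `Zipf_plot`, the second from `Heaps_plot`, and that both fits use natural logarithms; the variable names are illustrative only.

```r
# Coefficients copied from the example output above.
zipf_coef  <- c("(Intercept)" = 5.5934763, x = -0.7707745)
heaps_coef <- c("(Intercept)" = 0.9186489, x = 0.7758549)

# Zipf: log(frequency) = log(c) - beta * log(rank)
zipf_beta <- -zipf_coef[["x"]]
zipf_c    <- exp(zipf_coef[["(Intercept)"]])   # assumes natural logs

# Heaps: log(V) = log(c) + beta * log(T)
heaps_beta <- heaps_coef[["x"]]
heaps_c    <- exp(heaps_coef[["(Intercept)"]]) # assumes natural logs
```

Under these assumptions both exponents are close to 0.77 for this corpus, so term frequency falls off more slowly with rank than the classical inverse-rank form (β = 1) would predict.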

tm documentation built on April 7, 2021, 3:01 a.m.
