Compute type-frequency list, frequency spectrum and vocabulary growth curve from a token vector representing a random sample or an observed sequence of tokens.
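As a rough illustration of what these quantities mean, the following base-R sketch derives a type-frequency list and a frequency spectrum from a small token vector. It does not use the functions documented here; the toy data and the variable names (`tokens`, `tfl`, `spc`) are assumptions made for this example only.

```r
# Toy token vector (hypothetical data, for illustration only)
tokens <- c("the", "cat", "sat", "on", "the", "mat", "the", "end")

# Type-frequency list: frequency of each distinct type, most frequent first
tfl <- sort(table(tokens), decreasing = TRUE)

# Frequency spectrum: V_m = number of types that occur exactly m times
spc <- table(as.vector(tfl))

N <- length(tokens)  # sample size (number of tokens)
V <- length(tfl)     # vocabulary size (number of distinct types)

print(tfl)
print(spc)
```

The functions described on this page return dedicated objects rather than plain tables, so this sketch is only meant to clarify the underlying counts.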
a vector of length N_0, representing a random sample or
other observed data set of N_0 tokens. For each token, the
corresponding element of
number of steps for which vocabulary growth data V(N) is calculated. The values of N will be evenly spaced (up to rounding differences) from N=1 to N=N_0.
alternative way of specifying the steps of the
vocabulary growth curve. In this case, vocabulary growth data will
be calculated every
an integer in the range 1 ... 9, specifying how many
spectrum elements V_m(N) to include in the vocabulary growth
curve. By default only vocabulary size V(N) is calculated,
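To make the notion of evenly spaced steps concrete, here is a minimal base-R sketch that computes V(N) and the first spectrum element V_1(N) at a handful of sample sizes. The rounding scheme shown (`round(seq(...))`) is an assumption and may differ from what the package actually does; the toy data and all variable names are hypothetical.

```r
# Toy token sequence (hypothetical data)
tokens <- c("a", "b", "a", "c", "b", "a", "d", "a", "e", "a")
N0 <- length(tokens)
steps <- 5  # assumed number of steps for this illustration

# Sample sizes evenly spaced from 1 to N0, up to rounding
# (assumption: the package's exact rounding may differ)
N.steps <- unique(round(seq(1, N0, length.out = steps)))

# V(N): number of distinct types among the first N tokens
V <- cumsum(!duplicated(tokens))[N.steps]

# V_1(N): number of types occurring exactly once among the first N tokens
V1 <- sapply(N.steps, function(n) sum(table(tokens[seq_len(n)]) == 1))

vgc <- data.frame(N = N.steps, V = V, V1 = V1)
print(vgc)
```

Recomputing a full table for every step is quadratic in the worst case, which is acceptable for a sketch but not for large samples.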
There are two main applications for the
They can be used to calculate type-token statistics and
vocabulary growth curves for random samples generated from a LNRE
model (with the
They provide an easy way to process a user's own data without having to rely on external scripts to compute frequency spectra and vocabulary growth curves. All that is needed is a text file in one-token-per-line format (i.e. where each token is given on a separate line). See "Examples" below for further hints.
Both applications work well for samples of up to approx. 1 million
tokens. For considerably larger data sets, specialized external
software should be used, such as the Perl scripts provided on the
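The one-token-per-line workflow mentioned above can be tried out with base R alone. The sketch below writes a tiny sample file to a temporary location, reads it back as a token vector and checks its size; the file contents and variable names are assumptions for illustration.

```r
# Write a small one-token-per-line file (hypothetical sample data)
path <- tempfile(fileext = ".txt")
writeLines(c("to", "be", "or", "not", "to", "be"), path)

# Load the token vector: one element per line of the file
x <- readLines(path)

# Quick sanity checks before any further processing
N <- length(x)          # number of tokens
V <- length(unique(x))  # number of distinct types
cat("N =", N, " V =", V, "\n")

unlink(path)  # remove the temporary file
```

A vector loaded this way has the same form as the token vectors expected by the functions described here.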
An object of class
the type frequency list, frequency spectrum or vocabulary growth curve
of the token vector
rlnre for generating random samples (in the form of the
required token vectors) from a LNRE model
## type-token statistics for random samples from a LNRE distribution
model <- lnre("fzm", alpha=.5, A=1e-6, B=.05)
x <- rlnre(model, 100000)
vec2tfl(x)
vec2spc(x) # same as tfl2spc(vec2tfl(x))
vec2vgc(x)

sample.spc <- vec2spc(x)
exp.spc <- lnre.spc(model, 100000)
## Not run: plot(exp.spc, sample.spc)

sample.vgc <- vec2vgc(x, m.max=1, steps=500)
exp.vgc <- lnre.vgc(model, N=N(sample.vgc), m.max=1)
## Not run: plot(exp.vgc, sample.vgc, add.m=1)

## load token vector from a file in one-token-per-line format
## Not run: x <- readLines(filename)
## Not run: x <- readLines(file.choose()) # with file selection dialog

## you can also perform whitespace tokenization and filter the data
## Not run: brown <- scan("brown.pos", what=character(0), quote="")
## Not run: nouns <- grep("/NNS?$", brown, value=TRUE)
## Not run: plot(vec2spc(nouns))
## Not run: plot(vec2vgc(nouns, m.max=1), add.m=1)