Compute the type frequency list, frequency spectrum and vocabulary growth curve from a token vector representing a random sample or an observed sequence of tokens.
x 
a vector of length N_0, representing a random sample or
other observed data set of N_0 tokens. For each token, the
corresponding element of 
steps 
number of steps for which vocabulary growth data V(N) is calculated. The values of N will be evenly spaced (up to rounding differences) from N=1 to N=N_0. 
stepsize 
alternative way of specifying the steps of the
vocabulary growth curve. In this case, vocabulary growth data will
be calculated every 
m.max 
an integer in the range 1 ... 9, specifying how many
spectrum elements V_m(N) to include in the vocabulary growth
curve. By default only vocabulary size V(N) is calculated,
i.e. 
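To make the roles of steps and stepsize more concrete, here is a minimal base R sketch of how sample sizes N could be chosen and how V(N) can be read off for each of them. It does not call zipfR and is not the package's implementation: the toy token vector, the variable names, and the assumption that stepsize means "every stepsize tokens, plus the full sample size" are all illustrative.

```r
# Toy token vector (made-up data for illustration only)
x <- c("the", "cat", "sat", "on", "the", "mat", "the", "cat", "ran", "off")
N0 <- length(x)

# V(N) for every prefix of length N: count first occurrences of each type
V.all <- cumsum(!duplicated(x))

# steps-style sampling: evenly spaced sample sizes from N=1 to N=N0 (rounded)
steps <- 4
N.steps <- unique(round(seq(1, N0, length.out = steps)))

# stepsize-style sampling (assumption: every 'stepsize' tokens, plus N0)
stepsize <- 3
N.stepsize <- unique(c(seq(stepsize, N0, by = stepsize), N0))

# Vocabulary growth data as plain data frames (not zipfR vgc objects)
vgc.steps <- data.frame(N = N.steps, V = V.all[N.steps])
vgc.stepsize <- data.frame(N = N.stepsize, V = V.all[N.stepsize])
print(vgc.steps)
print(vgc.stepsize)
```

With steps the number of data points is fixed and their spacing depends on N_0, whereas with stepsize the spacing is fixed and the number of points grows with the sample.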
There are two main applications for the vec2xxx
functions:
They can be used to calculate type-token statistics and
vocabulary growth curves for random samples generated from a LNRE
model (with the rlnre function).
They provide an easy way to process a user's own data without having to rely on external scripts to compute frequency spectra and vocabulary growth curves. All that is needed is a text file in one-token-per-line format (i.e. where each token is given on a separate line). See "Examples" below for further hints.
Both applications work well for samples of up to approx. 1 million
tokens. For considerably larger data sets, specialized external
software should be used, such as the Perl scripts provided on the
zipfR homepage.
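As a rough illustration of the second application, the following base R sketch writes a tiny made-up corpus to a temporary file in one-token-per-line format and reads it back as a token vector. The file contents and variable names are hypothetical; in real use, the resulting vector x would be passed to the vec2xxx functions.

```r
# Write a small made-up corpus to a temporary file, one token per line
tokens <- c("a", "rose", "is", "a", "rose", "is", "a", "rose")
path <- tempfile(fileext = ".txt")
writeLines(tokens, path)

# Load the file as a character vector with one element per token
x <- readLines(path)

N <- length(x)          # sample size (number of tokens)
V <- length(unique(x))  # vocabulary size (number of distinct types)
cat("N =", N, " V =", V, "\n")

# Clean up the temporary file
unlink(path)
```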
An object of class tfl, spc or vgc, representing the type frequency
list, frequency spectrum or vocabulary growth curve of the token
vector x, respectively.
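For readers unfamiliar with these data structures, the quantities behind a type frequency list and a frequency spectrum can be approximated in base R with table(). This is only a sketch on made-up data: the results are plain tables, not zipfR objects of class tfl or spc, and the variable names are illustrative.

```r
# Toy token vector (made-up data for illustration only)
x <- c("a", "rose", "is", "a", "rose", "is", "a", "rose")

# Type frequency list: frequency f of each type, highest frequency first
tfl <- sort(table(x), decreasing = TRUE)

# Frequency spectrum: V_m = number of types that occur exactly m times
spc <- table(as.vector(tfl))

m <- as.numeric(names(spc))  # frequency classes m
Vm <- as.vector(spc)         # class sizes V_m

# Sanity checks: sum of m * V_m is the sample size N, sum of V_m is V
N <- sum(m * Vm)
V <- sum(Vm)
print(tfl)
print(data.frame(m = m, Vm = Vm))
```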
tfl, spc and vgc for more information about type frequency lists, frequency spectra and vocabulary growth curves
rlnre for generating random samples (in the form of the required token vectors) from a LNRE model
readLines and scan for loading token vectors from disk files
library(zipfR) # the functions below are provided by the zipfR package

## type-token statistics for random samples from a LNRE distribution
model <- lnre("fzm", alpha=.5, A=1e-6, B=.05)
x <- rlnre(model, 100000)
vec2tfl(x)
vec2spc(x) # same as tfl2spc(vec2tfl(x))
vec2vgc(x)

## compare the sample spectrum with the expected spectrum of the model
sample.spc <- vec2spc(x)
exp.spc <- lnre.spc(model, 100000)
## Not run: plot(exp.spc, sample.spc)

## compare the sample vocabulary growth curve with the expected curve
sample.vgc <- vec2vgc(x, m.max=1, steps=500)
exp.vgc <- lnre.vgc(model, N=N(sample.vgc), m.max=1)
## Not run: plot(exp.vgc, sample.vgc, add.m=1)

## load token vector from a file in one-token-per-line format
## Not run: x <- readLines(filename)
## Not run: x <- readLines(file.choose()) # with file selection dialog

## you can also perform whitespace tokenization and filter the data
## Not run: brown <- scan("brown.pos", what=character(0), quote="")
## Not run: nouns <- grep("/NNS?$", brown, value=TRUE)
## Not run: plot(vec2spc(nouns))
## Not run: plot(vec2vgc(nouns, m.max=1), add.m=1)

