Compute type-frequency list, frequency spectrum and vocabulary growth curve from a token vector representing a random sample or an observed sequence of tokens.
x |
a vector of length N_0, representing a random sample or
other observed data set of N_0 tokens. For each token, the
corresponding element of |
steps |
number of steps for which vocabulary growth data V(N) is calculated. The values of N will be evenly spaced (up to rounding differences) from N=1 to N=N_0. |
stepsize |
alternative way of specifying the steps of the
vocabulary growth curve. In this case, vocabulary growth data will
be calculated every |
m.max |
an integer in the range $1 ... 9$, specifying how many
spectrum elements V_m(N) to include in the vocabulary growth
curve. By default only vocabulary size V(N) is calculated,
i.e. |
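The interaction of steps and m.max can be illustrated with a small base-R sketch. It does not call zipfR and is not the package's implementation; the toy token vector and the exact rounding of the evenly spaced sample sizes are assumptions made for illustration only.

```r
# Toy token vector (illustrative data): each element names the type of one token.
x <- c("a", "b", "a", "c", "b", "a", "d", "a", "e", "b", "a", "f")
N0 <- length(x)

# Assumed reading of `steps`: sample sizes evenly spaced from 1 to N0, then rounded.
steps <- 4
N <- unique(round(seq(1, N0, length.out = steps)))

# V(N): number of distinct types among the first N tokens.
V <- sapply(N, function(n) length(unique(x[1:n])))

# V_1(N): number of types seen exactly once among the first N tokens,
# i.e. the extra column one would expect for m.max = 1.
V1 <- sapply(N, function(n) sum(table(x[1:n]) == 1))

growth <- data.frame(N = N, V = V, V1 = V1)
print(growth)
```

Each row of the resulting data frame corresponds to one point on the vocabulary growth curve; a larger m.max would add further columns V_2(N), V_3(N), and so on in the same way.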
There are two main applications for the vec2xxx functions:
They can be used to calculate type-token statistics and vocabulary growth curves for random samples generated from a LNRE model (with the rlnre function).
They provide an easy way to process a user's own data without having to rely on external scripts to compute frequency spectra and vocabulary growth curves. All that is needed is a text file in one-token-per-line format (i.e. where each token is given on a separate line). See "Examples" below for further hints.
Both applications work well for samples of up to approx. 1 million tokens. For considerably larger data sets, specialized external software should be used, such as the Perl scripts provided on the zipfR homepage.
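The one-token-per-line format described above can be tried out with base R alone. The following is a minimal sketch: the token list and the temporary file are hypothetical example data, and the filtering of empty lines is an optional step that the original text does not require.

```r
# Hypothetical example data, written to a temporary one-token-per-line file.
tokens <- c("the", "cat", "saw", "the", "dog", "and", "the", "bird")
path <- tempfile(fileext = ".txt")
writeLines(tokens, path)

# readLines() returns a character vector with one element per line,
# which is the kind of token vector the vec2xxx functions expect.
x <- readLines(path)

# Optional cleanup (an assumption, not required by the format): drop empty lines.
x <- x[nzchar(x)]

n.tokens <- length(x)         # sample size N
n.types <- length(unique(x))  # vocabulary size V
cat("N =", n.tokens, " V =", n.types, "\n")
```

With zipfR loaded, a vector read this way could then be passed to vec2tfl, vec2spc or vec2vgc.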
An object of class tfl, spc or vgc, representing the type frequency list, frequency spectrum or vocabulary growth curve of the token vector x, respectively.
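To make the three kinds of result more concrete, here is a base-R sketch of the quantities they represent. It produces plain tables and vectors rather than zipfR's tfl, spc and vgc objects, and the toy token vector is illustrative only.

```r
# Toy token vector (illustrative data).
x <- c("a", "b", "a", "c", "b", "a", "d")

# Type frequency list: frequency of each type, most frequent first.
type.freq <- sort(table(x), decreasing = TRUE)

# Frequency spectrum: V_m = number of types that occur exactly m times.
spectrum <- table(as.vector(type.freq))

# Vocabulary growth: V(N) after each token; the count goes up by one
# whenever a token is the first occurrence of its type.
growth <- cumsum(!duplicated(x))

print(type.freq)
print(spectrum)
print(growth)
```

Here the spectrum is derived from the type frequency list, mirroring the equivalence noted in the Examples (vec2spc(x) is the same as tfl2spc(vec2tfl(x))).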
tfl, spc and vgc for more information about type frequency lists, frequency spectra and vocabulary growth curves
rlnre for generating random samples (in the form of the required token vectors) from a LNRE model
readLines and scan for loading token vectors from disk files
## type-token statistics for random samples from a LNRE distribution
model <- lnre("fzm", alpha=.5, A=1e-6, B=.05)
x <- rlnre(model, 100000)
vec2tfl(x)
vec2spc(x) # same as tfl2spc(vec2tfl(x))
vec2vgc(x)
sample.spc <- vec2spc(x)
exp.spc <- lnre.spc(model, 100000)
plot(exp.spc, sample.spc)
sample.vgc <- vec2vgc(x, m.max=1, steps=500)
exp.vgc <- lnre.vgc(model, N=N(sample.vgc), m.max=1)
plot(exp.vgc, sample.vgc, add.m=1)
## Not run:
## load token vector from a file in one-token-per-line format
x <- readLines(filename)
x <- readLines(file.choose()) # with file selection dialog
## you can also perform whitespace tokenization and filter the data
brown <- scan("brown.pos", what=character(0), quote="")
nouns <- grep("/NNS?$", brown, value=TRUE)
plot(vec2spc(nouns))
plot(vec2vgc(nouns, m.max=1), add.m=1)
## End(Not run)