distribution: Type and Probability Distributions of LNRE Models (zipfR)
In zipfR: Statistical Models for Word Frequency Distributions


Type density g (tdlnre), type distribution G (tplnre), type quantiles G^{-1} (tqlnre), probability density f (dlnre), distribution function F (plnre), quantile function F^{-1} (qlnre), logarithmic type and probability densities (ltdlnre and ldlnre), and random sample generation (rlnre) for LNRE models.

  tdlnre(model, x, ...)
  tplnre(model, q, lower.tail=FALSE, ...)
  tqlnre(model, p, lower.tail=FALSE, ...)

  dlnre(model, x, ...)
  plnre(model, q, lower.tail=TRUE, ...)
  qlnre(model, p, lower.tail=TRUE, ...)

  ltdlnre(model, x, base=10, log.x=FALSE, ...)
  ldlnre(model, x, base=10, log.x=FALSE, ...)

  rlnre(model, n, what=c("tokens", "tfl"), ...)

`model`	an object belonging to a subclass of `lnre`, representing a LNRE model
`x`	vector of type probabilities π for which the density function is evaluated
`q`	vector of type probability quantiles, i.e. threshold values ρ on the type probability axis
`p`	vector of tail probabilities
`lower.tail`	if `TRUE`, lower tail probabilities or type counts are returned / expected in the `p` argument. Note that the defaults differ for the distribution function and the type distribution; see "Details" below.
`base`	positive number, the base with respect to which the log-transformation is performed (see "Details" below)
`log.x`	if `TRUE`, the values passed in the argument `x` are assumed to be logarithmic, i.e. log_a π, where a is the value of `base`
`n`	size of random sample to generate. If `length(n) > 1`, the length is taken to be the number required.
`what`	whether to return the sample as a vector of tokens or as a type-frequency list (usually more efficient)
`...`	further arguments are passed through to the method implementations (currently unused)

Note that the order in which arguments are specified differs from the analogous functions for common statistical distributions in the R standard library. In particular, the LNRE model (argument model) always has to be given as the first argument so that R can dispatch the function call to an appropriate method implementation for the chosen LNRE model.

Some of the functions may not be available for certain types of LNRE models. In particular, no analytical solutions are known for the distribution and quantiles of GIGP models, so the functions tplnre, tqlnre, plnre, qlnre and rlnre (which depends on qlnre and tplnre) are not implemented for objects of class lnre.gigp.

The default tails differ for the distribution function (plnre, qlnre) and the type distribution (tplnre, tqlnre), in order to match the definitions of F(ρ) and G(ρ). While the distribution function defaults to lower tails (lower.tail=TRUE, corresponding to F and F^{-1}), the type distribution defaults to upper tails (lower.tail=FALSE, corresponding to G and G^{-1}).
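The two tail conventions can be checked on closed-form expressions. The sketch below hand-codes F and G for a finite Zipf-Mandelbrot population, using the parameter values of the FZM model from the examples. The formulas and the helper names (F_fzm, G_fzm, C, S) are assumptions written for illustration in base R, not zipfR's own code.

```r
## Hand-written fZM formulas (illustrative assumption; not zipfR's implementation)
alpha <- 0.8; A <- 1e-5; B <- 0.05
C <- (1 - alpha) / (B^(1 - alpha) - A^(1 - alpha))  # normalizes total probability to 1
S <- (C / alpha) * (A^(-alpha) - B^(-alpha))        # population size (number of types)

## F(rho): probability mass of types with pi <= rho (lower tail by default)
F_fzm <- function(rho, lower.tail = TRUE) {
  rho <- pmin(pmax(rho, A), B)                      # clamp to the support [A, B]
  lower <- (C / (1 - alpha)) * (rho^(1 - alpha) - A^(1 - alpha))
  if (lower.tail) lower else 1 - lower
}

## G(rho): number of types with pi >= rho (upper tail by default)
G_fzm <- function(rho, lower.tail = FALSE) {
  rho <- pmin(pmax(rho, A), B)
  upper <- (C / alpha) * (rho^(-alpha) - B^(-alpha))
  if (lower.tail) S - upper else upper
}

rho <- 1e-4
tail_sum_F <- F_fzm(rho) + F_fzm(rho, lower.tail = FALSE)  # total probability mass
tail_sum_G <- G_fzm(rho) + G_fzm(rho, lower.tail = TRUE)   # total number of types
```

In this sketch the two tails of F add up to 1, while the two tails of G add up to the population size S, which is why a lower tail for G only makes sense when S is finite.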

Unlike for standard distributions, logarithmic tail probabilities (log.p=TRUE) are not provided for the LNRE models, since here the focus is usually on the bulk of the distribution rather than on the extreme tails.

The log-transformed density functions f* and g* returned by ldlnre and ltdlnre, respectively, can be understood as probability and type densities for log_a π instead of π, and are useful for visualization of LNRE populations (with a logarithmic scale for the parameter π on the x-axis). For example,

G(log_a ρ) = ∫_{log_a ρ}^{0} g*(t) dt
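The identity follows from a change of variables: substituting π = a^t gives g*(t) = ln(a) · a^t · g(a^t). A minimal base-R sketch checks this numerically for a Zipf-Mandelbrot type density with the parameter values of the ZM model from the examples. The density formula and the names g, g_log and G are assumptions made for illustration, not zipfR code. Because this g is zero above the cutoff B, the integration stops at log_a B instead of 0.

```r
## Hand-written ZM type density (illustrative assumption; not zipfR's implementation)
alpha <- 0.8; B <- 1e-3
C <- (1 - alpha) / B^(1 - alpha)                  # normalizing constant

g <- function(p) ifelse(p > 0 & p <= B, C * p^(-alpha - 1), 0)   # type density g(pi)
g_log <- function(t, base = 10) log(base) * base^t * g(base^t)   # density of log_a(pi)
G <- function(rho) (C / alpha) * (rho^(-alpha) - B^(-alpha))     # closed-form G(rho)

rho <- 1e-6
## integrate g* from log10(rho) up to log10(B); g* vanishes between log10(B) and 0
num   <- integrate(g_log, lower = log10(rho), upper = log10(B), rel.tol = 1e-10)$value
exact <- G(rho)
```

Here num and exact should agree up to the tolerance of the numerical integration.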

For rlnre, either a factor of length n (what="tokens", the default) or a tfl object (what="tfl"), representing a random sample from the population described by the specified LNRE model. Note that the type-frequency list is a sufficient statistic, i.e. it provides all relevant information from the sample. For large n, type-frequency lists are generated more efficiently and with less memory overhead.
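To illustrate how a token sample relates to a type-frequency list, here is a minimal base-R sketch of inverse-transform sampling from a finite Zipf-Mandelbrot population. It is one plausible way to build a sampler from a quantile function and a type distribution, not necessarily how rlnre is implemented. The closed-form helpers q_fzm and G_fzm are assumptions written for this example, and a plain frequency table stands in for a tfl object.

```r
set.seed(42)

## Hand-written fZM formulas (illustrative assumption; not zipfR's implementation)
alpha <- 0.8; A <- 1e-5; B <- 0.05
C <- (1 - alpha) / (B^(1 - alpha) - A^(1 - alpha))
S <- (C / alpha) * (A^(-alpha) - B^(-alpha))      # population size

q_fzm <- function(p) (A^(1 - alpha) + p * (1 - alpha) / C)^(1 / (1 - alpha))  # F^{-1}(p)
G_fzm <- function(rho) (C / alpha) * (rho^(-alpha) - B^(-alpha))              # G(rho)

n <- 10000
u <- runif(n)                           # uniform draws on the probability-mass axis
type_id <- 1 + floor(G_fzm(q_fzm(u)))   # rank of the type that each token belongs to
tokens <- factor(type_id)               # token vector, one entry per sampled token

## collapse the token vector into type frequencies, most frequent type first
freqs <- sort(table(tokens), decreasing = TRUE)
```

The frequency table holds one count per observed type, so for large n it is much smaller than the token vector while retaining the frequency information.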

For all other functions, a vector of non-negative numbers of the same length as the second argument (x, p or q).

tdlnre returns the type density g(π) for the values of π specified in the vector x. tplnre returns the type distribution G(ρ) (default) or its complement S-G(ρ) (if lower.tail=TRUE), where S is the population size (the total number of types), for the values of ρ specified in the vector q. tqlnre returns type quantiles, i.e. the inverse G^{-1}(x) (default) or G^{-1}(S-x) (if lower.tail=TRUE) of the type distribution, for the type counts x specified in the vector p.

dlnre returns the probability density f(π) for the values of π specified in the vector x. plnre returns the distribution function F(ρ) (default) or its complement 1-F(ρ) (if lower.tail=FALSE), for the values of ρ specified in the vector q. qlnre returns quantiles, i.e. the inverse F^{-1}(p) (default) or F^{-1}(1-p) (if lower.tail=FALSE) of the distribution function, for the probabilities p specified in the vector p.
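The inverse relationships can be made concrete with closed forms. The base-R sketch below uses hand-written formulas for a finite Zipf-Mandelbrot population; the helper names F_fzm, G_fzm, Finv and Ginv are hypothetical and not part of zipfR. It checks that the quantile functions undo the distribution functions, and it derives the probability of the k-th most frequent type as F(G^{-1}(k-1)) - F(G^{-1}(k)).

```r
## Hand-written fZM formulas (illustrative assumption; not zipfR's implementation)
alpha <- 0.8; A <- 1e-5; B <- 0.05
C <- (1 - alpha) / (B^(1 - alpha) - A^(1 - alpha))
S <- (C / alpha) * (A^(-alpha) - B^(-alpha))      # population size

F_fzm <- function(rho) (C / (1 - alpha)) * (rho^(1 - alpha) - A^(1 - alpha))  # F(rho)
G_fzm <- function(rho) (C / alpha) * (rho^(-alpha) - B^(-alpha))              # G(rho)
Finv  <- function(p) (A^(1 - alpha) + p * (1 - alpha) / C)^(1 / (1 - alpha))  # F^{-1}(p)
Ginv  <- function(x) (B^(-alpha) + x * alpha / C)^(-1 / alpha)                # G^{-1}(x)

## quantile functions invert the corresponding distribution functions
rho <- c(1e-4, 1e-3, 1e-2)
roundtrip_F <- Finv(F_fzm(rho))
roundtrip_G <- Ginv(G_fzm(rho))

## probability of the k-th type: mass between the thresholds for k-1 and k types
k <- 1:floor(S)
pi_k <- F_fzm(Ginv(k - 1)) - F_fzm(Ginv(k))
```

The pi_k decrease with k and sum to slightly less than 1, because the fractional type beyond floor(S) is left out.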

ldlnre and ltdlnre compute logarithmically transformed versions of the probability and type density functions, respectively, taking logarithms with respect to the base a specified in the base argument (default: a=10). See "Details" above for more information.

See lnre for more information about LNRE models and how to initialize them.

Random samples generated with rlnre can be further processed with the functions vec2tfl, vec2spc and vec2vgc (for token vectors) and tfl2spc (for type-frequency lists).

## define ZM and fZM LNRE models 
ZM <- lnre("zm", alpha=.8, B=1e-3)
FZM <- lnre("fzm", alpha=.8, A=1e-5, B=.05)

## random samples from the two models
vec2tfl(rlnre(ZM, 10000))
vec2tfl(rlnre(FZM, 10000))
rlnre(FZM, 10000, what="tfl") # more efficient

## plot logarithmic type density functions
x <- 10^seq(-6, -1, by=.01)  # pi = 10^(-6) .. 10^(-1)
y.zm <- ltdlnre(ZM, x)
y.fzm <- ltdlnre(FZM, x)

plot(x, y.zm, type="l", lwd=2, col="red", log="x", ylim=c(0,14000))
lines(x, y.fzm, lwd=2, col="blue")
legend("topright", legend=c("ZM", "fZM"), lwd=3, col=c("red", "blue"))

## probability pi_k of k-th type according to FZM model
k <- 10
plnre(FZM, tqlnre(FZM, k-1)) - plnre(FZM, tqlnre(FZM, k))

## number of types with pi >= 1e-6
tplnre(ZM, 1e-6)

## lower tail fails for infinite population size
## Not run: 
tplnre(ZM, 1e-3, lower.tail=TRUE)
## End(Not run)

## total probability mass assigned to types with pi <= 1e-6
plnre(ZM, 1e-6)