zipftolint: Zipf-Mandelbrot Tolerance Intervals

zipftol.int R Documentation

Zipf-Mandelbrot Tolerance Intervals

Description

Provides 1-sided or 2-sided tolerance intervals for data distributed according to the Zipf, Zipf-Mandelbrot, or zeta distribution.

Usage

zipftol.int(x, m = NULL, N = NULL, alpha = 0.05, P = 0.99, 
            side = 1, s = 1, b = 1, dist = c("Zipf", 
            "Zipf-Man", "Zeta"), ties = FALSE, ...) 

Arguments

x

A vector of raw data, or a table of counts, distributed according to a Zipf, Zipf-Mandelbrot, or zeta distribution. Do not supply a plain vector of counts!

m

The number of observations in a future sample for which the tolerance limits will be calculated. By default, m = NULL and, thus, m will be set equal to the original sample size.

N

The number of categories when dist = "Zipf" or dist = "Zipf-Man". This is not used when dist = "Zeta". If N = NULL, then N is estimated based on the number of categories observed in the data.

alpha

The level chosen such that 1-alpha is the confidence level.

P

The proportion of the population to be covered by this tolerance interval.

side

Whether a 1-sided or 2-sided tolerance interval is required (determined by side = 1 or side = 2, respectively).

s

The initial value used to estimate the shape parameter s in the zm.ll function.

b

The initial value used to estimate the second shape parameter b in the zm.ll function when dist = "Zipf-Man".

dist

Options are dist = "Zipf", dist = "Zipf-Man", or dist = "Zeta" if the data is distributed according to the Zipf, Zipf-Mandelbrot, or zeta distribution, respectively.

ties

How to handle ties, i.e., other categories with the same frequency as the category at the estimated tolerance limit. The default is FALSE, which applies no correction. If TRUE, then the highest ranked (i.e., lowest number) of the tied categories is selected for the lower limit and the lowest ranked (i.e., highest number) of the tied categories is selected for the upper limit.

...

Additional arguments passed to the zm.ll function, which is used for maximum likelihood estimation.
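
The difference between raw data, a table of counts, and a plain vector of counts (see x), and the rule described for ties = TRUE, can be illustrated in base R. This is only a sketch: the sample raw is hypothetical, and the helper resolve_tie is one illustrative reading of the ties rule, not the package's internal code.

```r
# Hypothetical raw data: each element is the category (rank) of one observation.
raw <- c(1, 1, 1, 1, 2, 2, 3, 3, 4)

# A table of counts built from the raw data; its names keep the category labels.
tab <- table(raw)

# A plain vector of counts drops the category labels. Read as raw data, it
# would look like 4 observations rather than a summary of 9.
counts <- as.vector(tab)

length(raw)     # number of observations in the raw data
sum(tab)        # same number of observations, recovered from the table
length(counts)  # number of categories, not observations

# Illustrative reading of ties = TRUE (an assumption, not package code):
# among categories whose frequency equals that of the category at the limit,
# take the lowest rank number for a lower limit and the highest for an upper.
resolve_tie <- function(freq, limit, bound = c("lower", "upper")) {
  bound <- match.arg(bound)
  tied <- which(freq == freq[limit])
  if (bound == "lower") min(tied) else max(tied)
}

freq <- as.vector(tab)          # frequencies by rank: 4, 2, 2, 1
resolve_tie(freq, 2, "upper")   # ranks 2 and 3 are tied, so the upper limit is 3
resolve_tie(freq, 3, "lower")   # ranks 2 and 3 are tied, so the lower limit is 2
```

Both raw and tab carry the same information; counts alone does not, which is why x should be one of the first two forms.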

Details

Zipf-Mandelbrot models are commonly used to model phenomena where the frequency of each category is approximately inversely proportional to its rank in the frequency table. Zipf-Mandelbrot distributions are heavily right-skewed, with a (relatively) large mass placed on the first category. For most practical applications, one will typically be interested in 1-sided upper bounds.
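
To make this shape concrete, the following base R sketch evaluates the Zipf-Mandelbrot probability mass function, P(X = k) = (k + b)^(-s) / sum_{i=1}^{N} (i + b)^(-s) for k = 1, ..., N, which reduces to the Zipf distribution when b = 0. The helper name zm_pmf is hypothetical and is not part of the tolerance package.

```r
# Zipf-Mandelbrot pmf on the categories 1, ..., N (hypothetical helper).
zm_pmf <- function(k, s, b = 0, N) {
  support <- seq_len(N)
  norm_const <- sum((support + b)^(-s))  # normalizing constant
  (k + b)^(-s) / norm_const
}

# Zipf case (b = 0) with s = 2 and N = 50 categories.
p <- zm_pmf(1:50, s = 2, b = 0, N = 50)

sum(p)   # probabilities sum to 1 (up to floating-point error)
p[1]     # the first category carries more than half of the mass here

# Smallest category u with P(X <= u) >= 0.99 under these known parameters.
which(cumsum(p) >= 0.99)[1]
```

The last line is a population quantile computed from known parameters. A tolerance bound is a different object, since it must also account for the uncertainty in the estimated parameters.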

Value

zipftol.int returns a data frame with the following items:

alpha

The specified significance level.

P

The proportion of the population covered by this tolerance interval.

s.hat

MLE for the shape parameter s.

b.hat

MLE for the shape parameter b when dist = "Zipf-Man".

1-sided.lower

The 1-sided lower tolerance bound. This is given only if side = 1.

1-sided.upper

The 1-sided upper tolerance bound. This is given only if side = 1.

2-sided.lower

The 2-sided lower tolerance bound. This is given only if side = 2.

2-sided.upper

The 2-sided upper tolerance bound. This is given only if side = 2.

Note

This function may be updated in a future version of the package so as to allow greater flexibility with the inputs.

References

Mandelbrot, B. B. (1965), Information Theory and Psycholinguistics. In B. B. Wolman and E. Nagel, editors. Scientific Psychology, Basic Books.

Young, D. S. (2013), Approximate Tolerance Limits for Zipf-Mandelbrot Distributions, Physica A: Statistical Mechanics and its Applications, 392, 1702–1711.

Zipf, G. K. (1949), Human Behavior and the Principle of Least Effort, Hafner.

Zornig, P. and Altmann, G. (1995), Unified Representation of Zipf Distributions, Computational Statistics and Data Analysis, 19, 461–473.

See Also

ZipfMandelbrot, zm.ll

Examples

## 95%/99% 1-sided tolerance intervals for the Zipf, 
## Zipf-Mandelbrot, and zeta distributions. 

library(tolerance)  # provides zipftol.int() and rzipfman()
set.seed(100)

s <- 2
b <- 5
N <- 50

zipf.data <- rzipfman(n = 150, s = s, N = N)
zipfman.data <- rzipfman(n = 150, s = s, b = b, N = N)
zeta.data <- rzipfman(n = 150, s = s, N = Inf)

out.zipf <- zipftol.int(zipf.data, dist = "Zipf")
out.zipfman <- zipftol.int(zipfman.data, dist = "Zipf-Man")
out.zeta <- zipftol.int(zeta.data, N = Inf, dist = "Zeta")

out.zipf
out.zipfman
out.zeta
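
A 2-sided interval can be requested in the same way. The sketch below is a variation on the example above, using only the documented side and ties arguments. It assumes the tolerance package is installed and that the returned columns are named as listed under Value; the object name out.2sided is chosen here for illustration.

```r
# Guarded so the sketch is skipped when the tolerance package is unavailable.
if (requireNamespace("tolerance", quietly = TRUE)) {
  set.seed(100)
  zipf.data <- tolerance::rzipfman(n = 150, s = 2, N = 50)

  # 95%/99% 2-sided tolerance interval, with the ties correction applied.
  out.2sided <- tolerance::zipftol.int(zipf.data, side = 2, dist = "Zipf",
                                       ties = TRUE)
  print(out.2sided)

  # Names such as "2-sided.lower" are not syntactic R names, so use [[ ]]
  # or backticks with $ to extract them.
  print(out.2sided[["2-sided.lower"]])
  print(out.2sided$`2-sided.upper`)
}
```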

tolerance documentation built on May 29, 2024, 7:38 a.m.