Expected Frequency Spectrum by Binomial Interpolation (zipfR)

Description

spc.interp computes the expected frequency spectrum for a random sample of specified size N, taken from a data set described by the frequency spectrum object obj.

Usage

  spc.interp(obj, N, m.max=max(obj$m), allow.extrapolation=FALSE)

Arguments

obj

an object of class spc, representing the frequency spectrum of the data set from which samples are taken

N

a single non-negative integer specifying the sample size for which the expected frequency spectrum is calculated

m.max

number of spectrum elements listed in the expected frequency spectrum. By default, as many spectrum elements are included as the spectrum obj contains, since the expectations of higher spectrum elements will always be 0 in the binomial interpolation. See note in section "Details" below.

allow.extrapolation

if TRUE, the requested sample size N may be larger than the sample size of the frequency spectrum obj, in which case binomial extrapolation is performed. This option should be used with great caution (see EVm.spc for details).

Details

See the EVm.spc manpage for more information, especially concerning binomial extrapolation.

For large frequency spectra, the default value of m.max may lead to very long computation times. It is therefore recommended to specify m.max explicitly and calculate only as many spectrum elements as are actually required.
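
As a reminder of what is being computed, binomial interpolation is commonly written as below. The notation is an assumption made for this note and is not taken from the zipfR sources: N_0 is the sample size of obj and V_k(N_0) is the number of types that occur exactly k times in it. The EVm.spc manpage remains the authoritative description.

```latex
\[
  E[V_m(N)] = \sum_{k \ge m} V_k(N_0) \binom{k}{m}
              \left(\frac{N}{N_0}\right)^{m}
              \left(1 - \frac{N}{N_0}\right)^{k-m},
  \qquad
  E[V(N)] = \sum_{k \ge 1} V_k(N_0)
            \left[ 1 - \left(1 - \frac{N}{N_0}\right)^{k} \right]
\]
```

Read this way, the sum for E[V_m(N)] is empty whenever m exceeds the largest frequency class in obj, which is why those expectations are 0. Each requested element sums over all higher frequency classes, so the work grows with m.max. For N > N_0 the factor (1 - N/N_0) becomes negative and the terms alternate in sign, which is one reason extrapolation calls for caution.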

Value

An object of class spc, representing the expected frequency spectrum for a random sample of size N taken from the data set that is described by obj.
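
The following base R sketch illustrates the kind of values such an expected spectrum contains. It is a minimal illustration under stated assumptions, not the zipfR implementation: interp_Vm is a hypothetical helper written for this note, the spectrum is passed as two plain vectors rather than as an spc object, and only interpolation (N no larger than the original sample size) is supported.

```r
## Hypothetical helper (not part of zipfR): expected spectrum elements
## E[V_1], ..., E[V_m.max] for a random sample of N tokens, given a
## spectrum with frequency classes m.vals and type counts Vm.vals.
interp_Vm <- function(m.vals, Vm.vals, N, m.max = max(m.vals)) {
  N0 <- sum(m.vals * Vm.vals)    # sample size of the original data
  stopifnot(N >= 0, N <= N0)     # interpolation only, no extrapolation
  p <- N / N0                    # proportion of tokens retained
  sapply(seq_len(m.max), function(m) {
    ## a type with original frequency k ends up with frequency m
    ## with probability P(Binomial(k, p) = m); zero when m > k
    sum(Vm.vals * dbinom(m, size = m.vals, prob = p))
  })
}

## toy spectrum: 5 types occur once, 2 types twice, 1 type four times,
## i.e. 13 tokens in total
m.vals  <- c(1, 2, 4)
Vm.vals <- c(5, 2, 1)

expected <- interp_Vm(m.vals, Vm.vals, N = 6)
print(round(expected, 3))

## sanity check: the expected token count of the subsample equals N
print(sum(seq_along(expected) * expected))
```

Restricting m.max in the sketch simply shortens the loop over m, which mirrors the advice in the Details section to request only as many spectrum elements as are needed.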

See Also

spc for more information about frequency spectra and links to relevant functions

The implementation of spc.interp is based on the functions EV.spc and EVm.spc. See the respective manpages for technical details.

vgc.interp computes expected vocabulary growth curves by binomial interpolation from a frequency spectrum

sample.spc draws a single concrete random subsample from a spectrum and returns the spectrum of that subsample, whereas spc.interp computes the expected frequency spectrum for random subsamples of size N by binomial interpolation.

Examples

## load the Tiger NP expansion spectrum
## (sample size: about 109k tokens) 
data(TigerNP.spc)

## interpolated expected frequency subspectrum of 50k tokens
TigerNP.sub.spc <- spc.interp(TigerNP.spc,5e+4)
summary(TigerNP.sub.spc)

## the previous call is slow because it computes all expected
## spectrum elements; if only the first 10 expected spectrum
## element frequencies are needed, restrict m.max:
TigerNP.sub.spc <- spc.interp(TigerNP.spc, 5e+4, m.max=10) # much faster!
summary(TigerNP.sub.spc)
