spc_interp: Expected Frequency Spectrum by Binomial Interpolation (zipfR)
In zipfR: Statistical Models for Word Frequency Distributions


spc.interp computes the expected frequency spectrum for a random sample of specified size N, taken from a data set described by the frequency spectrum object obj.

spc.interp(obj, N, m.max=max(obj$m), allow.extrapolation=FALSE)

`obj`	an object of class `spc`, representing the frequency spectrum of the data set from which samples are taken
`N`	a single non-negative integer specifying the sample size for which the expected frequency spectrum is calculated
`m.max`	number of spectrum elements listed in the expected frequency spectrum. By default, as many spectrum elements are included as the spectrum `obj` contains, since the expectations of higher spectrum elements will always be 0 in the binomial interpolation. See note in section "Details" below.
`allow.extrapolation`	if `TRUE`, the requested sample size `N` may be larger than the sample size of the frequency spectrum `obj`, in which case binomial extrapolation is performed. This option should be used with great caution (see `EVm.spc` for details).

See the EVm.spc manpage for more information, especially concerning binomial extrapolation.

For large frequency spectra, the default value of m.max may lead to very long computation times. It is therefore recommended to specify m.max explicitly and calculate only as many spectrum elements as are actually required.
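The idea behind binomial interpolation can be sketched in base R. This is only an illustration of the textbook formula under the assumption that each token is retained independently with probability N/N0; it is not the code used by EVm.spc, and the helper name `interp_Vm` and the toy spectrum are hypothetical.

```r
## Hypothetical helper (not part of zipfR): expected number of types that
## occur exactly m.target times in a random subsample of N tokens, given an
## observed spectrum with Vm[i] types of frequency m[i].
interp_Vm <- function(m.target, m, Vm, N) {
  N0 <- sum(m * Vm)   # sample size of the observed spectrum
  p  <- N / N0        # proportion of tokens retained in the subsample
  ## a type with frequency k keeps exactly m.target of its tokens with
  ## binomial probability dbinom(m.target, k, p); this is 0 if m.target > k
  sum(Vm * dbinom(m.target, size = m, prob = p))
}

## toy spectrum: 5 types of frequency 1, 3 of frequency 2, 1 of frequency 4
m  <- c(1, 2, 4)
Vm <- c(5, 3, 1)      # N0 = 5*1 + 3*2 + 1*4 = 15 tokens
N  <- 5               # requested subsample size

## expected spectrum elements for m = 1, ..., 4
EVm <- sapply(1:4, interp_Vm, m = m, Vm = Vm, N = N)

## expected vocabulary size: types that keep at least one token
EV <- sum(Vm * (1 - dbinom(0, size = m, prob = N / sum(m * Vm))))
```

In this sketch every spectrum element above the largest observed frequency has expectation 0, which is why the default m.max never needs to exceed max(obj$m). The cost of the sum also grows with the number of requested elements, which is consistent with the advice to keep m.max small.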

An object of class spc, representing the expected frequency spectrum for a random sample of size N taken from the data set that is described by obj.

spc for more information about frequency spectra and links to relevant functions

The implementation of spc.interp is based on the functions EV.spc and EVm.spc. See the respective manpages for technical details.

vgc.interp computes expected vocabulary growth curves by binomial interpolation from a frequency spectrum

sample.spc takes a single concrete random subsample from a spectrum and returns the spectrum of that subsample, whereas spc.interp computes the expected frequency spectrum for random subsamples of size N by binomial interpolation.
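The difference between the two functions can be seen by putting one random subsample next to the interpolated expectation. The following is a minimal sketch that assumes the zipfR package is installed; the object names `sub.sample` and `sub.expected` and the seed are arbitrary choices made for illustration.

```r
library(zipfR)
data(TigerNP.spc)

set.seed(42)  # arbitrary seed, only to make the random subsample repeatable

## one concrete random subsample of 50k tokens
sub.sample <- sample.spc(TigerNP.spc, 5e+4)

## expected spectrum for subsamples of the same size (first 10 elements only)
sub.expected <- spc.interp(TigerNP.spc, 5e+4, m.max = 10)

## vocabulary size: observed in the subsample vs. expected
V(sub.sample)
V(sub.expected)

## first five spectrum elements: integer counts vs. expectations
Vm(sub.sample, 1:5)
Vm(sub.expected, 1:5)
```

The subsample yields integer counts that change from one draw to the next, while the interpolated spectrum contains expectations that are generally not integers.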

library(zipfR)

## load the Tiger NP expansion spectrum
## (sample size: about 109k tokens)
data(TigerNP.spc)

## interpolated expected frequency subspectrum of 50k tokens
TigerNP.sub.spc <- spc.interp(TigerNP.spc,5e+4)
summary(TigerNP.sub.spc)

## the previous call is slow because it calculates all expected
## spectrum elements; if only the first 10 expected spectrum
## elements are needed, restrict the computation with m.max:
TigerNP.sub.spc <- spc.interp(TigerNP.spc,5e+4,m.max=10) # much faster!
summary(TigerNP.sub.spc)