word_coverage: Word coverage fraction
In sbo: Text Prediction via Stupid Back-Off N-Gram Models


Description

Compute the total and cumulative corpus coverage fraction of a dictionary.

Usage

word_coverage(object, corpus, ...)

## S3 method for class 'sbo_dictionary'
word_coverage(object, corpus, ...)

## S3 method for class 'character'
word_coverage(object, corpus, .preprocess = identity, EOS = "", ...)

## S3 method for class 'sbo_kgram_freqs'
word_coverage(object, corpus, ...)

## S3 method for class 'sbo_predictions'
word_coverage(object, corpus, ...)

Arguments

`object`	either a character vector or an object inheriting from one of the classes `sbo_dictionary`, `sbo_kgram_freqs`, `sbo_predtable` or `sbo_predictor`: the object storing the dictionary whose corpus coverage is to be computed.
`corpus`	a character vector: the text on which coverage is evaluated.
`...`	further arguments passed to or from other methods.
`.preprocess`	preprocessing function applied to the corpus. See `kgram_freqs` and `sbo_dictionary` for further details.
`EOS`	a length-one character vector: a string containing the End-Of-Sentence characters. See `kgram_freqs` and `sbo_dictionary` for further details.

Details

This function computes the corpus coverage fraction of a dictionary, that is, the fraction of words appearing in `corpus` that are contained in the dictionary.

This function is a generic: its `object` argument accepts any object storing a dictionary together with a preprocessing function and a set of End-Of-Sentence characters. This includes all main sbo classes: `sbo_dictionary`, `sbo_kgram_freqs`, `sbo_predtable` and `sbo_predictor`. When `object` is a character vector, the preprocessing function and the End-Of-Sentence characters must be supplied explicitly through the `.preprocess` and `EOS` arguments.

The coverage fraction is computed cumulatively: for each rank r, it gives the fraction of corpus words covered by the dictionary words of rank up to r. The dependence of coverage on this maximal rank can be explored through `plot()` (see the Examples below).
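To make the cumulative definition concrete, here is a minimal base-R sketch of the computation. It is not sbo's implementation: the helper `cumulative_coverage()` is hypothetical, and it assumes the dictionary is a character vector ordered by rank, tokenizes the corpus simply by splitting on whitespace, and ignores preprocessing and End-Of-Sentence tokens.

```r
# Hypothetical helper (not part of sbo): cumulative coverage of a ranked
# dictionary over a corpus. Assumes `dict` is ordered by rank and that
# words are separated by whitespace; no preprocessing, no EOS handling.
cumulative_coverage <- function(dict, corpus) {
  # Split each corpus entry into words and drop empty tokens
  words <- unlist(strsplit(corpus, "\\s+"))
  words <- words[words != ""]
  # Rank of each corpus word in the dictionary (NA if out of vocabulary)
  ranks <- match(words, dict)
  # counts[r] = number of corpus words whose dictionary rank is exactly r
  counts <- tabulate(ranks[!is.na(ranks)], nbins = length(dict))
  # Element r: fraction of corpus words covered by dictionary ranks 1..r
  cumsum(counts) / length(words)
}

dict <- c("the", "cat", "sat")
corpus <- c("the cat sat on the mat", "the dog")
cov <- cumulative_coverage(dict, corpus)
# cov is c(0.375, 0.5, 0.625)
```

The last element is the total coverage: 5 of the 8 corpus words appear in the dictionary, while the earlier elements show how much of that coverage is already achieved by the top-ranked words alone.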

Value

An object of class `word_coverage`.

Author(s)

Valerio Gherardi

See Also

predict.sbo_predictor

Examples

library(sbo)
# Avoid naming the result `c`, which would mask base::c()
cov <- word_coverage(twitter_dict, twitter_train)
print(cov)
summary(cov)
# Plot coverage fraction, including the End-Of-Sentence token in word counts.
plot(cov, include_EOS = TRUE)