word_coverage: Word coverage fraction

View source: R/word_coverage.R

Description

Compute total and cumulative corpus coverage fraction of a dictionary.

Usage

word_coverage(object, corpus, ...)

## S3 method for class 'sbo_dictionary'
word_coverage(object, corpus, ...)

## S3 method for class 'character'
word_coverage(object, corpus, .preprocess = identity, EOS = "", ...)

## S3 method for class 'sbo_kgram_freqs'
word_coverage(object, corpus, ...)

## S3 method for class 'sbo_predictions'
word_coverage(object, corpus, ...)

Arguments

object

either a character vector, or an object inheriting from one of the classes sbo_dictionary, sbo_kgram_freqs, sbo_predtable or sbo_predictor. The object storing the dictionary for which corpus coverage is to be computed.

corpus

a character vector.

...

further arguments passed to or from other methods.

.preprocess

preprocessing function for training corpus. See kgram_freqs and sbo_dictionary for further details.

EOS

a length one character vector. String containing End-Of-Sentence characters, see kgram_freqs and sbo_dictionary for further details.

Details

This function computes the corpus coverage fraction of a dictionary, that is, the fraction of the words appearing in corpus that are contained in the dictionary.
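
The definition above can be illustrated with a minimal base R sketch. It is not the sbo implementation: the helper name coverage_fraction is hypothetical, and splitting on whitespace is an assumption made only to keep the example self-contained.

```r
# Illustrative sketch only: whitespace tokenisation is an assumption,
# not necessarily how sbo splits text into words.
coverage_fraction <- function(dictionary, corpus) {
  # Split every document on runs of whitespace and pool the tokens.
  tokens <- unlist(strsplit(corpus, "\\s+"))
  tokens <- tokens[nzchar(tokens)]
  # Fraction of corpus tokens that are dictionary words.
  mean(tokens %in% dictionary)
}

dictionary <- c("the", "cat", "sat")
corpus <- c("the cat sat on the mat", "the dog sat")
coverage_fraction(dictionary, corpus)
```

Here six of the nine corpus tokens belong to the dictionary, so the sketch returns 6/9.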

This function is a generic, accepting as object argument any object storing a dictionary, along with a preprocessing function and a list of End-Of-Sentence characters. This includes all sbo main classes: sbo_dictionary, sbo_kgram_freqs, sbo_predtable and sbo_predictor. When object is a character vector, the preprocessing function and the End-Of-Sentence characters must be specified explicitly.
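
As a hedged illustration of the character method, the call below passes the preprocessing function and the End-Of-Sentence characters explicitly, following the signature listed under Usage. The toy dictionary and corpus, the choice of tolower as preprocessing function and "." as the only End-Of-Sentence character are assumptions made for the example.

```r
# Only run the example if the sbo package is installed.
if (requireNamespace("sbo", quietly = TRUE)) {
  # Toy inputs, invented for illustration.
  dict <- c("the", "cat", "sat")
  corpus <- c("The cat sat on the mat.", "The dog sat.")
  # With a plain character dictionary, .preprocess and EOS are given explicitly.
  wc <- sbo::word_coverage(dict, corpus, .preprocess = tolower, EOS = ".")
  print(wc)
}
```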

The coverage fraction is computed cumulatively, and the dependence of coverage on the maximal word rank can be explored through plot() (see the examples below).
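
What "cumulative" means here can be sketched in base R as follows. This is an illustration under stated assumptions rather than the sbo implementation: the helper name cumulative_coverage is hypothetical, the dictionary is assumed to be ordered by rank (most frequent word first), and tokens are obtained by splitting on whitespace.

```r
# Illustrative sketch only: assumes the dictionary is sorted by rank
# and that words are separated by whitespace.
cumulative_coverage <- function(dictionary, corpus) {
  tokens <- unlist(strsplit(corpus, "\\s+"))
  tokens <- tokens[nzchar(tokens)]
  # Count how often each dictionary word occurs in the corpus,
  # keeping the dictionary's own (rank) order.
  counts <- vapply(dictionary, function(w) sum(tokens == w), numeric(1))
  # Entry k is the coverage of the dictionary truncated to its first k words.
  cumsum(counts) / length(tokens)
}

ranked_dictionary <- c("the", "sat", "cat")
corpus <- c("the cat sat on the mat", "the dog sat")
cumulative_coverage(ranked_dictionary, corpus)
```

The last entry of the result is the total coverage of the full dictionary, while earlier entries show how much coverage is retained when only the top-ranked words are kept.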

Value

a word_coverage object.

Author(s)

Valerio Gherardi

See Also

predict.sbo_predictor

Examples

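library(sbo)

# Coverage of twitter_dict over the twitter_train corpus.
# The result is not named `c`, to avoid masking base::c().
coverage <- word_coverage(twitter_dict, twitter_train)
print(coverage)
summary(coverage)
# Plot coverage fraction, including the End-Of-Sentence in word counts.
plot(coverage, include_EOS = TRUE)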

sbo documentation built on Dec. 6, 2020, 1:06 a.m.