View source: R/word_coverage.R
Compute total and cumulative corpus coverage fraction of a dictionary.
word_coverage(object, corpus, ...)
## S3 method for class 'sbo_dictionary'
word_coverage(object, corpus, ...)
## S3 method for class 'character'
word_coverage(object, corpus, .preprocess = identity, EOS = "", ...)
## S3 method for class 'sbo_kgram_freqs'
word_coverage(object, corpus, ...)
## S3 method for class 'sbo_predictions'
word_coverage(object, corpus, ...)
object | either a character vector, or an object inheriting from one of the classes
corpus | a character vector.
... | further arguments passed to or from other methods.
.preprocess | preprocessing function for training corpus. See
EOS | a length one character vector. String containing End-Of-Sentence characters, see
This function computes the corpus coverage fraction of a dictionary, that is, the fraction of the words in corpus that are contained in the dictionary.
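As a rough illustration of this definition, the sketch below computes a coverage fraction in base R, without sbo. The toy dictionary and corpus are invented for the example, tokenization is a plain split on spaces, and it assumes coverage is counted over word occurrences (tokens); the package's own preprocessing and End-Of-Sentence handling are left out.

```r
# Hypothetical toy data, not taken from the sbo package.
dictionary <- c("the", "cat", "sat")
corpus <- c("the cat sat on the mat", "the dog sat")

# Naive whitespace tokenization (an assumption of this sketch).
tokens <- unlist(strsplit(corpus, " ", fixed = TRUE))

# Fraction of corpus tokens, counted with repetition, found in the dictionary.
coverage <- mean(tokens %in% dictionary)
print(coverage)
```

Here 6 of the 9 tokens belong to the dictionary, so the fraction is 6/9.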
This function is a generic, accepting as object argument any object storing a dictionary, along with a preprocessing function and a list of End-Of-Sentence characters. This includes all sbo main classes: sbo_dictionary, sbo_kgram_freqs, sbo_predtable and sbo_predictor. When object is a character vector, the preprocessing function and the End-Of-Sentence characters must be specified explicitly.
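A minimal sketch of the character-vector case follows, using only the arguments shown in the usage above. The dictionary and corpus are toy data invented here, and the call simply spells out the documented defaults (identity preprocessing, empty EOS); a real corpus would likely need a proper preprocessing function. The call is guarded so the snippet does nothing when sbo is not installed.

```r
# Toy data invented for this sketch; not part of the sbo package.
dict <- c("the", "cat", "sat")
corpus <- c("the cat sat on the mat", "the dog sat")

# Guarded so the snippet is a no-op when sbo is not installed.
if (requireNamespace("sbo", quietly = TRUE)) {
  # With a character dictionary, the preprocessing function and the
  # End-Of-Sentence characters are passed explicitly (documented defaults here).
  cov <- sbo::word_coverage(dict, corpus, .preprocess = identity, EOS = "")
  summary(cov)
}
```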
The coverage fraction is computed cumulatively, and the dependence of coverage on the maximal rank can be explored through plot() (see examples below).
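To make "cumulative" concrete, the base-R sketch below computes the coverage obtained when only the top-ranked words of a dictionary are retained. It assumes the dictionary is already ordered by rank and that coverage is counted over tokens; the data and variable names are hypothetical, and this is not necessarily how sbo computes or plots the curve internally.

```r
# Hypothetical toy data; the dictionary is assumed to be sorted by rank.
dictionary <- c("the", "sat", "cat")
tokens <- c("the", "cat", "sat", "on", "the", "mat", "the", "dog", "sat")

# Number of occurrences of each dictionary word in the corpus.
counts <- vapply(dictionary, function(w) sum(tokens == w), numeric(1))

# Coverage when the dictionary is truncated at rank 1, 2, 3, ...
cumulative <- cumsum(counts) / length(tokens)
print(cumulative)
```

The last element of the cumulative vector is the total coverage of the full dictionary.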
a word_coverage object.
Valerio Gherardi
library(sbo)
# Coverage of the twitter_dict dictionary over the twitter_train corpus.
c <- word_coverage(twitter_dict, twitter_train)
print(c)
summary(c)
# Plot coverage fraction, including the End-Of-Sentence in word counts.
plot(c, include_EOS = TRUE)