- tknz_sent() and preprocess() now have different implementations on Windows and UNIX OSs (since the previous C++ implementation has unpredictable behaviour on Windows, see #30). This fix also introduces minor changes in the tknz_sent() output in some corner cases (e.g. tknz_sent("") now returns character(0), whereas it used to return "").
- perplexity() gets a new argument exp that allows returning the cross-entropy per word, rather than the perplexity (its exponential).
- perplexity.character() gets a new argument detailed that allows returning, alongside the total perplexity of the input document, the cross-entropies and word lengths of individual sentences. Closes #28.
- ?kgram_freqs.
- R requirements 3.5 -> 4.0.
- SystemRequirements: C++11 (see this tidyverse blog post).
- verbose arguments now default to FALSE.
- probability(), perplexity() and sample_sentences() now accept only language_model class objects as their model argument.
- as_dictionary(NULL) now returns an empty dictionary.
- .preprocess and .tknz_sent arguments to be ignored in process_sentences().
- max_lines and batch_size arguments in kgram_freqs.connection().
- dictionary.
- dictionary() with batch processing and non-trivial size constraints on vocabulary size.
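
The relation between the two quantities that the exp argument of perplexity() switches between can be sketched in base R. This is only an illustration of the definitions, not the package's implementation: the word probabilities below are made-up numbers, the variable names are hypothetical, and the natural logarithm is assumed so that exp() inverts it.

```r
# Hypothetical probabilities that some language model assigns to the five
# words of a short document (illustrative numbers, not kgrams output).
word_probs <- c(0.20, 0.05, 0.10, 0.50, 0.25)

# Cross-entropy per word: the average negative log-probability.
cross_entropy <- -mean(log(word_probs))

# Perplexity is the exponential of the cross-entropy per word.
perplexity_value <- exp(cross_entropy)

# Equivalent form: the inverse geometric mean of the word probabilities.
inverse_geometric_mean <- prod(word_probs)^(-1 / length(word_probs))

# Both expressions agree up to floating-point error.
stopifnot(isTRUE(all.equal(perplexity_value, inverse_geometric_mean)))
```

Cross-entropies are additive over words, which makes them easier to combine across sentences than perplexities; that is presumably why the detailed output reports per-sentence cross-entropies together with word lengths.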