View source: R/sample_sentences.R
sample_sentences | R Documentation |
Sample sentences from a language model's probability distribution.
sample_sentences(model, n, max_length, t = 1)
model | an object of class
n | an integer. Number of sentences to sample.
max_length | an integer. Maximum length of sampled sentences.
t | a positive number. Sampling temperature (optional); see Details.
This function samples sentences according to the prescribed language model's
probability distribution, with an optional temperature parameter.
The temperature transform of a probability distribution is defined by
p(t) = exp(log(p) / t) / Z(t)
where Z(t) is the partition function, fixed by the normalization
condition sum(p(t)) = 1.
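As an illustration of the formula above, here is a minimal sketch of the temperature transform applied to a toy distribution. The helper temperature_transform() and the vector p are hypothetical names introduced for this example only; they are not part of the package, and the package's internal computation may differ.

```r
# Temperature transform of a discrete distribution p:
#   p(t) = exp(log(p) / t) / Z(t)
# NOTE: temperature_transform() is a hypothetical helper, not a package function.
temperature_transform <- function(p, t = 1) {
  stopifnot(t > 0, all(p >= 0))
  logits <- log(p) / t
  # Subtract the maximum before exponentiating, for numerical stability
  w <- exp(logits - max(logits))
  # Dividing by the sum plays the role of the partition function Z(t)
  w / sum(w)
}

p <- c(the = 0.5, a = 0.3, cat = 0.2)  # toy distribution (assumed)

temperature_transform(p, t = 1)     # leaves p unchanged
temperature_transform(p, t = 100)   # flattens p towards the uniform distribution
temperature_transform(p, t = 0.01)  # concentrates almost all mass on "the"
```

High temperatures therefore make sampled text more random, while low temperatures favour the most likely continuations.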
Sampling is performed word by word, using the already sampled string
as context, starting from the Begin-Of-Sentence context (i.e. N - 1
BOS tokens). Sampling stops either when an End-Of-Sentence token is
encountered, or when the string exceeds max_length, in which case
a truncated output is returned.
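The word-by-word procedure can be sketched with a toy bigram model (N = 2, so the starting context is a single BOS token). The transition table probs, the token spellings "<BOS>" and "<EOS>", and the function sample_one() are all assumptions made for this illustration; the package's own implementation and output format may differ.

```r
# Toy bigram model (hypothetical): rows are contexts, columns are next words.
probs <- rbind(
  "<BOS>" = c(0.6, 0.4, 0.0),
  "a"     = c(0.2, 0.5, 0.3),
  "b"     = c(0.4, 0.1, 0.5)
)
colnames(probs) <- c("a", "b", "<EOS>")

# Sample one sentence word by word, starting from the BOS context.
# NOTE: sample_one() is a hypothetical helper, not a package function.
sample_one <- function(probs, max_length) {
  context <- "<BOS>"
  out <- character(0)
  while (length(out) < max_length) {
    nxt <- sample(colnames(probs), size = 1, prob = probs[context, ])
    if (nxt == "<EOS>") break  # End-Of-Sentence: stop sampling
    out <- c(out, nxt)
    context <- nxt             # the sampled word becomes the new context
  }
  # If the loop ended because max_length was reached, the sentence is truncated
  paste(out, collapse = " ")
}

set.seed(1)
replicate(3, sample_one(probs, max_length = 10))
```

For higher-order models the context would hold the last N - 1 sampled words instead of a single word.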
Some language models may give a non-zero probability to the Unknown word
token, but this is never produced in text generated by
sample_sentences(): when randomly sampled, it is simply ignored.
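One possible reading of "ignored" is that a sampled Unknown word token is discarded and a new word is drawn from the same distribution. The sketch below illustrates that reading only; whether the package behaves exactly this way is an assumption, and sample_known(), the token spelling "<UNK>", and the toy vector p are hypothetical.

```r
# Draw a word, discarding the Unknown word token whenever it comes up.
# NOTE: sample_known() is a hypothetical helper, not a package function.
sample_known <- function(p, unk = "<UNK>") {
  repeat {
    w <- sample(names(p), size = 1, prob = p)
    if (w != unk) return(w)
  }
}

p <- c(the = 0.5, cat = 0.3, "<UNK>" = 0.2)  # toy distribution (assumed)

set.seed(42)
draws <- replicate(1000, sample_known(p))
# Under this reading, frequencies approach p renormalized without "<UNK>"
# (0.5 / 0.8 for "the" and 0.3 / 0.8 for "cat").
table(draws) / length(draws)
```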
Finally, a word of caution on some special smoothers: the "sbo" smoother
(Stupid Backoff) does not produce normalized continuation probabilities,
but rather continuation scores. Sampling is here performed by assuming
that Stupid Backoff scores are proportional to actual probabilities.
The "ml" smoother (Maximum Likelihood) does not assign probabilities when the
k-gram count of the context is zero. When this happens, the next word is
chosen uniformly at random from the model's dictionary.
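Both special cases amount to a rule for turning raw continuation scores into sampling weights, which can be sketched as follows. The helper sampling_weights() and the example score vectors are hypothetical, and representing undefined Maximum Likelihood probabilities as NaN is an assumption of this sketch rather than a documented detail of the package.

```r
# Turn continuation scores into sampling weights.
# NOTE: sampling_weights() is a hypothetical helper, not a package function.
sampling_weights <- function(scores) {
  total <- sum(scores)
  if (!is.finite(total) || total <= 0) {
    # No usable scores (e.g. unseen context under "ml"): uniform over the dictionary
    return(setNames(rep(1 / length(scores), length(scores)), names(scores)))
  }
  # Scores assumed proportional to probabilities (e.g. "sbo"): normalize them
  scores / total
}

sbo_scores <- c(the = 0.8, cat = 0.4, sat = 0.4)  # unnormalized toy scores
sampling_weights(sbo_scores)                      # 0.5, 0.25, 0.25

ml_undefined <- c(the = NaN, cat = NaN, sat = NaN)  # unseen context (assumed encoding)
sampling_weights(ml_undefined)                      # 1/3 each
```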
a character vector of length n. Random sentences generated
from the language model's distribution.
Valerio Gherardi
# Sample sentences from 8-gram Kneser-Ney model trained on Shakespeare's
# "Much Ado About Nothing"
### Prepare the model and set seed
freqs <- kgram_freqs(much_ado, 8, .tknz_sent = tknz_sent)
model <- language_model(freqs, "kn", D = 0.75)
set.seed(840)
sample_sentences(model, n = 3, max_length = 10)
### Sampling at high temperature
sample_sentences(model, n = 3, max_length = 10, t = 100)
### Sampling at low temperature
sample_sentences(model, n = 3, max_length = 10, t = 0.01)