sample_sentences: Random Text Generation


View source: R/sample_sentences.R

Description

Sample sentences from a language model's probability distribution.

Usage

sample_sentences(model, n, max_length, t = 1)

Arguments

model

an object of class language_model.

n

an integer. Number of sentences to sample.

max_length

an integer. Maximum length of sampled sentences.

t

a positive number. Sampling temperature (optional); see Details.

Details

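This function samples sentences according to the prescribed language model's probability distribution, with an optional temperature parameter. The temperature transform of a probability distribution is defined by p(t) = exp(log(p) / t) / Z(t), where Z(t) is the partition function, fixed by the normalization condition sum(p(t)) = 1. Setting t = 1 leaves the distribution unchanged; higher temperatures flatten it towards the uniform distribution, while lower temperatures concentrate it on the most probable words.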
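The temperature transform can be illustrated on a small discrete distribution. The following is a minimal sketch, not the package's implementation; the helper name temper and the toy vector p are hypothetical and exist only for this illustration.

```r
# Temperature transform of a discrete probability distribution.
# `temper` is a hypothetical helper, not a function exported by kgrams.
temper <- function(p, t = 1) {
  stopifnot(t > 0, all(p >= 0))
  w <- exp(log(p) / t)  # same as p^(1 / t)
  w / sum(w)            # dividing by Z(t) restores normalization
}

p <- c(a = 0.7, b = 0.2, c = 0.1)
temper(p, t = 1)     # same as p
temper(p, t = 100)   # close to uniform
temper(p, t = 0.01)  # almost all mass on "a"
```

This mirrors the three temperatures used in the Examples section: t = 1, a high temperature and a low temperature.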

Sampling is performed word by word, using the already sampled string as context, starting from the Begin-Of-Sentence context (i.e. N - 1 BOS tokens, where N is the order of the model). Sampling stops either when an End-Of-Sentence token is encountered, or when the string exceeds max_length, in which case a truncated output is returned.
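The word-by-word procedure can be sketched with a toy bigram model (N = 2), stored as a matrix of next-word probabilities. Everything in this sketch is an assumption made for illustration: the matrix probs, the token spellings "<BOS>" and "<EOS>", and the helper sample_one are hypothetical and do not reflect the internals of kgrams. In particular, the sketch simply stops after max_length words, and how the package marks truncated output is not shown.

```r
# Toy bigram "model": rows are contexts, columns are next-word probabilities.
# All names are hypothetical; this is not how kgrams stores a model.
words <- c("a", "b", "<EOS>")
probs <- matrix(
  c(0.5, 0.3, 0.2,   # after "<BOS>"
    0.2, 0.5, 0.3,   # after "a"
    0.4, 0.2, 0.4),  # after "b"
  nrow = 3, byrow = TRUE,
  dimnames = list(c("<BOS>", "a", "b"), words)
)

sample_one <- function(probs, max_length) {
  context <- "<BOS>"           # N - 1 = 1 BOS token for a bigram model
  sentence <- character(0)
  while (length(sentence) < max_length) {
    nxt <- sample(colnames(probs), size = 1, prob = probs[context, ])
    if (nxt == "<EOS>") break  # End-Of-Sentence token: stop sampling
    sentence <- c(sentence, nxt)
    context <- nxt             # the sampled word becomes the new context
  }
  paste(sentence, collapse = " ")
}

set.seed(1)
replicate(3, sample_one(probs, max_length = 5))
```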

A word of caution on some special smoothers. The 'sbo' smoother (Stupid Backoff) does not produce normalized continuation probabilities, but rather continuation scores; sampling is performed here by assuming that Stupid Backoff scores are proportional to actual probabilities. The 'ml' smoother (Maximum Likelihood) does not assign probabilities when the k-gram count of the context is zero; when this happens, the next word is chosen uniformly at random from the model's dictionary.
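The two special cases above can be sketched as a single next-word step. This is only an illustration under stated assumptions: next_word and dict are hypothetical names, and representing undefined probabilities as NA is a choice made for this sketch, not a description of what kgrams does internally. The sketch relies on base R's sample(), which rescales its prob argument, so unnormalized scores can be passed directly.

```r
# Hypothetical next-word step; not part of the kgrams API.
next_word <- function(scores, dictionary) {
  # Undefined (NA) or all-zero scores: fall back to a uniform draw.
  if (anyNA(scores) || sum(scores) <= 0) {
    return(sample(dictionary, size = 1))
  }
  # sample() normalizes `prob`, so scores only need to be
  # proportional to probabilities.
  sample(dictionary, size = 1, prob = scores)
}

dict <- c("x", "y", "z")
next_word(c(0.8, 0.4, 0.4), dict)  # unnormalized scores, e.g. Stupid Backoff
next_word(c(NA, NA, NA), dict)     # undefined probabilities: uniform draw
```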

Value

a character vector of length n. Random sentences generated from the language model's distribution.

Author(s)

Valerio Gherardi

Examples

# Sample sentences from 8-gram Kneser-Ney model trained on Shakespeare's
# "Much Ado About Nothing"



### Prepare the model and set seed
freqs <- kgram_freqs(much_ado, 8, .tknz_sent = tknz_sent)
model <- language_model(freqs, "kn", D = 0.75)
set.seed(840)

sample_sentences(model, n = 3, max_length = 10)

### Sampling at high temperature
sample_sentences(model, n = 3, max_length = 10, t = 100)

### Sampling at low temperature
sample_sentences(model, n = 3, max_length = 10, t = 0.01)

kgrams documentation built on Nov. 16, 2021, 9:22 a.m.