View source: R/sample_sentences.R

sample_sentences | R Documentation |

Sample sentences from a language model's probability distribution.

```
sample_sentences(model, n, max_length, t = 1)
```

`model` |
an object of class |

`n` |
an integer. Number of sentences to sample. |

`max_length` |
an integer. Maximum length of sampled sentences. |

`t` |
a positive number. Sampling temperature (optional); see Details. |

This function samples sentences according to the prescribed language model's
probability distribution, with an optional temperature parameter.
The temperature transform of a probability distribution is defined by
`p(t) = exp(log(p) / t) / Z(t)`, where `Z(t)` is the partition function,
fixed by the normalization condition `sum(p(t)) = 1`. With `t = 1` (the
default) the distribution is unchanged; higher temperatures flatten it, and
lower temperatures concentrate it on the most likely words.
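As an illustration of the temperature transform, here is a minimal sketch in base R. The helper `temperature_transform()` and the toy distribution `p` are hypothetical and only meant to show the formula at work; they are not part of the package.

```r
# Hypothetical helper (not part of the package): temperature transform
# p(t) = exp(log(p) / t) / Z(t) of a discrete probability distribution.
temperature_transform <- function(p, t = 1) {
  stopifnot(t > 0, all(p >= 0))
  logits <- log(p) / t             # log(0) = -Inf, so zero entries stay at zero
  logits <- logits - max(logits)   # subtract the maximum for numerical stability
  w <- exp(logits)
  w / sum(w)                       # dividing by sum(w) plays the role of Z(t)
}

# Made-up distribution over three words
p <- c(a = 0.7, b = 0.2, c = 0.1)

temperature_transform(p, t = 1)     # same as p
temperature_transform(p, t = 100)   # close to uniform
temperature_transform(p, t = 0.01)  # almost all mass on "a"
```

Working on the log scale and subtracting the maximum is a common way to avoid underflow at very low temperatures; it does not change the result, since the constant cancels in the normalization.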

Sampling is performed word by word, using the already sampled string
as context, starting from the Begin-Of-Sentence context (i.e. `N - 1`
BOS tokens). Sampling stops either when an End-Of-Sentence token is
encountered, or when the string exceeds `max_length`, in which case
a truncated output is returned.
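The sampling procedure can be sketched with a toy loop in base R. This is not the package's internal code: `next_word_probs()` is a hypothetical stand-in for a model's continuation probabilities, the token strings are made up, and redrawing after an Unknown-word draw is an assumption about what "ignored" means.

```r
# Toy sketch of word-by-word sampling; all names here are hypothetical.
BOS <- "<BOS>"; EOS <- "<EOS>"; UNK <- "<UNK>"

# Stand-in for the model: a fixed distribution that ignores the context
next_word_probs <- function(context) {
  c(the = 0.4, cat = 0.3, "<UNK>" = 0.1, "<EOS>" = 0.2)
}

sample_one_sentence <- function(N, max_length, t = 1) {
  context <- rep(BOS, N - 1)             # start from N - 1 BOS tokens
  words <- character(0)
  while (length(words) < max_length) {   # truncate at max_length words
    p <- next_word_probs(context)
    p <- p^(1 / t) / sum(p^(1 / t))      # temperature transform
    w <- sample(names(p), size = 1, prob = p)
    if (w == EOS) break                  # End-Of-Sentence: stop sampling
    if (w == UNK) next                   # assumption: skip UNK and draw again
    words <- c(words, w)
    context <- tail(c(context, w), N - 1)  # keep the last N - 1 words as context
  }
  paste(words, collapse = " ")
}

set.seed(1)
replicate(3, sample_one_sentence(N = 2, max_length = 5))
```

In this sketch a truncated sentence is simply returned without any marker; how truncation is signalled in the actual output is not specified here.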

Some language models may give a non-zero probability to the Unknown word
token, but this is never produced in text generated by
`sample_sentences()`: when randomly sampled, it is simply ignored.

Finally, a word of caution on some special smoothers. The `"sbo"` smoother
(Stupid Backoff) does not produce normalized continuation probabilities,
but rather continuation *scores*. Sampling is here performed by assuming
that Stupid Backoff scores are *proportional* to actual probabilities.
The `"ml"` smoother (Maximum Likelihood) does not assign probabilities when
the k-gram count of the context is zero. When this happens, the next word is
chosen uniformly at random from the model's dictionary.
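The two special cases above can be illustrated with a small base R sketch. The helper `scores_to_probs()` and the numbers are hypothetical; in particular, representing the undefined Maximum Likelihood case by `NaN` values is an assumption made for the example.

```r
# Hypothetical helper: turn continuation scores into sampling probabilities.
scores_to_probs <- function(scores) {
  total <- sum(scores)
  if (is.na(total) || total <= 0) {
    # No usable information for this context: fall back to a uniform
    # distribution over the dictionary
    n <- length(scores)
    return(setNames(rep(1 / n, n), names(scores)))
  }
  scores / total   # scores are assumed proportional to probabilities
}

# Made-up Stupid Backoff scores: they need not sum to one
sbo_scores <- c(a = 0.5, b = 0.2, c = 0.1)
scores_to_probs(sbo_scores)

# Made-up undefined values, standing in for an unseen context under "ml"
ml_unseen <- c(a = NaN, b = NaN, c = NaN)
scores_to_probs(ml_unseen)
```

Normalizing the scores preserves their ratios, which is all that the proportionality assumption requires for sampling.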

a character vector of length `n`. Random sentences generated
from the language model's distribution.

Valerio Gherardi

```
# Sample sentences from 8-gram Kneser-Ney model trained on Shakespeare's
# "Much Ado About Nothing"
### Prepare the model and set seed
freqs <- kgram_freqs(much_ado, 8, .tknz_sent = tknz_sent)
model <- language_model(freqs, "kn", D = 0.75)
set.seed(840)
sample_sentences(model, n = 3, max_length = 10)
### Sampling at high temperature
sample_sentences(model, n = 3, max_length = 10, t = 100)
### Sampling at low temperature
sample_sentences(model, n = 3, max_length = 10, t = 0.01)
```
