
An n-gram is an ordered sequence of n "words" taken from a body of "text". The terms "words" and "text" can be interpreted literally or more loosely.

For example, consider the sequence "A B A C A B B". If we examine the 2-grams (or bigrams) of this sequence, they are

A B, B A, A C, C A, A B, B B

or without repetition:

A B, B A, A C, C A, B B

That is, we take the input string and group the "words" 2 at a time (because `n=2`). The number of n-grams depends on whether repeats are counted. Counting repetition, it is determined by the number of words alone (here, 7 - 2 + 1 = 6) and equals

`nwords - n + 1`
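The grouping and the count can be checked with a minimal Python sketch. It is independent of the ngram package and is not its implementation; the helper name `ngrams` is hypothetical, and "words" are assumed to be whitespace-separated tokens.

```python
def ngrams(text, n):
    """Return every n-gram of whitespace-separated words, in order, with repeats."""
    words = text.split()
    # There are len(words) - n + 1 starting positions (none if the text is too short).
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


text = "A B A C A B B"
bigrams = ngrams(text, 2)
print(bigrams)  # ['A B', 'B A', 'A C', 'C A', 'A B', 'B B']

# Counting repetition, the number of n-grams equals nwords - n + 1.
nwords = len(text.split())
assert len(bigrams) == nwords - 2 + 1

# Drop repeats while keeping first-seen order.
unique_bigrams = list(dict.fromkeys(bigrams))
print(unique_bigrams)  # ['A B', 'B A', 'A C', 'C A', 'B B']
```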

Bounds ignoring repetition are highly dependent on the input. A correct but useless bound is

`#ngrams = nwords - (#repeats - 1) - (n - 1)`
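To illustrate how strongly the count without repetition depends on the input, the following Python sketch compares strings with the same number of words. It is only an illustration, not the package's method; the helper name `count_unique_ngrams` is hypothetical, and words are again assumed to be whitespace-separated.

```python
def count_unique_ngrams(text, n):
    """Count the distinct n-grams of whitespace-separated words in text."""
    words = text.split()
    # A set keeps one copy of each n-gram, so repeats are not counted.
    return len({tuple(words[i:i + n]) for i in range(len(words) - n + 1)})


# Both inputs have 4 words, hence 3 bigrams counting repetition.
print(count_unique_ngrams("A A A A", 2))        # 1
print(count_unique_ngrams("A B C D", 2))        # 3

# The running example has 6 bigrams counting repetition, 5 without.
print(count_unique_ngrams("A B A C A B B", 2))  # 5
```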

An `ngram` object is an S4 class container that stores some basic summary information (e.g., n) and several external pointers. For information on how to construct an `ngram` object, see `ngram`.

`str_ptr`

A pointer to a copy of the original input string.

`strlen`

The length of the string.

`n`

The eponymous 'n' as in 'n-gram'.

`ngl_ptr`

A pointer to the processed list of n-grams.

`ngsize`

The length of the ngram list, or in other words, the number of unique n-grams in the input string.

`sl_ptr`

A pointer to the list of words from the input string.
