ngrams: N-grams and their frequencies.
In soundcorrs: Semi-Automatic Analysis of Sound Correspondences


Find n-grams of specified length and return them as a list, or their counts as a table.

ngrams(x, n = 1, borders = c("", ""), rm = "", as.table = T)

`x`	[character vector] Words to be cut into n-grams.
`n`	[integer] The length of n-grams to look for. Defaults to `1`.
`borders`	[character] Characters to prepend and append to every word. Must be a vector of exactly two character strings. Defaults to `c("","")`.
`rm`	[character] Characters to be removed from `x` before cutting into n-grams. May be a regular expression, e.g. "[-\\|]" matches the default symbol for linguistic zeros as well as the default segment separators. An empty string means nothing is removed. Defaults to an empty string.
`as.table`	[logical] Return the result as a table? Defaults to `TRUE`.
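
To make the interplay of `n`, `borders` and `rm` concrete, here is a minimal base R sketch of character-level n-gram extraction. It is not the soundcorrs implementation: `sketch_ngrams` is a hypothetical helper, and it assumes that characters matching `rm` are removed first, the borders are attached next, and a window of width `n` is then slid over each word.

```r
# Hypothetical helper, NOT part of soundcorrs: an assumed reading of the arguments above.
sketch_ngrams <- function(x, n = 1, borders = c("", ""), rm = "") {
  # Remove unwanted characters; an empty pattern means nothing is removed.
  if (rm != "") x <- gsub(rm, "", x)
  # Prepend and append the border strings to every word.
  x <- paste0(borders[1], x, borders[2])
  # Slide a window of width n over each word; words shorter than n yield no n-grams.
  lapply(x, function(w) {
    len <- nchar(w)
    if (len < n) return(character(0))
    starts <- seq_len(len - n + 1)
    substring(w, starts, starts + n - 1)
  })
}

# Made-up input: "-" and "|" are stripped, ">" and "<" mark the word edges.
words <- c("a-b|c", "ab")
grams <- sketch_ngrams(words, n = 2, borders = c(">", "<"), rm = "[-\\|]")

# Counting the n-grams across all words, analogous to as.table = TRUE.
counts <- table(unlist(grams))
```

In this sketch `grams` corresponds to the list form of the result and `counts` to the table form. The borders make word-initial and word-final n-grams distinguishable from word-internal ones.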

Data processed with soundcorrs are generally expected to be segmented and aligned, and it is recommended that both segmentation and alignment be performed manually. This is laborious but feasible when segments represent morphemes or phonemes. When segments represent n-grams, however, a fully manual approach would be very time-consuming and error-prone.

[table] Table with counts of n-grams, or a list of n-grams if `as.table` is `FALSE`.

dataset <- loadSampleDataset ("data-capitals")
ngrams(dataset$data[,"ALIGNED.German"], n=2)
ngrams(dataset$data[,"ALIGNED.German"], n=3, as.table=FALSE)
ngrams(dataset$data[,"ALIGNED.German"], n=4, rm="[-\\|]", as.table=FALSE)
ngrams(dataset$data[,"ALIGNED.German"], n=5, borders=c(">","<"), rm="[-\\|]", as.table=FALSE)