ngrams: N-grams and their frequencies.


Description

Find n-grams of specified length and return them as a list, or their counts as a table.

Usage

ngrams(x, n = 1, borders = c("", ""), rm = "", as.table = T)

Arguments

x

[character vector] Words to be cut into n-grams.

n

[integer] The length of n-grams to look for. Defaults to 1.

borders

[character] Characters to prepend and append to every word. Must be a vector of exactly two character strings. Defaults to c("","").
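As a rough illustration of what prepending and appending borders means, the following base R sketch wraps each word in a pair of marker strings. The variable names and the sample words are hypothetical, and the code is not taken from the soundcorrs source; it only mirrors the behaviour described above.

```r
# Hypothetical sample input; not part of any soundcorrs dataset.
words <- c("ab", "cde")
borders <- c(">", "<")

# The argument is documented as a vector of exactly two character strings.
stopifnot(is.character(borders), length(borders) == 2)

# Prepend the first string and append the second to every word.
bordered <- paste0(borders[1], words, borders[2])
print(bordered)
```

With borders in place, n-grams that touch the start or end of a word can be told apart from word-internal ones.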

rm

[character] Characters to be removed from x before cutting into n-grams. May be a regular expression, e.g. "[-\\|]" will capture the default symbol for linguistic zeros as well as the default segment separators. An empty string means that nothing is removed. Defaults to an empty string.
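The effect of such a pattern can be previewed with base R's gsub before passing it as rm. This is a minimal sketch on made-up strings, assuming "-" marks a linguistic zero and "|" separates segments; it does not claim to reproduce how ngrams applies the pattern internally.

```r
# Hypothetical aligned words: "|" as segment separator, "-" as linguistic zero.
aligned <- c("a|b|-|c", "d|e")

# In an R string the backslash must be doubled, hence "[-\\|]".
cleaned <- gsub("[-\\|]", "", aligned)
print(cleaned)
```

Checking the cleaned strings first helps confirm that the pattern removes only alignment symbols and no actual segments.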

as.table

[logical] Return the result as a table? Defaults to TRUE.

Details

Data processed with soundcorrs are generally expected to be segmented and aligned, and it is recommended that both segmentation and alignment be performed manually. This is laborious, but feasible when segments represent morphemes or phonemes. When segments represent n-grams, however, a fully manual approach would be very time-consuming and error-prone.

Value

[table] Table with counts of n-grams if as.table is TRUE; otherwise a list of the n-grams.
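The difference between the two return forms can be sketched in base R. The helper char_ngrams below is hypothetical and is not the soundcorrs implementation; it assumes n-grams are overlapping runs of n characters, and that the list form holds one element per word, neither of which is confirmed by this page.

```r
# Hypothetical helper, not part of soundcorrs: character n-grams of one word.
char_ngrams <- function(word, n) {
  len <- nchar(word)
  if (len < n) return(character(0))   # word too short for any n-gram
  starts <- seq_len(len - n + 1)      # start position of each n-gram
  substring(word, starts, starts + n - 1)
}

words <- c("abab", "ba")                    # made-up sample words

grams <- lapply(words, char_ngrams, n = 2)  # list form: n-grams per word
counts <- table(unlist(grams))              # table form: counts over all words

print(grams)
print(counts)
```

The list form keeps track of which word each n-gram came from, while the table form discards that in favour of overall frequencies.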

Examples

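dataset <- loadSampleDataset("data-capitals")
# Counts of 2-grams, returned as a table (the default)
ngrams(dataset$data[,"ALIGNED.German"], n=2)
# 3-grams returned as a list instead of a table
ngrams(dataset$data[,"ALIGNED.German"], n=3, as.table=FALSE)
# 4-grams after removing "-" and "|" from the words
ngrams(dataset$data[,"ALIGNED.German"], n=4, rm="[-\\|]", as.table=FALSE)
# 5-grams with ">" prepended and "<" appended to every word
ngrams(dataset$data[,"ALIGNED.German"], n=5, borders=c(">","<"), rm="[-\\|]", as.table=FALSE)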

soundcorrs documentation built on Nov. 16, 2020, 5:09 p.m.