ngrams: Calculate ngrams

View source: R/corpus_operations.R

Description

ngrams calculates ngrams for a corpus or subcorpus. Ngram calculation uses a lot of memory and, for large corpora or subcorpora, may fail or cause the system to hang. For computational efficiency, results are not cleaned and may contain some artifacts (i.e. ngrams with fewer tokens than the specified length).
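
Since artifacts are left in the result, a post-filtering step may be useful. The following is a minimal sketch and not part of the package: it assumes the result has a character column of space-separated tokens, here hypothetically called ngram, and a frequency column called N. The actual column names and token separator returned by ngrams may differ, and a plain data.frame stands in for the returned data.table.

```r
# Toy stand-in for an ngrams() result. The column names `ngram` and `N`
# and the space separator between tokens are assumptions for illustration.
result <- data.frame(
  ngram = c("in order to", "order to", "as well as", "well"),
  N = c(12L, 3L, 7L, 1L),
  stringsAsFactors = FALSE
)

ngram_length <- 3

# Count the tokens in each ngram and keep only rows of the requested length.
token_count <- lengths(strsplit(result$ngram, " ", fixed = TRUE))
cleaned <- result[token_count == ngram_length, ]
```

Because a data.table is also a data.frame, the same row filter should carry over to the real result once the column names are adjusted.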

Usage

ngrams(cqp_corpus, ngram_length, pattr = "word", ignore_punct = T)

Arguments

ngram_length

numeric. Length of ngrams.

pattr

character. Positional attribute to use for ngram calculation.

ignore_punct

logical. If TRUE, punctuation is ignored in ngram calculation.

cqp_corpus

corpus created with get_corpus.

Value

data.table with ngrams and frequencies.
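
To illustrate the general shape of such a result, here is a minimal base R sketch of ngram counting over a token vector. It is not the package's implementation: count_ngrams and the column names ngram and freq are hypothetical, and punctuation tokens are simply dropped before the windows are built, which is only one possible reading of ignore_punct.

```r
# Hypothetical helper, not the package's implementation: counts ngrams in a
# character vector of tokens using base R only.
count_ngrams <- function(tokens, n, ignore_punct = TRUE) {
  if (ignore_punct) {
    # Drop tokens that consist solely of punctuation characters.
    tokens <- tokens[!grepl("^[[:punct:]]+$", tokens)]
  }
  if (length(tokens) < n) {
    return(data.frame(ngram = character(0), freq = integer(0)))
  }
  # Build every window of n consecutive tokens.
  starts <- seq_len(length(tokens) - n + 1)
  grams <- vapply(
    starts,
    function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
    character(1)
  )
  counts <- table(grams)
  out <- data.frame(
    ngram = names(counts),
    freq = as.integer(counts),
    stringsAsFactors = FALSE
  )
  # Most frequent ngrams first.
  out[order(-out$freq, out$ngram), ]
}

tokens <- c("to", "be", ",", "or", "not", "to", "be", ".")
bigrams <- count_ngrams(tokens, 2)
```

With the comma and full stop dropped, the six remaining tokens yield five bigram windows, of which "to be" occurs twice.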

Examples

## Not run: 
# min_count is not part of the documented signature, so it is omitted here.
trigrams <- ngrams(my_corpus, 3)
## End(Not run)

wiertz/rusecqp documentation built on Feb. 9, 2022, 1:30 p.m.