View source: R/corpus_operations.R
ngrams
Calculates ngrams for a corpus or subcorpus. Ngram calculation
uses a lot of memory and, for large corpora or subcorpora, may fail or cause
the system to hang. For computational efficiency, results are not cleaned and
may contain some artifacts (i.e. ngrams with fewer tokens than the specified
length).
ngrams(cqp_corpus, ngram_length, pattr = "word", ignore_punct = T)
ngram_length | numeric. Length of ngrams.
pattr | character. Positional attribute to use for ngram calculation.
ignore_punct | logical. If TRUE, punctuation is ignored in ngram calculation.
cqp_corpus | corpus created with
data.table with ngrams and frequencies.
## Not run:
trigrams <- ngrams(my_corpus, 3)
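The description above says that results map ngrams to frequencies and may contain ngrams shorter than the requested length. The following base R sketch illustrates both ideas on a plain character vector. It is not the package's implementation: the helper `count_ngrams_sketch`, the column names `ngram` and `freq`, and the space-separated ngram format are all assumptions made for this illustration, and the real function returns a data.table whose layout is not documented here.

```r
# Hypothetical helper for illustration only; not part of the package.
# Counts ngrams of length n in a character vector of tokens.
count_ngrams_sketch <- function(tokens, n) {
  stopifnot(is.character(tokens), n >= 1)
  if (length(tokens) < n) {
    return(data.frame(ngram = character(0), freq = integer(0),
                      stringsAsFactors = FALSE))
  }
  # One ngram per start position, tokens joined by a single space
  starts <- seq_len(length(tokens) - n + 1)
  grams <- vapply(
    starts,
    function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
    character(1)
  )
  counts <- table(grams)
  out <- data.frame(ngram = names(counts), freq = as.integer(counts),
                    stringsAsFactors = FALSE)
  # Most frequent first, ties broken alphabetically
  out[order(-out$freq, out$ngram), ]
}

tokens <- c("the", "cat", "sat", "on", "the", "cat", "sat")
bigrams <- count_ngrams_sketch(tokens, 2)

# Assumed cleanup step: keep only rows whose ngram really has n tokens,
# i.e. drop the kind of artifact mentioned in the description.
n_tokens <- lengths(strsplit(bigrams$ngram, " ", fixed = TRUE))
bigrams_clean <- bigrams[n_tokens == 2, ]
```

With the real return value, a similar filter on token count could be applied after the call, provided the ngram column is identified first.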