An experimental function that efficiently generates a vocabulary in parallel from the output of the ngrams() function. Setting cores > 1 only works with GNU coreutils > 8.13, because the sort --parallel option is used. If you have an older version, use cores = 1.
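Before setting cores > 1, it can help to confirm that the local sort accepts the --parallel option. The following is a minimal sketch, not part of the package: the helper name `supports_parallel_sort` is hypothetical, and the probe only checks that the option is accepted, not the exact coreutils version.

```shell
# Hypothetical helper: succeed if the given sort command accepts --parallel.
# $1: name of the sort command to probe (e.g. "sort").
supports_parallel_sort() {
  # Sort a tiny two-line input with two threads; discard all output.
  printf 'b\na\n' | "$1" --parallel=2 >/dev/null 2>&1
}

if supports_parallel_sort sort; then
  echo "sort --parallel is accepted; cores > 1 should be usable"
else
  echo "sort --parallel is not accepted; use cores = 1"
fi
```

Running `sort --version` additionally reports the installed coreutils version on GNU systems.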
ngrams |
An optional list object output by the ngrams() function. |
input_directory |
An optional input directory where blocked output from the ngrams() function is stored as .Rdata files. |
file_list |
An optional vector of file names to be used. Useful if you only want to work on a subset of the input. |
combine_ngrams |
Logical indicating whether simple ngrams should be combined together when forming the vocabulary. If FALSE, then separate vocabularies will be generated for each ngram length. Defaults to FALSE. |
cores |
The number of cores to be used for parallelization. |
mac_brew |
An option to use alternate versions of shell commands that are compatible with GNU coreutils as installed via "brew install coreutils". Simply adds a "g" in front of commands (for example, "gsort" instead of "sort"). |
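The mac_brew behavior described above amounts to prefixing command names with "g". As an illustration only, and not the package's actual code, a hypothetical helper `resolve_cmd` could pick the command name like this:

```shell
# Hypothetical helper: return the command name to use for a coreutils tool.
# $1: base command name (e.g. "sort")
# $2: "TRUE" to mimic mac_brew = TRUE, anything else for the default name
resolve_cmd() {
  if [ "$2" = "TRUE" ]; then
    # Homebrew installs GNU coreutils with a "g" prefix, e.g. gsort.
    printf 'g%s\n' "$1"
  else
    printf '%s\n' "$1"
  fi
}

resolve_cmd sort TRUE    # prints "gsort"
resolve_cmd sort FALSE   # prints "sort"
```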
Returns a list object with the vocabulary (sorted by frequency) and word counts.
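To illustrate what a frequency-sorted vocabulary with counts looks like, here is a rough shell sketch built on sort and uniq. It is an assumption about the general technique, not the function's actual implementation, and the helper name `count_vocab` is hypothetical. It expects one single-word token per line on standard input.

```shell
# Hypothetical helper: read one token per line from stdin and print
# "token<TAB>count", most frequent first (ties broken by token).
count_vocab() {
  sort | uniq -c | sort -k1,1nr -k2,2 | awk '{print $2 "\t" $1}'
}

# Example input: "the" occurs 3 times, "cat" twice, "dog" once.
printf 'the\ncat\nthe\ndog\nthe\ncat\n' | count_vocab
```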