Description Usage Arguments Value
Converts a collection of documents to a word list
corpora_to_word_list(
  paths,
  ascii_only = TRUE,
  custom_regex = NA,
  max_word_length = 20,
  stopword_fn = DEFAULT_STOPWORDS,
  min_word_count = 5,
  max_size = 16^3,
  min_word_length = 3,
  output_file = NA,
  json_path = NA
)
paths | Paths of plaintext documents
ascii_only | If TRUE, non-ASCII characters are omitted
custom_regex | If not NA, overrides ascii_only and defines what a valid word consists of
max_word_length | Maximum length of extracted words
stopword_fn | Name of a file containing the stopwords to use, or a list of stopwords (if length > 1)
min_word_count | Minimum number of occurrences for a word to be added to the word list
max_size | Maximum size of the list
min_word_length | Minimum length of words
output_file | File to write the list to
json_path | If not NA, the input text is parsed as JSON, and this should be a character vector of JSON keys to follow
A 'character' vector of words
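The filtering arguments above can be illustrated with a small base-R sketch. This is not the package's implementation: the helper name sketch_word_list, its word_regex argument, and the order in which the filters are applied are all assumptions made for illustration, and it takes text directly rather than file paths.

```r
# Illustrative base-R sketch only; the real corpora_to_word_list may differ.
sketch_word_list <- function(texts,
                             word_regex = "[a-z]+",   # hypothetical stand-in for custom_regex
                             stopwords = character(0),
                             min_word_length = 3,
                             max_word_length = 20,
                             min_word_count = 5,
                             max_size = 16^3) {
  # Lowercase the text and extract every run that matches word_regex.
  lowered <- tolower(texts)
  words <- unlist(regmatches(lowered, gregexpr(word_regex, lowered)))
  # Keep words within the length bounds, then drop stopwords.
  n <- nchar(words)
  words <- words[n >= min_word_length & n <= max_word_length]
  words <- words[!(words %in% stopwords)]
  # Count occurrences and keep words seen at least min_word_count times.
  counts <- sort(table(words), decreasing = TRUE)
  counts <- counts[counts >= min_word_count]
  # Truncate to at most max_size of the most frequent words.
  head(names(counts), max_size)
}

docs <- c("The cat sat on the mat. The cat ran.",
          "A cat and a dog; the dog sat.")
result <- sketch_word_list(docs, stopwords = c("the", "and"), min_word_count = 2)
print(result)
```

With these two toy documents, only "cat", "dog" and "sat" are at least three letters long, are not stopwords, and occur at least twice, so they are the words that survive.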