prep_word2vec | R Documentation |
This function exports a directory or document to a single file suitable for Word2Vec to run on: a single, seekable text file with tokens separated by spaces. (For example, punctuation is removed rather than left attached to the ends of words.) This function is extraordinarily inefficient: in most real-world cases, you'll be much better off preparing the documents with Python, Perl, awk, or any other scripting language that can reasonably read input line by line.
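To make the target format concrete, here is a minimal base-R sketch that streams one text file in chunks and writes space-separated tokens with punctuation stripped. It is not this function's implementation: the helper name clean_to_file, the chunk size of 1000 lines, and the choice to replace punctuation with a space (so "don't" becomes "don t") are all assumptions made for illustration.

```r
# Hypothetical helper, not part of any package: read a text file in chunks
# and write tokens separated by single spaces, with punctuation removed.
clean_to_file <- function(origin, destination, lowercase = FALSE) {
  con_in <- file(origin, open = "r")
  on.exit(close(con_in), add = TRUE)
  con_out <- file(destination, open = "w")
  on.exit(close(con_out), add = TRUE)

  repeat {
    # Read a bounded chunk so the whole corpus is never held in memory.
    lines <- readLines(con_in, n = 1000, warn = FALSE)
    if (length(lines) == 0) break

    if (lowercase) lines <- tolower(lines)
    # Assumption: punctuation becomes a space instead of being deleted.
    lines <- gsub("[[:punct:]]+", " ", lines)
    # Collapse runs of whitespace and trim the ends of each line.
    lines <- trimws(gsub("[[:space:]]+", " ", lines))
    # Drop lines that are empty after cleaning.
    lines <- lines[nzchar(lines)]
    if (length(lines) > 0) writeLines(lines, con_out)
  }
  invisible(destination)
}
```

Calling clean_to_file("input.txt", "output.txt", lowercase = TRUE) on hypothetical paths would leave one cleaned line of tokens per non-empty input line. Unlike this function, the sketch handles a single file only and does no phrase bundling.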
prep_word2vec(origin, destination, lowercase = F, bundle_ngrams = 1, ...)
origin |
A text file or a directory of text files to be used in training the model |
destination |
The location for output text. |
lowercase |
Logical. Should uppercase characters be converted to lowercase? |
bundle_ngrams |
Integer. Statistically significant phrases of up to this many words will be joined with underscores: e.g., "United States" will usually be changed to "United_States" if it appears frequently in the corpus. This calls word2phrase once if bundle_ngrams is 2, twice if bundle_ngrams is 3, and so forth; see that function for more details. |
... |
Further arguments passed to word2phrase when bundle_ngrams is greater than 1. |
The name of the destination file, returned silently (invisibly).
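A short usage sketch follows. It assumes the function is provided by the wordVectors package (this page does not name the package), and the paths "cookbooks" and "cookbooks.txt" are hypothetical. The call is guarded so that nothing runs unless the package is installed and the input directory exists.

```r
# Hypothetical input directory of .txt files and output file name.
origin_dir <- "cookbooks"
dest_file <- "cookbooks.txt"

# Assumption: prep_word2vec comes from the wordVectors package.
if (requireNamespace("wordVectors", quietly = TRUE) && dir.exists(origin_dir)) {
  wordVectors::prep_word2vec(
    origin = origin_dir,
    destination = dest_file,
    lowercase = TRUE,   # convert uppercase characters to lowercase
    bundle_ngrams = 2   # join frequent two-word phrases; calls word2phrase once
  )
}
```

Setting bundle_ngrams = 1 (the default) skips phrase bundling entirely, which is the cheaper choice when the function's inefficiency is already a concern.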