train_word2vec | R Documentation
Train a word2vec model on a text corpus.
train_word2vec(train_file, output_file = "vectors.bin", vectors = 100, threads = 1, window = 12, classes = 0, cbow = 0, min_count = 5, iter = 5, force = F, negative_samples = 5)
train_file: Path to a single .txt file used for training. Tokens are split on spaces.
output_file: Path to the output file. Defaults to "vectors.bin".
vectors: The number of dimensions of each word vector. Defaults to 100. More dimensions can capture finer distinctions, but they also add more random error, use more memory, and slow down operations. Sensible choices are probably in the range 100-500.
threads: Number of threads to use for training. Defaults to 1; setting it as high as the number of (virtual) cores on your machine may speed up training.
window: The size of the context window (in words) on each side of the target word. Defaults to 12.
classes: Number of classes for k-means clustering. Defaults to 0. Not documented or tested.
cbow: If 1, use a continuous-bag-of-words (CBOW) model instead of skip-gram. Defaults to 0 (skip-gram), which is recommended for newcomers.
min_count: Minimum number of times a word must appear in the corpus to be included in the vocabulary. Defaults to 5. Higher values reduce model size.
iter: Number of passes to make over the corpus during training. Defaults to 5.
force: Whether to overwrite existing model files. Defaults to FALSE.
negative_samples: Number of negative samples to take in skip-gram training. Defaults to 5. 0 means full sampling, while lower numbers give faster training. For large corpora, 2-5 may work; for smaller corpora, 5-15 is reasonable.
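The `vectors` argument trades precision for memory. As a back-of-the-envelope sketch, the estimate below assumes 4-byte floats and two vocabulary-by-`vectors` weight matrices held during training. That is an assumption about the underlying C code, not something this page documents. `estimate_weight_mb` is a hypothetical helper, and the vocabulary size is made up.

```r
# Rough memory estimate for the trained weights, in megabytes.
# Assumptions (not documented on this page): 4-byte floats and two
# vocab_size x vectors matrices (input word vectors + output weights).
estimate_weight_mb <- function(vocab_size, vectors,
                               bytes_per_float = 4, n_matrices = 2) {
  vocab_size * vectors * bytes_per_float * n_matrices / 1024^2
}

# Hypothetical 100,000-word vocabulary at three dimensionalities
dims <- c(100, 300, 500)
mb <- sapply(dims, function(d) estimate_weight_mb(vocab_size = 1e5, vectors = d))
round(mb, 1)
```

Under these assumptions, memory grows linearly with `vectors`. At 100 dimensions the estimate is about 76 MB, and at 500 dimensions it is five times that.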
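The `window` and `cbow` arguments work together. `window` decides which neighbours count as context, and `cbow` decides how those neighbours are used. The base-R sketch below illustrates both under stated assumptions. It uses a fixed window on each side of the target, whereas the original C tool also shrinks the window at random for each word, a step omitted here. The embeddings are random toy values. `context_indices`, `skipgram_pairs`, and `cbow_input` are hypothetical helpers, not part of the package.

```r
# Indices of the context words around position i (fixed window; the
# original C tool's random window shrinking is omitted)
context_indices <- function(n, i, window) {
  setdiff(max(1, i - window):min(n, i + window), i)
}

# Skip-gram (cbow = 0): one (target, context) pair per context word
skipgram_pairs <- function(tokens, window) {
  n <- length(tokens)
  rows <- lapply(seq_len(n), function(i) {
    ctx <- context_indices(n, i, window)
    cbind(target = rep(tokens[i], length(ctx)), context = tokens[ctx])
  })
  do.call(rbind, rows)
}

# CBOW (cbow = 1): the context vectors are averaged into one input
# that is used to predict the target word
cbow_input <- function(tokens, i, window, emb) {
  ctx <- context_indices(length(tokens), i, window)
  colMeans(emb[tokens[ctx], , drop = FALSE])
}

tokens <- strsplit("the quick brown fox jumps", " ", fixed = TRUE)[[1]]
pairs <- skipgram_pairs(tokens, window = 1)   # 8 (target, context) pairs

set.seed(1)
emb <- matrix(rnorm(length(tokens) * 3), nrow = length(tokens),
              dimnames = list(tokens, NULL))  # toy 3-d embeddings
h <- cbow_input(tokens, i = 3, window = 1, emb = emb)  # mean of "quick" and "fox"
```

A larger `window` produces more pairs per target word. With these five tokens, a window of 2 yields 14 pairs instead of 8. This is one reason larger windows slow training.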
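To illustrate what `negative_samples` controls: in the original word2vec, each positive example is contrasted with k "noise" words drawn from the unigram distribution raised to the 3/4 power. The sketch below assumes this package keeps that scheme, which the page does not state. The helper names and word counts are hypothetical. As a simplification, it removes the true word from the distribution, whereas the C tool simply skips a draw that hits it.

```r
# Noise distribution: unigram counts raised to the 3/4 power, normalised
noise_distribution <- function(counts, power = 0.75) {
  w <- counts^power
  w / sum(w)
}

# Draw k negative words; simplification: the positive word gets
# probability 0 instead of being skipped when drawn
draw_negatives <- function(counts, k, exclude) {
  p <- noise_distribution(counts)
  p[exclude] <- 0
  sample(names(counts), size = k, replace = TRUE, prob = p)
}

counts <- c(the = 1000, fox = 50, jumps = 20, lazy = 5)  # toy counts
p <- noise_distribution(counts)
set.seed(42)
negs <- draw_negatives(counts, k = 5, exclude = "fox")
```

The 3/4 power flattens the distribution. Frequent words such as "the" are drawn less often than their raw share, and rare words are drawn more often.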
The word2vec tool takes a text corpus as input and produces word vectors as output. It first builds a vocabulary from the training text and then learns a vector representation for each word. The resulting word vector file can be used as features in many natural language processing and machine learning applications.
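The vocabulary step, governed by `train_file` and `min_count`, can be sketched in base R. This is only an illustration: the package builds its vocabulary in compiled code. `build_vocab` is a hypothetical helper that splits on single spaces, as described for `train_file`.

```r
# Build a vocabulary from a space-separated corpus and keep only words
# seen at least min_count times, most frequent first
build_vocab <- function(text, min_count = 5) {
  tokens <- strsplit(text, " ", fixed = TRUE)[[1]]
  tokens <- tokens[tokens != ""]            # drop empties from repeated spaces
  counts <- table(tokens)
  counts <- setNames(as.integer(counts), names(counts))
  kept <- counts[counts >= min_count]
  sort(kept, decreasing = TRUE)
}

corpus <- "a b a c a b d a b a"
v <- build_vocab(corpus, min_count = 2)     # "a" (5) and "b" (3) survive
```

Raising `min_count` to 5 on this toy corpus would keep only "a". This is how higher values shrink the model.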
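A common way to use word vectors is to compare words by cosine similarity. The sketch below uses invented three-dimensional vectors, not output from `train_word2vec()`. `cosine_similarity` is a hypothetical helper written in base R, not a function from the package.

```r
# Cosine similarity between two numeric vectors
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Toy word vectors (rows are words)
vecs <- rbind(
  king  = c(0.8, 0.6, 0.1),
  queen = c(0.7, 0.7, 0.2),
  apple = c(0.1, 0.2, 0.9)
)

# Similarity of every word to "king", ranked
sims <- apply(vecs, 1, cosine_similarity, b = vecs["king", ])
ranked <- sort(sims, decreasing = TRUE)
```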
Returns a VectorSpaceModel object.
Jian Li <rweibo@sina.com>, Ben Schmidt <bmchmidt@gmail.com>
https://code.google.com/p/word2vec/
## Not run:
model = train_word2vec(system.file("examples", "rfaq.txt", package = "wordVectors"))
## End(Not run)