Trains a bi-directional word2vec model on a training corpus, which is usually a plain .txt file. A bi-directional word2vec model keeps separate vector representations for the left and right contexts of a word. This lets the user determine whether a word's association with another word is enriched in its left or its right context of use.
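To make the left/right distinction concrete, the sketch below collects the words that appear before and after a target word within a fixed window. It uses base R only and does not call the package; `context_sides` is a hypothetical helper written for illustration, and the training code itself may handle windows differently.

```r
# Hypothetical helper (not part of the package): gather the tokens that
# occur to the left and to the right of each occurrence of `target`,
# looking at most `window` positions in either direction.
context_sides <- function(tokens, target, window = 2) {
  hits <- which(tokens == target)
  left <- character(0)
  right <- character(0)
  for (i in hits) {
    lo <- max(1, i - window)
    hi <- min(length(tokens), i + window)
    if (i > lo) left <- c(left, tokens[lo:(i - 1)])
    if (i < hi) right <- c(right, tokens[(i + 1):hi])
  }
  list(left = left, right = right)
}

# Tokens are split on single spaces, as the training file expects.
tokens <- strsplit("the united nation voted and the nation agreed", " ")[[1]]
sides <- context_sides(tokens, "nation", window = 2)
sides$left   # words seen before "nation"
sides$right  # words seen after "nation"
```

A standard word2vec model would pool both lists into one context; a bi-directional model, as described above, keeps them apart.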
train_biword2vec(train_file, output_file_left = "vectors_left.bin",
  output_file_right = "vectors_right.bin",
  output_file_out = "vectors_out.bin", vectors = 100, threads = 3,
  window = 12, classes = 0, cbow = 0, min_count = 1, iter = 5,
  force = F, negative_samples = 0)
train_file
Path of a single .txt file for training. Tokens are split on spaces.
output_file_left
Path of the output file for the left context words.
output_file_right
Path of the output file for the right context words.
vectors
The number of dimensions of each output vector. Defaults to 100. More dimensions usually mean more precision, but also more random error, higher memory usage, and slower operations. Sensible choices are probably in the range 100-500.
threads
Number of threads to run the training process on. Defaults to 3; using up to the number of (virtual) cores on your machine may speed things up.
window
The size of the window (in words) to use in training. Defaults to 12.
classes
Number of classes for k-means clustering. Not documented or tested.
cbow
If 1, use a continuous-bag-of-words model instead of skip-grams. Defaults to 0 (skip-gram), which is recommended for newcomers.
min_count
Minimum number of times a word must appear to be included in the samples. Defaults to 1. High values help reduce model size.
iter
Number of passes to make over the corpus in training. Defaults to 5.
force
Whether to overwrite existing model files.
negative_samples
Number of negative samples to take in skip-gram training. 0 means full sampling, while lower numbers give faster training. For large corpora 2-5 may work; for smaller corpora, 5-15 is reasonable.
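Because tokens are split on spaces and filtered by min_count, it can help to inspect a corpus before training. The following is a minimal sketch in base R: the cleaning rules (lowercasing, replacing punctuation with spaces) and the sample text are assumptions made for illustration, not steps the package prescribes or performs.

```r
# Illustrative input; replace with your own text.
raw <- c("The Nation voted.", "A nation, united!")

# Assumed cleaning rules: lowercase, turn anything that is not a letter
# or a space into a space, then collapse runs of spaces.
clean <- tolower(raw)
clean <- gsub("[^a-z ]", " ", clean)
clean <- gsub(" +", " ", trimws(clean))

# Write a single .txt file that could be passed as train_file.
train_file <- tempfile(fileext = ".txt")
writeLines(clean, train_file)

# Count tokens by splitting on spaces, then list the words that would
# survive a min_count threshold of 2.
tokens <- unlist(strsplit(readLines(train_file), " ", fixed = TRUE))
counts <- table(tokens)
kept <- names(counts)[counts >= 2]

# One possible way to pick a value for threads: leave one core free.
# detectCores() can return NA, so fall back to a single thread.
n_threads <- max(1, parallel::detectCores() - 1, na.rm = TRUE)
```

Here only "nation" occurs twice, so it is the only word that would remain with min_count = 2; on a real corpus the same count table gives a quick view of how much vocabulary a given threshold discards.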
A VectorSpaceModel object.
https://code.google.com/p/word2vec/
## Not run:
model = train_biword2vec("nation.txt", output_file_left = "out_left.bin",
  output_file_right = "out_right.bin", threads = 3,
  window = 5)
## End(Not run)