train_word2vec: Train a model by word2vec.

View source: R/word2vec.R

train_word2vec R Documentation

Train a model by word2vec.

Description

Train a model by word2vec.

Usage

train_word2vec(train_file, output_file = "vectors.bin", vectors = 100,
  threads = 1, window = 12, classes = 0, cbow = 0, min_count = 5,
  iter = 5, force = F, negative_samples = 5)

Arguments

train_file

Path of a single .txt file for training. Tokens are split on spaces.

output_file

Path of the output file. Defaults to "vectors.bin".

vectors

The number of dimensions in each output word vector. Defaults to 100. More dimensions usually mean more precision, but also more random error, higher memory usage, and slower operations. Sensible choices are probably in the range 100-500.

threads

Number of threads to run the training process on. Defaults to 1; using up to the number of (virtual) cores on your machine may speed up training.

window

The size of the window (in words) to use in training.

classes

Number of classes for k-means clustering. Not documented/tested.

cbow

If 1, use a continuous-bag-of-words model instead of skip-grams. Defaults to 0 (skip-gram), which is recommended for newcomers.

min_count

Minimum number of times a word must appear to be included in the samples. Defaults to 5. Higher values help reduce model size.

iter

Number of passes to make over the corpus in training.

force

Whether to overwrite existing model files.

negative_samples

Number of negative samples to take in skip-gram training. 0 means full sampling, while lower numbers give faster training. For large corpora 2-5 may work; for smaller corpora, 5-15 is reasonable.
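As a rough illustration of the train_file and threads arguments, the base-R sketch below writes a small corpus to a single space-delimited .txt file and picks a thread count from the cores the machine reports. The sample sentences, the cleaning rules, the file name corpus.txt, and the "leave one core free" heuristic are all assumptions made for the example, not part of the package.

```r
library(parallel)

# Hypothetical raw text; in practice this would come from your own files.
raw <- c("The quick brown fox, it jumps!", "Over the lazy dog.")

# Lowercase, turn anything that is not a letter, digit, or whitespace into a
# space, then collapse runs of whitespace so tokens are split by single spaces.
clean <- tolower(raw)
clean <- gsub("[^[:alnum:][:space:]]", " ", clean)
clean <- trimws(gsub("[[:space:]]+", " ", clean))

# Write everything to one .txt file, the form that train_file expects.
train_path <- file.path(tempdir(), "corpus.txt")
writeLines(clean, train_path)

# Read the file back and split on spaces to check the tokens.
tokens <- unlist(strsplit(readLines(train_path), " ", fixed = TRUE))

# detectCores() can return NA on some platforms, so fall back to 1 thread.
cores <- detectCores()
n_threads <- if (is.na(cores)) 1L else max(1L, cores - 1L)
```

The resulting train_path and n_threads could then be passed as train_file and threads.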

Details

The word2vec tool takes a text corpus as input and produces word vectors as output. It first constructs a vocabulary from the training text data and then learns vector representations of words. The resulting word vector file can be used as features in many natural language processing and machine learning applications.
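The two stages described above can be pictured with a toy base-R sketch: a vocabulary filtered by min_count, followed by the (target, context) pairs that a window of a given size would expose to skip-gram training. This is a simplified illustration under stated assumptions, not the package's internal code: the token vector is made up, out-of-vocabulary tokens are assumed to be dropped before windowing, and no vectors are actually learned.

```r
# Hypothetical, already-tokenized corpus.
tokens <- c("the", "cat", "sat", "on", "the", "mat", "the", "cat", "ran")
min_count <- 2
window <- 1

# Stage 1: the vocabulary is every word that appears at least min_count times.
freq <- table(tokens)
vocab <- names(freq)[freq >= min_count]

# Stage 2 (assumed simplification): drop out-of-vocabulary tokens, then pair
# each target word with its neighbours at most `window` positions away.
kept <- tokens[tokens %in% vocab]
pairs <- do.call(rbind, lapply(seq_along(kept), function(i) {
  idx <- setdiff(max(1, i - window):min(length(kept), i + window), i)
  data.frame(
    target = rep(kept[i], length(idx)),
    context = kept[idx],
    stringsAsFactors = FALSE
  )
}))

print(vocab)
print(pairs)
```

Raising min_count shrinks vocab, and raising window increases the number of rows in pairs, which is one way to see why both settings affect model size and training time.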

Value

A VectorSpaceModel object.

Author(s)

Jian Li <rweibo@sina.com>, Ben Schmidt <bmchmidt@gmail.com>

References

https://code.google.com/p/word2vec/

Examples

## Not run: 
model = train_word2vec(system.file("examples", "rfaq.txt", package = "wordVectors"))

## End(Not run)
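A fuller call that spells out the arguments from the Usage section might look like the sketch below. The argument names come from the signature above; the values, the temporary output path, and the use of dim() on the result (which assumes the returned VectorSpaceModel behaves like a matrix with one row per word) are illustrative assumptions. The call is guarded so that nothing is trained unless the wordVectors package is installed.

```r
# Argument names follow the Usage section; the values are only illustrative.
args <- list(
  train_file = "",  # filled in below once the package is known to be available
  output_file = file.path(tempdir(), "rfaq_vectors.bin"),
  vectors = 200,
  threads = 2,
  window = 12,
  cbow = 0,
  min_count = 5,
  iter = 5,
  force = TRUE,     # overwrite any earlier model at output_file
  negative_samples = 5
)

if (requireNamespace("wordVectors", quietly = TRUE)) {
  # Same example corpus as in the example above.
  args$train_file <- system.file("examples", "rfaq.txt", package = "wordVectors")
  model <- do.call(wordVectors::train_word2vec, args)
  # Assumption: the model can be inspected like a matrix (words x dimensions).
  print(dim(model))
}
```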

bmschmidt/wordVectors documentation built on June 2, 2022, 3:53 p.m.