word2clusters: Word Clusters


View source: R/wordclusters.R

Description

Gives each word a class ID number.

Usage

word2clusters(train, output = NULL, classes = 0L, size = 100L,
  window = 5L, sample = 1e-05, hs = 0L, negative = 5L,
  threads = 1L, iter = 5L, min_count = 5L, alpha = 0.025,
  debug = 2L, binary = 0L, cbow = 1L, verbose = FALSE)

Arguments

train

Use text data from file to train the model.

output

Use file to save the resulting word vectors / word clusters.

classes

Number of word classes: when greater than 0L, word classes are output rather than word vectors; default is 0L.

size

Set size of word vectors; default is 100L.

window

Set max skip length between words; default is 5L.

sample

Set the threshold for occurrence of words. Those that appear with higher frequency in the training data will be randomly down-sampled; default is 1e-05.

hs

Use Hierarchical Softmax; default is 0L (not used).

negative

Number of negative examples; default is 5L, common values are 5L to 10L (0L = not used).

threads

Number of threads to use; default is 1L.

iter

Number of training iterations to run; default is 5L.

min_count

Discard words that appear fewer than min_count times; default is 5L.

alpha

Set the starting learning rate; default is 0.025.

debug

Set the debug mode; default is 2L (print more info during training).

binary

Save the resulting vectors in binary mode; default is 0L (off).

cbow

Use the continuous bag-of-words (CBOW) model; default is 1L (set to 0L for the skip-gram model).

verbose

Whether to print output from training.
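
To see how these arguments fit together, here is a minimal sketch of a call that overrides several defaults, using the sample corpus shipped with the package; the chosen values are illustrative, not recommendations.

## Not run: 
data("macbeth", package = "word2vec.r")

word2clusters(
  train   = macbeth,  # sample corpus shipped with the package
  classes = 100L,     # request 100 word classes instead of the default 0L
  cbow    = 0L,       # switch from CBOW to the skip-gram model
  threads = 4L,       # train on 4 threads (default is 1L)
  iter    = 10L       # run 10 training iterations (default is 5L)
)

## End(Not run)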

Value

Invisibly returns the path to the output file.
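
As a hedged sketch of consuming that return value, the snippet below assumes the output file follows the plain-text classes format of the original word2vec tool (one word and its cluster ID per line); the column names are illustrative and not part of the package API.

## Not run: 
# `model_path` is the value returned by word2clusters(), as in the example below
clusters <- read.table(model_path,
                       col.names = c("word", "cluster"),
                       stringsAsFactors = FALSE)

# words grouped by cluster ID
head(split(clusters$word, clusters$cluster))

## End(Not run)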

Examples

## Not run: 
# setup word2vec Julia dependency
setup_word2vec()

# sample corpus
data("macbeth", package = "word2vec.r")

# train model
model_path <- word2clusters(macbeth, classes = 50L)

## End(Not run)
