build_vectors: Build fasttext vectors


View source: R/API.R

Description

Trains a fasttext vector (unsupervised) model following the method described in Enriching Word Vectors with Subword Information, using the fasttext implementation.

See FastText word representation tutorial for more information on training unsupervised models using fasttext.

Usage

build_vectors(documents, model_path, modeltype = c("skipgram", "cbow"),
  bucket = 2e+06, dim = 100, epoch = 5, label = "__label__",
  loss = c("ns", "hs", "softmax", "ova", "one-vs-all"), lr = 0.05,
  lrUpdateRate = 100, maxn = 6, minCount = 5, minn = 3, neg = 5,
  t = 1e-04, thread = 12, verbose = 2, wordNgrams = 1, ws = 5)

Arguments

documents

character vector of documents used for training

model_path

Name of output file without file extension.

modeltype

Should training be done using skipgram or cbow? Defaults to skipgram.

bucket

number of hash buckets used to store n-grams

dim

size of word vectors

epoch

number of epochs

label

text string, labels prefix. Default is "__label__"

loss

loss function: one of ns, hs, softmax, ova, one-vs-all

lr

learning rate

lrUpdateRate

change the rate of updates for the learning rate

maxn

max length of char ngram

minCount

minimal number of word occurrences

minn

min length of char ngram

neg

number of negatives sampled

t

sampling threshold

thread

number of threads

verbose

verbosity level

wordNgrams

max length of word ngram

ws

size of the context window
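
To make the minn and maxn arguments more concrete, the sketch below lists the character n-grams of a single word in base R. It follows the scheme described in Enriching Word Vectors with Subword Information, where a word is wrapped in the boundary markers "<" and ">" before n-grams are extracted. The helper char_ngrams is hypothetical and not part of fastrtext; it is a simplified illustration, not the library's internal code (for example, it ignores hashing into buckets and the extra whole-word token).

```r
# Hypothetical helper (not part of fastrtext): list the character n-grams
# of one word for lengths minn..maxn, after adding "<" and ">" boundary markers.
char_ngrams <- function(word, minn = 3, maxn = 6) {
  # No n-grams when the range is empty (e.g. maxn = 0)
  if (maxn < minn) return(character(0))
  marked <- paste0("<", word, ">")
  len <- nchar(marked)
  grams <- character(0)
  for (n in minn:maxn) {
    # Stop once n-grams would be longer than the marked word
    if (n > len) break
    starts <- seq_len(len - n + 1)
    grams <- c(grams, substring(marked, starts, starts + n - 1))
  }
  grams
}

# With minn = maxn = 3, "where" yields: "<wh" "whe" "her" "ere" "re>"
char_ngrams("where", minn = 3, maxn = 3)
```

Under this simplified scheme, larger maxn values produce more subword units per word, which is one reason the bucket argument bounds how many distinct n-gram slots are kept.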

Value

path to model file, as character

Examples

## Not run: 
library(fastrtext)
text <- train_sentences
model_file <- build_vectors(text[['text']], 'my_model')
model <- load_model(model_file)

## End(Not run)

fastrtext documentation built on Oct. 30, 2019, 11:32 a.m.