Class for GloVe word-embeddings model.
It can be trained via fully asynchronous and parallel AdaGrad.
components: represents context word vectors.
n_dump_every: integer = 0L by default. Defines how often word vectors are dumped. For example, the user
can ask to dump word vectors every 5 iterations.
shuffle: logical = FALSE by default. Defines whether the input is shuffled before each SGD iteration.
Shuffling is generally a good idea for stochastic gradient descent, but
in my experience it does not improve convergence in this particular case.
integer = 1e5L by default. This is the
RcppParallel::parallelReduce. For details, see
We do not recommend changing this parameter.
For usage details see Methods, Arguments and Examples sections.
glove = GlobalVectors$new(word_vectors_size, vocabulary, x_max, learning_rate = 0.15,
                          alpha = 0.75, lambda = 0.0, shuffle = FALSE, initial = NULL)
glove$fit_transform(x, n_iter = 10L, convergence_tol = -1, n_check_convergence = 1L,
                    n_threads = RcppParallel::defaultNumThreads(), ...)
glove$components
glove$dump()
$new(word_vectors_size, vocabulary, x_max, learning_rate = 0.15, alpha = 0.75, lambda = 0, shuffle = FALSE, initial = NULL)
Constructor for the Global Vectors (GloVe) model. For a description of the arguments, see the Arguments section.
$fit_transform(x, n_iter = 10L, convergence_tol = -1, n_check_convergence = 1L, n_threads = RcppParallel::defaultNumThreads(), ...)
Fit the GloVe model to the input matrix x.
Get model internals: word vectors and biases for main and context words.
Get the history of SGD costs and word vectors (if n_dump_every > 0).
x: An input term co-occurrence matrix. Preferably in
n_iter: integer number of SGD iterations.
word_vectors_size: desired dimension for word vectors.
vocabulary: character vector or instance of the
text2vec_vocabulary class. Each word should correspond to a dimension
of the co-occurrence matrix.
x_max: integer maximum number of co-occurrences to use in the weighting function.
See the GloVe paper for details: http://nlp.stanford.edu/pubs/glove.pdf
learning_rate: numeric learning rate for SGD. I do not recommend that you
modify this parameter, since AdaGrad will quickly adjust it to an optimal value.
convergence_tol: numeric = -1, defines the early stopping strategy. Fitting stops
when either of the following two conditions is satisfied: (a) all iterations
have been used, or (b)
cost_previous_iter / cost_current_iter - 1 <
convergence_tol. By default, all iterations are performed.
alpha: numeric = 0.75, the alpha in the weighting function formula: f(x) = 1 if x >
x_max; else (x/x_max)^alpha
lambda: numeric = 0.0, L1 regularization coefficient.
0 = vanilla GloVe, which corresponds to the original paper and implementation.
lambda > 0 corresponds to a new text2vec feature and a different SGD algorithm. In our experience,
a small lambda (like
lambda = 1e-5) usually produces better results than vanilla GloVe
on small corpora.
initial: NULL means word vectors and word biases will be initialized
randomly. Alternatively, a named
list which contains
w_i, w_j, b_i, b_j values, that is,
initial word vectors and biases. This is useful for fine-tuning. For example, one can
pretrain a model on a large corpus (such as a Wikipedia dump) and then fine-tune it
on a smaller task-specific dataset.
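The weighting function and the early-stopping condition described in the arguments above can be sketched in plain R. This is only an illustration of the two formulas as stated on this page; the helper names glove_weight and should_stop are hypothetical and are not part of text2vec, whose actual implementation may differ.

```r
# Hypothetical helpers for illustration only; not part of the text2vec API.

# Weighting function: f(x) = 1 if x > x_max, else (x / x_max)^alpha
glove_weight = function(x, x_max, alpha = 0.75) {
  ifelse(x > x_max, 1, (x / x_max)^alpha)
}

# Early-stopping condition (b):
# cost_previous_iter / cost_current_iter - 1 < convergence_tol
should_stop = function(cost_previous_iter, cost_current_iter, convergence_tol = -1) {
  cost_previous_iter / cost_current_iter - 1 < convergence_tol
}

# Co-occurrence counts above x_max receive full weight, smaller ones are down-weighted
glove_weight(c(1, 5, 10, 100), x_max = 10)

# Relative improvement of about 0.001 is below the tolerance, so fitting would stop
should_stop(1.00, 0.999, convergence_tol = 0.01)

# Relative improvement of 1.0 is above the tolerance, so fitting would continue
should_stop(1.00, 0.50, convergence_tol = 0.01)
```

Under this reading, the default convergence_tol = -1 can never trigger for positive costs, because cost_previous_iter / cost_current_iter - 1 is always greater than -1. That is consistent with the statement that all iterations are performed by default.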
## Not run:
library(text2vec)

# download and read the text8 corpus
temp = tempfile()
download.file('http://mattmahoney.net/dc/text8.zip', temp)
text8 = readLines(unz(temp, "text8"))

# build the vocabulary and the term co-occurrence matrix
it = itoken(text8)
vocabulary = create_vocabulary(it)
vocabulary = prune_vocabulary(vocabulary, term_count_min = 5)
v_vect = vocab_vectorizer(vocabulary)
tcm = create_tcm(it, v_vect, skip_grams_window = 5L)

glove_model = GloVe$new(word_vectors_size = 50, vocabulary = vocabulary,
                        x_max = 10, learning_rate = .25)
# fit model and get word vectors
word_vectors_main = glove_model$fit_transform(tcm, n_iter = 10)
word_vectors_context = glove_model$components
# combine main and context vectors
word_vectors = word_vectors_main + t(word_vectors_context)
## End(Not run)