
Class for the GloVe word-embeddings model.
It can be trained via a fully asynchronous and parallel AdaGrad with the `$fit_transform()` method.

`R6Class` object.

`components`

represents context word vectors.

`n_dump_every`

`integer = 0L` by default. Defines how often word vectors are dumped. For example, the user can ask to dump word vectors every 5 iterations.

`shuffle`

`logical = FALSE` by default. Defines whether to shuffle before each SGD iteration. Shuffling is generally a good idea for stochastic gradient descent, but in my experience it does not improve convergence in this particular case.

`grain_size`

`integer = 1e5L` by default. This is the grain size for `RcppParallel::parallelReduce`. For details, see http://rcppcore.github.io/RcppParallel/#grain-size. **We don't recommend changing this parameter.**

For usage details, see the **Methods**, **Arguments** and **Examples** sections.

```
glove = GlobalVectors$new(word_vectors_size, vocabulary, x_max, learning_rate = 0.15,
                          alpha = 0.75, lambda = 0.0, shuffle = FALSE, initial = NULL)
glove$fit_transform(x, n_iter = 10L, convergence_tol = -1, n_check_convergence = 1L,
                    n_threads = RcppParallel::defaultNumThreads(), ...)
glove$components
glove$dump()
```

`$new(word_vectors_size, vocabulary, x_max, learning_rate = 0.15, alpha = 0.75, lambda = 0, shuffle = FALSE, initial = NULL)`

Constructor for the Global Vectors model. For a description of the arguments, see the **Arguments** section.

`$fit_transform(x, n_iter = 10L, convergence_tol = -1, n_check_convergence = 1L, n_threads = RcppParallel::defaultNumThreads(), ...)`

Fits the GloVe model to the input matrix `x`.

`$dump()`

Gets model internals: word vectors and biases for main and context words.

`$get_history`

Gets the history of SGD costs and word vectors (if `n_dump_every > 0`).
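The early-stopping rule that `$fit_transform()` applies through `convergence_tol` can be illustrated with a small base-R sketch. The helper `n_iter_until_stop()` and the cost values are hypothetical and are not part of text2vec; the sketch also ignores `n_check_convergence` and simply checks the documented condition after every iteration.

```r
# Hypothetical helper (not a text2vec function): returns the iteration at which
# fitting would stop, given a vector of per-iteration costs.
n_iter_until_stop = function(costs, convergence_tol = -1) {
  for (i in seq_along(costs)[-1]) {
    # relative improvement, as in the documented condition
    improvement = costs[i - 1] / costs[i] - 1
    if (improvement < convergence_tol) return(i)
  }
  # no early stop: all iterations were used
  length(costs)
}

costs = c(0.100, 0.080, 0.072, 0.0715, 0.0714)  # made-up cost history

stop_default = n_iter_until_stop(costs)                          # convergence_tol = -1
stop_early   = n_iter_until_stop(costs, convergence_tol = 0.01)
print(c(stop_default, stop_early))
```

With positive costs the ratio minus one is always greater than `-1`, which is why the default value never triggers an early stop.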

- glove
A `GloVe` object

- x
An input term co-occurrence matrix, preferably in `dgTMatrix` format

- n_iter
`integer` number of SGD iterations

- word_vectors_size
desired dimension for word vectors

- vocabulary
`character` vector or instance of the `text2vec_vocabulary` class. Each word should correspond to a dimension of the co-occurrence matrix.

- x_max
`integer` maximum number of co-occurrences to use in the weighting function. See the GloVe paper for details: http://nlp.stanford.edu/pubs/glove.pdf

- learning_rate
`numeric` learning rate for SGD. I do not recommend modifying this parameter, since AdaGrad will quickly adjust it to an optimal value.

- convergence_tol
`numeric = -1` defines the early stopping strategy. Fitting stops when one of the two following conditions is satisfied: (a) all iterations have been used, or (b) `cost_previous_iter / cost_current_iter - 1 < convergence_tol`. By default, all iterations are performed.

- alpha
`numeric = 0.75` the alpha in the weighting function formula: *f(x) = 1 if x > x_max; else (x/x_max)^alpha*

- lambda
`numeric = 0.0` L1 regularization coefficient. `0` = vanilla GloVe, corresponding to the original paper and implementation. `lambda > 0` corresponds to a new text2vec feature and a different SGD algorithm. In our experience, a small lambda (like `lambda = 1e-5`) usually produces better results than vanilla GloVe on small corpora.

- initial
`NULL`: word vectors and word biases will be initialized randomly. Alternatively, a named `list` containing `w_i, w_j, b_i, b_j` values (initial word vectors and biases). This is useful for fine-tuning. For example, one can pretrain a model on a large corpus (such as a Wikipedia dump) and then fine-tune it on a smaller task-specific dataset.
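To make the roles of `x_max` and `alpha` concrete, here is a minimal base-R sketch of the weighting function given in the `alpha` entry. The function name `glove_weight()` is an assumption made for illustration and is not exported by text2vec; the default values mirror the `x_max = 10` used in the example and the documented `alpha = 0.75`.

```r
# Hypothetical helper (not a text2vec function) implementing
# f(x) = 1 if x > x_max; else (x / x_max)^alpha
glove_weight = function(x, x_max = 10, alpha = 0.75) {
  ifelse(x > x_max, 1, (x / x_max)^alpha)
}

counts = c(1, 5, 10, 100)        # made-up co-occurrence counts
weights = glove_weight(counts)   # rare pairs are down-weighted, frequent pairs capped at 1
print(round(weights, 3))
```

Co-occurrence counts above `x_max` all receive the same weight of 1, so very frequent pairs cannot dominate the cost.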

http://nlp.stanford.edu/projects/glove/

```
## Not run: 
library(text2vec)
# download and read the text8 corpus
temp = tempfile()
download.file('http://mattmahoney.net/dc/text8.zip', temp)
text8 = readLines(unz(temp, "text8"))
# build the vocabulary and the term co-occurrence matrix
it = itoken(text8)
vocabulary = create_vocabulary(it)
vocabulary = prune_vocabulary(vocabulary, term_count_min = 5)
v_vect = vocab_vectorizer(vocabulary)
tcm = create_tcm(it, v_vect, skip_grams_window = 5L)
glove_model = GloVe$new(word_vectors_size = 50,
                        vocabulary = vocabulary, x_max = 10, learning_rate = .25)
# fit model and get word vectors
word_vectors_main = glove_model$fit_transform(tcm, n_iter = 10)
word_vectors_context = glove_model$components
word_vectors = word_vectors_main + t(word_vectors_context)
## End(Not run)
```
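The last step of the example sums main and context vectors. The following base-R sketch shows the shapes involved and one common way to use the result, cosine similarity between words. It runs without text2vec: the toy vocabulary and the random matrices are stand-ins, and the layout (words in rows for the `$fit_transform()` result, words in columns for `$components`) is an assumption inferred from the `t()` call in the example.

```r
set.seed(1)
vocab = c("king", "queen", "apple")  # hypothetical toy vocabulary
dim_size = 4                         # stand-in for word_vectors_size

# Random stand-ins for the fitted matrices (assumed layout, see above)
word_vectors_main = matrix(rnorm(length(vocab) * dim_size),
                           nrow = length(vocab), dimnames = list(vocab, NULL))
word_vectors_context = matrix(rnorm(length(vocab) * dim_size),
                              nrow = dim_size, dimnames = list(NULL, vocab))

# Same combination as in the example: one vector per word, words in rows
word_vectors = word_vectors_main + t(word_vectors_context)

# Cosine similarity: scale each row to unit length, then take dot products
unit = word_vectors / sqrt(rowSums(word_vectors^2))
cos_sim = unit %*% t(unit)
print(dim(cos_sim))
```

With real fitted vectors, sorting a row of `cos_sim` in decreasing order lists the nearest neighbours of that word.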

text2vec documentation built on Jan. 12, 2018, 1:04 a.m.
