train_embeddings: Trains embeddings from your corpus using methods described...

Description Usage Arguments Value Examples

Description

Trains embeddings from your corpus using the GloVe methods described here: http://nlp.stanford.edu/pubs/glove.pdf. It uses the text2vec package; see that package for more information.
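
The function appears to wrap text2vec's GloVe implementation. The following is a minimal sketch of that underlying workflow under the current text2vec interface (the package itself may call an older one); the mapping of window, dimensions, max_iters and max_cooccur onto skip_grams_window, rank, n_iter and x_max is an assumption, and it_all / vectorizer stand for the tokens iterator and vocabulary vectorizer inputs.

library(text2vec)

# Term co-occurrence matrix from the tokens iterator (it_all) and the
# vocabulary vectorizer, with the co-occurrence window given by `window`.
tcm <- create_tcm(it_all, vectorizer, skip_grams_window = 10L)

# GloVe model: `rank` plays the role of `dimensions`, `x_max` of `max_cooccur`.
glove <- GlobalVectors$new(rank = 100, x_max = 50)

# Fit for at most `max_iters` iterations; summing the main and context
# vectors is the usual way to obtain the final embeddings.
wv_main      <- glove$fit_transform(tcm, n_iter = 50)
wv_context   <- glove$components
word_vectors <- wv_main + t(wv_context)

# train_embeddings returns the embeddings as a data frame.
embeddings <- as.data.frame(word_vectors)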

Usage

train_embeddings(vocab, it_all, vocab_vectorizer, window = 10,
  dimensions = 100, max_iters = 50, max_cooccur = 50)

Arguments

vocab

The vocabulary from Create_Vocab_Document_Term_Matrix.

it_all

The tokens from Create_Vocab_Document_Term_Matrix (an equivalent input can also be built directly with text2vec; see the sketch after this argument list).

vocab_vectorizer

The vocabulary vectorizer from Create_Vocab_Document_Term_Matrix.

window

The window size for word co-occurrences.

dimensions

The number of dimensions of the returned word embeddings. Defaults to 100.

max_iters

The maximum number of iterations for training the embeddings. Defaults to 50.

max_cooccur

The maximum number of times a word-word co-occurrence may be used in weighting the model. Defaults to 50. The value should be proportional to the amount of data.

input

A dataframe of a text corpus
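
If you are not using Create_Vocab_Document_Term_Matrix, inputs of the expected kind can be prepared directly with text2vec, as in the sketch below. Here `texts` is a hypothetical character vector of documents, and whether train_embeddings accepts inputs built this way is an assumption.

library(text2vec)

# Tokens iterator over the corpus (lowercased, word-tokenized).
it_all <- itoken(texts, tolower, word_tokenizer)

# Vocabulary and the corresponding vectorizer.
vocab <- create_vocabulary(it_all)
vectorizer <- vocab_vectorizer(vocab)

train_embeddings(vocab, it_all, vectorizer, window = 10, dimensions = 100)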

Value

Returns a data frame of word embeddings.
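
The returned embeddings can be queried for nearest neighbours with cosine similarity, for example via text2vec::sim2. The sketch below assumes the data frame has one row per word with the words as row names, which is an assumption about the output format; "good" is an arbitrary example word.

library(text2vec)

# `embeddings` is the data frame returned by train_embeddings.
emb <- as.matrix(embeddings)

# Cosine similarity between one word and every word in the vocabulary.
sims <- sim2(emb, emb["good", , drop = FALSE], method = "cosine")

# The words most similar to "good".
head(sort(sims[, 1], decreasing = TRUE), 10)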

Examples

train_embeddings(Myvocab, itokens, vocab_vectorizer, window = 10,
  dimensions = 100, max_iters = 50, max_cooccur = 50)
