formatWordEmbeddings: Format Word Embeddings
In scottmanski/TAGAM:


View source: R/formatWordEmbeddings.R

This function formats the word embeddings.

formatWordEmbeddings(embedding_matrix, normalize = TRUE, verbose = TRUE)

`embedding_matrix`	word embedding matrix. For n words, each represented by a d-dimensional vector, `embedding_matrix` should have n rows and d + 1 columns, where the first column contains the words.
`normalize`	logical; should the word embeddings be normalized.
`verbose`	logical; should the function report on progress.
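
As an illustration of the expected input shape, here is a minimal sketch of a toy input with n = 3 words and d = 4 dimensions. The object name `toy_embedding_matrix` and all values are made up for this example; a data frame is used on the assumption that the input resembles what `read.table` returns, as in the examples.

```r
# Hypothetical toy input: 3 words, each with a 4-dimensional vector.
# The first column holds the words; the remaining d = 4 columns hold
# the (made-up) embedding values.
toy_embedding_matrix <- data.frame(
  V1 = c("the", "cat", "sat"),
  V2 = c(0.1, 0.5, -0.3),
  V3 = c(0.2, -0.1, 0.4),
  V4 = c(-0.4, 0.3, 0.1),
  V5 = c(0.0, 0.2, 0.6),
  stringsAsFactors = FALSE
)

# n rows and d + 1 columns
dim(toy_embedding_matrix)
```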

This function formats GloVe-style word embeddings (https://nlp.stanford.edu/projects/glove/) supplied in `embedding_matrix`; the examples show how to download GloVe and read it into a suitable matrix. The result is a named list of word embeddings. Each entry in the list is a numeric vector whose length is the embedding dimension d, and the entry's name is the word it represents (see examples).

A named list of word embeddings.
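
To make the shape of the return value concrete, the following is a rough sketch of this kind of formatting, not the package's actual implementation. The helper `format_embeddings_sketch` is hypothetical, and scaling each vector to unit Euclidean length is an assumption, since the documentation does not say which normalization is applied.

```r
# Hypothetical helper, NOT the TAGAM implementation.
format_embeddings_sketch <- function(embedding_matrix, normalize = TRUE) {
  # First column: the words; remaining columns: the embedding values.
  words <- as.character(embedding_matrix[, 1])
  vectors <- as.matrix(embedding_matrix[, -1, drop = FALSE])
  storage.mode(vectors) <- "numeric"

  if (normalize) {
    # Assumption: normalization means unit Euclidean (L2) length per word.
    norms <- sqrt(rowSums(vectors^2))
    norms[norms == 0] <- 1  # leave all-zero vectors unchanged
    vectors <- vectors / norms
  }

  # One unnamed numeric vector per word, named by the word itself.
  out <- lapply(seq_len(nrow(vectors)), function(i) unname(vectors[i, ]))
  names(out) <- words
  out
}

# Made-up two-word input with d = 2.
toy <- data.frame(word = c("the", "cat"), x = c(3, 0), y = c(4, 2),
                  stringsAsFactors = FALSE)
emb <- format_embeddings_sketch(toy)
emb[["the"]]  # 0.6 0.8
```

A named list allows direct lookup by word with `[[`, which matches how the examples below extract the embedding for "the".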

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. https://nlp.stanford.edu/projects/glove/.

## Not run: 
# temp <- tempfile()
# download.file("http://nlp.stanford.edu/data/wordvecs/glove.6B.zip", temp)

# embedding_matrix <- read.table(unz(temp, "glove.6B.300d.txt"), quote = "",
#                                comment.char = "", stringsAsFactors = FALSE)

word_embeddings <- formatWordEmbeddings(embedding_matrix_example, normalize = TRUE, verbose = TRUE)

# Extract the word embedding for "the"
word_embeddings[["the"]]

## End(Not run)