formatWordEmbeddings: Format Word Embeddings


View source: R/formatWordEmbeddings.R

Description

Formats a word embedding matrix as a named list of word embeddings, optionally normalizing each embedding.

Usage

formatWordEmbeddings(embedding_matrix, normalize = TRUE, verbose = TRUE)

Arguments

embedding_matrix

word embedding matrix. For n words, each represented by a d-dimensional vector, embedding_matrix should have n rows and d+1 columns, where the first column contains the words and the remaining d columns contain the embedding values.

normalize

logical; should the word embeddings be normalized?

verbose

logical; should the function report on progress?
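To make the expected layout of embedding_matrix concrete, here is a minimal sketch of a toy input with n = 3 words and d = 4 dimensions. The object name toy_embedding_matrix and all values are made up for illustration, and the sketch assumes a data frame such as the one read.table returns.

```r
# Toy input in the layout described above: n = 3 words, d = 4 dimensions.
# All names and values here are invented for illustration.
toy_embedding_matrix <- data.frame(
  V1 = c("the", "cat", "sat"),   # first column: the words
  V2 = c(0.1, 0.5, -0.3),        # remaining d columns: embedding values
  V3 = c(0.2, -0.1, 0.4),
  V4 = c(-0.4, 0.3, 0.0),
  V5 = c(0.7, 0.2, 0.6),
  stringsAsFactors = FALSE
)

dim(toy_embedding_matrix)                 # n rows, d + 1 columns
is.character(toy_embedding_matrix[[1]])   # the first column holds the words
```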

Details

This function formats pre-trained word embeddings, such as GloVe (https://nlp.stanford.edu/projects/glove/), that have already been read into embedding_matrix; downloading and reading the embeddings is a separate step (see examples). The result is a named list of word embeddings. Each entry in the list is a numeric vector of length d, the embedding dimension, representing the word embedding for that entry's name.
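The conversion from a matrix to a named list can be sketched in base R as follows. This is not the package's implementation: sketch_format_embeddings is a hypothetical helper, the input is assumed to be a data frame with the words in the first column, and normalization is assumed to mean scaling each vector to unit Euclidean length, which this page does not confirm.

```r
# Hypothetical helper for illustration only; NOT the TAGAM implementation.
# Assumption: normalize = TRUE rescales each embedding to unit L2 norm.
sketch_format_embeddings <- function(embedding_matrix, normalize = TRUE) {
  # First column holds the words; the remaining columns hold the values.
  words  <- as.character(embedding_matrix[[1]])
  values <- as.matrix(embedding_matrix[, -1, drop = FALSE])

  if (normalize) {
    # Row-wise Euclidean norms; dividing a matrix by a vector of length
    # nrow(values) recycles down columns, so row i is divided by norms[i].
    # An all-zero row would yield NaN values here.
    norms  <- sqrt(rowSums(values^2))
    values <- values / norms
  }

  # One numeric vector per word, named by that word.
  embeddings <- lapply(seq_len(nrow(values)), function(i) unname(values[i, ]))
  names(embeddings) <- words
  embeddings
}

toy <- data.frame(word = c("the", "cat"), x1 = c(3, 0), x2 = c(4, 2),
                  stringsAsFactors = FALSE)
sketch <- sketch_format_embeddings(toy)
sketch[["the"]]
```

With normalization switched off, the same helper returns the raw rows unchanged, which is a quick way to check that the words and vectors line up.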

Value

A named list of word embeddings.
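Because the value is a named list, individual embeddings are looked up by word. The sketch below uses a small made-up list in place of real output (the values and the word "dog" are illustrative assumptions) and shows one way to handle words that have no entry.

```r
# Made-up stand-in for the list returned by formatWordEmbeddings().
word_embeddings <- list(the = c(0.6, 0.8), cat = c(0, 1))

# Looking up a word that has no entry returns NULL rather than an error.
is.null(word_embeddings[["dog"]])

# Keep only the words that have an embedding, then stack them into a matrix
# with one row per word.
wanted <- c("the", "dog", "cat")
known  <- wanted[wanted %in% names(word_embeddings)]
embedding_rows <- do.call(rbind, word_embeddings[known])
embedding_rows
```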

References

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. https://nlp.stanford.edu/projects/glove/.

Examples

## Not run: 
# temp <- tempfile()
# download.file("http://nlp.stanford.edu/data/wordvecs/glove.6B.zip", temp)

# embedding_matrix <- read.table(unz(temp, "glove.6B.300d.txt"), quote = "",
#                                comment.char = "", stringsAsFactors = FALSE)

word_embeddings <- formatWordEmbeddings(embedding_matrix_example, normalize = TRUE, verbose = TRUE)

# Extract the word embedding for "the"
word_embeddings[["the"]]

## End(Not run)

scottmanski/TAGAM documentation built on Aug. 3, 2020, 10:50 a.m.