View source: R/read_word_embeddings.R
read_word_embeddings

Description

read_word_embeddings reads specified words from word embedding files quickly and without using much memory. It formats its output for the scale_text function. The rows of the output are words and the columns are the dimensions of the word embeddings; correspondingly, the row names are the vocabulary and the column names are the names of the dimensions.
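As a sketch of the output format described above (not run; the vocabulary and the file path are placeholder assumptions, and the meta-embedding file must be downloaded first, see Details):

```r
## Not run:
## Sketch only: "path/to/O2M_overlap.txt" is a placeholder path and the
## in_vocab words are hypothetical.
emb <- read_word_embeddings(
  in_vocab = c("economy", "health"),
  ovefile  = "path/to/O2M_overlap.txt"
)
rownames(emb)  # the vocabulary: words found in the embedding file
colnames(emb)  # the names of the embedding dimensions
## End(Not run)
```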
Usage

read_word_embeddings(in_vocab, ovefile = NA, ovefile2 = NA,
  wikfile = NA, twifile = NA)
Arguments

in_vocab: Character vector. The vocabulary to look for in the word embeddings.

ovefile: A character scalar (file name). Use this for O2M_overlap.txt from the meta embeddings, a meta-analysis of many pre-trained word embeddings. Recommended.

ovefile2: A character scalar (file name). Use this for O2M_oov.txt from the meta embeddings: the rare words for the meta-analysis of many pre-trained word embeddings.

wikfile: A character scalar (file name). Use this for glove.6B.300d.txt from the Wikipedia embeddings, which are trained on Wikipedia entries only.

twifile: A character scalar (file name). Use this for glove.twitter.27B.200d.txt from the Twitter embeddings, which are trained on Twitter data only.
Details

This function reads one or more of the pre-trained word embedding files described above. You must first download these files and unzip them on your computer before you can use them. Remember to include the file path when you specify a file name in this function.

Meta embeddings: http://www.cis.uni-muenchen.de/~wenpeng/renamed-meta-emb.tar.gz
Wikipedia embeddings: http://nlp.stanford.edu/data/glove.6B.zip
Twitter embeddings: http://nlp.stanford.edu/data/glove.twitter.27B.zip

You can specify one or more pre-trained word embedding files. I recommend the meta embeddings. The full meta embeddings are contained in two files: one for ordinary words and one for rare words and/or misspellings (words that appeared in only a subset of the different text sources).
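Assuming base R's download.file() and untar() suffice for the meta-embedding archive (the archive's internal layout is an assumption here, so the unpacked paths are placeholders), the download step might be sketched as:

```r
## Not run:
## Sketch: fetch and unpack the meta embeddings (large download;
## URL from the Details section above).
url <- "http://www.cis.uni-muenchen.de/~wenpeng/renamed-meta-emb.tar.gz"
download.file(url, destfile = "renamed-meta-emb.tar.gz")
untar("renamed-meta-emb.tar.gz", exdir = "meta-emb")
## Then pass the unpacked file locations, e.g. something like
## "meta-emb/O2M_overlap.txt" (the exact path depends on the archive
## layout), as ovefile and ovefile2.
## End(Not run)
```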
See Also

scale_text, doc_to_tdm, get_keywords, plot_keywords, score_documents
Examples

## Not run:
# download and extract embeddings data first
embeddings <- read_word_embeddings(
    in_vocab = out$vocab,
    # must add location on your computer "path/to/O2M_overlap.txt"
    ovefile = "O2M_overlap.txt",
    ovefile2 = "O2M_oov.txt" # very rare words and misspellings
    ## available here:
    ## http://www.cis.uni-muenchen.de/~wenpeng/renamed-meta-emb.tar.gz
    ## must unpack and replace "path/to/" with location on your computer
)
## End(Not run)