BPEembed {sentencepiece} | R Documentation
Use a sentencepiece model to tokenise text into subword tokens and get the word2vec embeddings of these tokens.
Usage

BPEembed(
  file_sentencepiece = x$file_model,
  file_word2vec = x$glove.bin$file_model,
  x,
  normalize = TRUE
)
Arguments

file_sentencepiece
    the path to the file containing the sentencepiece model

file_word2vec
    the path to the file containing the word2vec embeddings

x
    the result of a call to sentencepiece_download_model; its file paths are used as the defaults for file_sentencepiece and file_word2vec shown in Usage (see the sketch after this list)

normalize
    passed on to read.wordvectors
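As a minimal sketch of both construction routes: the runnable part builds the embedder from the small Dutch model files shipped with the package; the commented part assumes sentencepiece_download_model takes a language code plus vocab_size and dim arguments and returns a list carrying the file paths used as the defaults in Usage. Those argument names are an assumption here, so check its own help page before relying on them.

library(sentencepiece)

## Route 1: from files on disk (model files shipped with the package)
folder  <- system.file(package = "sentencepiece", "models")
encoder <- BPEembed(file_sentencepiece = file.path(folder, "nl.wiki.bpe.vs1000.model"),
                    file_word2vec      = file.path(folder, "nl.wiki.bpe.vs1000.d25.w2v.bin"),
                    normalize = TRUE)

## Route 2 (not run, needs internet): from a downloaded BPEmb model.
## The argument names below are an assumption; see ?sentencepiece_download_model.
# dl      <- sentencepiece_download_model("nl", vocab_size = 1000, dim = 25)
# encoder <- BPEembed(x = dl)  # file paths default to dl$file_model and dl$glove.bin$file_model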
Value

an object of class BPEembed, which is a list with elements:
model: a sentencepiece model as loaded with sentencepiece_load_model
embedding: a matrix with embeddings as loaded with read.wordvectors
dim: the dimension of the embedding
n: the number of elements in the vocabulary
file_sentencepiece: the sentencepiece model file
file_word2vec: the word2vec embedding file
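To make this structure concrete, a small sketch inspecting the encoder constructed in the sketch above; it assumes read.wordvectors keeps the subword tokens as rownames of the embedding matrix.

str(encoder, max.level = 1)        # model, embedding, dim, n, file paths
encoder$dim                        # dimension of each embedding vector (25 here)
encoder$n                          # number of subword tokens in the vocabulary
dim(encoder$embedding)             # n rows (tokens) x dim columns
head(rownames(encoder$embedding))  # the subword tokens themselves (assumed rownames)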
See Also

predict.BPEembed, sentencepiece_load_model, sentencepiece_download_model, read.wordvectors
Examples

##
## Example loading model from disk
##
folder    <- system.file(package = "sentencepiece", "models")
embedding <- file.path(folder, "nl.wiki.bpe.vs1000.d25.w2v.bin")
model     <- file.path(folder, "nl.wiki.bpe.vs1000.model")
encoder   <- BPEembed(model, embedding)

## Do tokenisation with the sentencepiece model + embed these
txt <- c("De eigendomsoverdracht aan de deelstaten is ingewikkeld.",
         "On est d'accord sur le prix de la biere?")
values <- predict(encoder, txt, type = "encode")
str(values)
values

## Decode subword tokens back to text
txt <- rownames(values[[1]])
predict(encoder, txt, type = "decode")
txt <- lapply(values, FUN = rownames)
predict(encoder, txt, type = "decode")
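As a short follow-up to the example, and assuming type = "encode" returns one matrix per input sentence (as the use of values[[1]] and lapply(values, rownames) above suggests), the shapes can be checked like this:

sapply(values, nrow)   # number of subword tokens per input sentence
sapply(values, ncol)   # embedding dimension, i.e. encoder$dim (25 for this model)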