predict.BPEembed | R Documentation |
Use the sentencepiece model to do one of the following:
encode: tokenise and embed text
decode: recover the untokenised text from tokenised data
tokenize: only tokenise the text using the sentencepiece model, without embedding it
## S3 method for class 'BPEembed'
predict(object, newdata, type = c("encode", "decode", "tokenize"), ...)
object | an object of class BPEembed as returned by BPEembed
newdata | a character vector of text to encode, a character vector of encoded tokens to decode, or a list of such vectors
type | character string, one of 'encode', 'decode' or 'tokenize'
... | further arguments passed on to the methods
in case type is set to 'encode': a list of matrices containing the embeddings of the text, which is tokenised with sentencepiece_encode
in case type is set to 'decode': a character vector of decoded text as returned by sentencepiece_decode
in case type is set to 'tokenize': the tokenised text as returned by sentencepiece_encode
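As a rough illustration of working with the 'encode' result, the sketch below mean-pools the token embeddings of each text into a single vector. It assumes the returned list holds one matrix per input text, with one row per subword token (named by the token) and one column per embedding dimension. The `mock_embedding` helper, the token strings and the random numbers are hypothetical stand-ins so the snippet runs without a model; they are not output of the package, and mean pooling is just one possible way to summarise a text.

```r
# Hypothetical stand-in for predict(encoder, txt, type = "encode"):
# a list with one matrix per text, rows named by (made-up) subword tokens.
set.seed(42)
mock_embedding <- function(tokens, dim = 25) {
  matrix(rnorm(length(tokens) * dim),
         nrow = length(tokens), ncol = dim,
         dimnames = list(tokens, NULL))
}
values <- list(
  mock_embedding(c("de", "eigen", "dom", "s")),
  mock_embedding(c("on", "est"))
)

# Mean-pool the token rows of each matrix into one vector per text.
# sapply() returns one column per text, so transpose to one row per text.
sentence_vectors <- t(sapply(values, colMeans))
dim(sentence_vectors)

# The token labels can be read back from the row names of each matrix.
tokens <- lapply(values, rownames)
```

With a real model, `values` would instead come from `predict(encoder, txt, type = "encode")`, and the number of columns would follow the dimension of the embedding file in use.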
BPEembed, sentencepiece_decode, sentencepiece_encode
library(sentencepiece)

# Locate the example embedding and sentencepiece model shipped with the package
embedding <- system.file(package = "sentencepiece", "models", "nl.wiki.bpe.vs1000.d25.w2v.bin")
model     <- system.file(package = "sentencepiece", "models", "nl.wiki.bpe.vs1000.model")
encoder   <- BPEembed(model, embedding)

# Encode: tokenise and embed the text
txt <- c("De eigendomsoverdracht aan de deelstaten is ingewikkeld.",
         "On est d'accord sur le prix de la biere?")
values <- predict(encoder, txt, type = "encode")
str(values)
values

# Decode: the tokens of the first text, passed as a character vector
txt <- rownames(values[[1]])
predict(encoder, txt, type = "decode")

# Decode: the tokens of all texts, passed as a list
txt <- lapply(values, FUN = rownames)
predict(encoder, txt, type = "decode")

# Tokenise only, returning either subwords or ids
txt <- c("De eigendomsoverdracht aan de deelstaten is ingewikkeld.",
         "On est d'accord sur le prix de la biere?")
predict(encoder, txt, type = "tokenize", "subwords")
predict(encoder, txt, type = "tokenize", "ids")