Description
Extract features from a Transformer model, namely:
- the embedding of a sentence
- the embeddings of the tokens of a sentence
- the tokens of a sentence
Arguments
object: an object of class Transformer as returned by
newdata: a data.frame with columns doc_id and text containing the texts to embed
type: a character string, one of 'embed-sentence', 'embed-token' or 'tokenise', to get sentence-level embeddings, token-level embeddings or the wordpiece tokens, respectively
trace: logical indicating whether to show a trace of the progress. Defaults to showing every 10 annotated embeddings
...: other arguments passed on to the methods
Value
Depending on the argument type, the function returns:
embed-sentence: a matrix with the embedding of each text, with the doc_id values as rownames
embed-token: a list of matrices with token-level embeddings, one per doc_id; the list is named by doc_id. Depending on the model, each matrix includes CLS / SEP tokens at the start and end, and its number of rows is limited by the maximum input length of the model
tokenise: a list of subword (wordpiece) tokens, named by doc_id
generate: the tokens generated to follow the provided text sequence
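The sentence-level result is an ordinary numeric matrix with the doc_id values as rownames, so base R is enough to compare documents with it. Below is a minimal sketch, assuming a small hypothetical stand-in matrix in place of real model output (so it runs without downloading a model), that computes the cosine similarity between every pair of documents. The helper cosine_similarity() is hypothetical and not part of golgotha.

```r
# Hypothetical stand-in for the matrix returned by
# predict(model, x, type = "embed-sentence"): one row per doc_id
embedding <- matrix(c(1, 0, 1,
                      0, 1, 1),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("doc_1", "doc_2"), NULL))

# Hypothetical helper: cosine similarity between every pair of rows (documents)
cosine_similarity <- function(m) {
  norms <- sqrt(rowSums(m^2))  # L2 norm of each row
  unit <- m / norms            # divides row i by norms[i]; keeps the rownames
  unit %*% t(unit)             # doc_id values become row and column names
}

sim <- cosine_similarity(embedding)
sim["doc_1", "doc_2"]
```

With real output, pass the matrix returned by predict(model, x, type = "embed-sentence") to cosine_similarity() instead of the stand-in.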
Examples
library(golgotha)
# download the model (only needed once)
transformer_download_model("bert-base-multilingual-uncased")
model <- transformer("bert-base-multilingual-uncased")
x <- data.frame(doc_id = c("doc_1", "doc_2"),
                text = c("provide some words to embed", "another sentence of text"),
                stringsAsFactors = FALSE)
# wordpiece tokens, one list element per doc_id
predict(model, x, type = "tokenise")
# sentence-level embeddings: a matrix with one row per doc_id
embedding <- predict(model, x, type = "embed-sentence")
dim(embedding)
# token-level embeddings: a list of matrices named by doc_id
embedding <- predict(model, x, type = "embed-token")
str(embedding)
# clean up: remove the downloaded model files
unlink(file.path(system.file(package = "golgotha", "models"),
                 "bert-base-multilingual-uncased"), recursive = TRUE)