View source: R/xlm_roberta_sentence_embeddings.R
nlp_xlm_roberta_sentence_embeddings_pretrained (R Documentation)
Description:

Create a pretrained Spark NLP XlmRoBertaSentenceEmbeddings model. See
https://nlp.johnsnowlabs.com/docs/en/annotators#xlmrobertasentenceembeddings
for details.
Usage:

nlp_xlm_roberta_sentence_embeddings_pretrained(
  sc,
  input_cols,
  output_col,
  case_sensitive = NULL,
  batch_size = NULL,
  dimension = NULL,
  max_sentence_length = NULL,
  name = NULL,
  lang = NULL,
  remote_loc = NULL
)
Arguments:

sc: A Spark connection.

input_cols: Input columns. String array.

output_col: Output column. String.

case_sensitive: Whether token matching is case sensitive; if FALSE, tokens are lowercased.

batch_size: Batch size.

dimension: Number of embedding dimensions, i.e. the size of the model's output layer when calculating embeddings.

max_sentence_length: Maximum sentence length to process.

name: The name of the model to load. If NULL, the default value is used.

lang: The language of the model to be loaded. If NULL, the default value is used.

remote_loc: The remote location of the model. If NULL, the default value is used.
Details:

Sentence-level embeddings using XLM-RoBERTa. The XLM-RoBERTa model was proposed in Unsupervised Cross-lingual Representation Learning at Scale by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov. It is based on Facebook's RoBERTa model released in 2019. It is a large multilingual language model trained on 2.5 TB of filtered CommonCrawl data.
Value:

The Spark NLP model with the pretrained model loaded.
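Examples:

A minimal usage sketch, assuming the usual sparklyr/sparknlp pipeline conventions. The connection settings, input data, and column names below are illustrative, and the model fetched when name is NULL depends on the Spark NLP defaults:

library(sparklyr)
library(sparknlp)

sc <- spark_connect(master = "local")

# Illustrative input table with a single text column
text_tbl <- sdf_copy_to(sc, data.frame(text = "Spark NLP is an open-source text processing library."))

# Assemble raw text into DOCUMENT annotations
document_assembler <- nlp_document_assembler(sc, input_col = "text", output_col = "document")

# Load the pretrained XLM-RoBERTa sentence embeddings annotator
embeddings <- nlp_xlm_roberta_sentence_embeddings_pretrained(
  sc,
  input_cols = c("document"),
  output_col = "sentence_embeddings"
)

# Run both stages as a Spark ML pipeline
pipeline <- ml_pipeline(document_assembler, embeddings)
result <- ml_fit_and_transform(pipeline, text_tbl)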