View source: R/roberta_sentence_embeddings.R
nlp_roberta_sentence_embeddings_pretrained | R Documentation |
Create a pretrained Spark NLP RoBertaSentenceEmbeddings model.
Sentence-level embeddings using RoBERTa. The RoBERTa model was proposed in
"RoBERTa: A Robustly Optimized BERT Pretraining Approach" by Yinhan Liu, Myle Ott,
Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis,
Luke Zettlemoyer, and Veselin Stoyanov. It is based on Google's BERT model,
released in 2018.
nlp_roberta_sentence_embeddings_pretrained(
  sc,
  input_cols,
  output_col,
  case_sensitive = NULL,
  batch_size = NULL,
  dimension = NULL,
  max_sentence_length = NULL,
  name = NULL,
  lang = NULL,
  remote_loc = NULL
)
sc | A Spark connection. |
input_cols | Input columns. String array. |
output_col | Output column. String. |
case_sensitive | Whether the model treats tokens as case-sensitive. If FALSE, tokens are lowercased. |
batch_size | Batch size. |
dimension | Number of dimensions of the output embeddings. |
max_sentence_length | Maximum sentence length to process. |
name | The name of the model to load. If NULL, the default value is used. |
lang | The language of the model to load. If NULL, the default value is used. |
remote_loc | The remote location of the model. If NULL, the default value is used. |
RoBERTa builds on BERT and modifies key hyperparameters, removing the next-sentence pretraining objective and training with much larger mini-batches and learning rates. For details, see https://nlp.johnsnowlabs.com/docs/en/annotators#robertabertsentenceembeddings
The Spark NLP RoBertaSentenceEmbeddings model with the pretrained model loaded.
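The following is a minimal sketch of how this function might be used in a sparklyr pipeline; it is not taken from the package's own examples. It assumes the input table has a string column named `text`, that the annotator accepts sentence-level document annotations as input, and that the default pretrained model can be downloaded. The helper name `embed_sentences` and the column names `document`, `sentence`, and `sentence_embeddings` are illustrative choices, not part of the package.

```r
# Hypothetical helper: build a small Spark NLP pipeline and apply it to a
# Spark table that has a string column called "text" (an assumption).
embed_sentences <- function(sc, text_tbl) {
  # Turn raw text into document annotations.
  assembler <- sparknlp::nlp_document_assembler(
    sc, input_col = "text", output_col = "document"
  )

  # Split each document into sentences.
  sentence_detector <- sparknlp::nlp_sentence_detector(
    sc, input_cols = c("document"), output_col = "sentence"
  )

  # Load the pretrained RoBERTa sentence embeddings. name, lang, and
  # remote_loc are left as NULL so the defaults are used.
  embeddings <- sparknlp::nlp_roberta_sentence_embeddings_pretrained(
    sc, input_cols = c("sentence"), output_col = "sentence_embeddings"
  )

  # Chain the stages and run them over the input table.
  pipeline <- sparklyr::ml_pipeline(assembler, sentence_detector, embeddings)
  sparklyr::ml_fit_and_transform(pipeline, text_tbl)
}

# Usage (requires a working Spark installation; not run here):
# library(sparklyr)
# library(sparknlp)  # load before connecting so the extension is registered
# sc <- spark_connect(master = "local")
# text_tbl <- sdf_copy_to(sc, data.frame(text = "RoBERTa builds on BERT."))
# result <- embed_sentences(sc, text_tbl)
# spark_disconnect(sc)
```

Wrapping the stages in a function keeps the Spark connection and input table explicit, so the same pipeline can be reused on different tables.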