nlp_xlm_roberta_sentence_embeddings_pretrained: Load a pretrained Spark NLP XlmRoBertaSentenceEmbeddings...

View source: R/xlm_roberta_sentence_embeddings.R

nlp_xlm_roberta_sentence_embeddings_pretrained    R Documentation

Load a pretrained Spark NLP XlmRoBertaSentenceEmbeddings model

Description

Create a pretrained Spark NLP XlmRoBertaSentenceEmbeddings model. See https://nlp.johnsnowlabs.com/docs/en/annotators#xlmrobertasentenceembeddings

Usage

nlp_xlm_roberta_sentence_embeddings_pretrained(
  sc,
  input_cols,
  output_col,
  case_sensitive = NULL,
  batch_size = NULL,
  dimension = NULL,
  max_sentence_length = NULL,
  name = NULL,
  lang = NULL,
  remote_loc = NULL
)

Arguments

sc

A Spark connection

input_cols

Input columns. String array.

output_col

Output column. String.

case_sensitive

Whether tokens are treated as case-sensitive. If FALSE, tokens are lowercased first.

batch_size

Batch size: the number of inputs processed together in each batch.

dimension

Number of dimensions of the output embeddings.

max_sentence_length

Maximum sentence length to process.

name

The name of the model to load. If NULL, the default value is used.

lang

The language of the model to load. If NULL, the default value is used.

remote_loc

The remote location of the model. If NULL, the default value is used.

Details

Sentence-level embeddings using XLM-RoBERTa. The XLM-RoBERTa model was proposed in "Unsupervised Cross-lingual Representation Learning at Scale" by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov. It is based on Facebook's RoBERTa model, released in 2019, and is a large multilingual language model trained on 2.5TB of filtered CommonCrawl data.

Value

A Spark NLP XlmRoBertaSentenceEmbeddings model with the pretrained model loaded.
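
Examples

The following is a minimal sketch of how the function might be used in a pipeline, not an example taken from the package. It assumes a working Spark installation, the sparklyr and sparknlp packages, and network access to download the default pretrained model. The helper name embed_sentences, the input column name "text", and the intermediate column names are hypothetical choices made for illustration.

```r
# Hypothetical helper: computes sentence embeddings for a Spark DataFrame.
# Assumes `sc` is an open sparklyr connection and `text_tbl` is a Spark
# DataFrame with a character column named "text".
embed_sentences <- function(sc, text_tbl) {
  # Turn raw text into Spark NLP document annotations.
  assembler <- sparknlp::nlp_document_assembler(
    sc,
    input_col = "text",
    output_col = "document"
  )

  # Load the pretrained model. name, lang and remote_loc are left as NULL,
  # so the package defaults are used.
  embeddings <- nlp_xlm_roberta_sentence_embeddings_pretrained(
    sc,
    input_cols = c("document"),
    output_col = "sentence_embeddings"
  )

  # Chain both stages, then fit and apply the pipeline to the input data.
  pipeline <- sparklyr::ml_pipeline(assembler, embeddings)
  model <- sparklyr::ml_fit(pipeline, text_tbl)
  sparklyr::ml_transform(model, text_tbl)
}

# Possible usage (not run here, since it needs Spark and a model download):
#   library(sparknlp)
#   sc <- sparklyr::spark_connect(master = "local")
#   text_tbl <- sparklyr::sdf_copy_to(
#     sc, data.frame(text = "Bonjour le monde."), name = "texts"
#   )
#   result <- embed_sentences(sc, text_tbl)
```

Optional arguments such as batch_size or max_sentence_length can be passed in the same call to the pretrained function. Leaving them as NULL keeps the defaults.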


r-spark/sparknlp documentation built on Oct. 15, 2022, 10:50 a.m.