View source: R/longformer-embeddings.R
nlp_longformer_embeddings_pretrained | R Documentation |
Longformer is a transformer model for long documents. It was presented in "Longformer: The Long-Document Transformer" by Iz Beltagy, Matthew E. Peters, and Arman Cohan. longformer-base-4096 is a BERT-like model initialized from the RoBERTa checkpoint and pretrained with masked language modeling (MLM) on long documents. It supports sequences of up to 4,096 tokens. See https://nlp.johnsnowlabs.com/docs/en/transformers#longformerembeddings
nlp_longformer_embeddings_pretrained(
  sc,
  input_cols,
  output_col,
  batch_size = NULL,
  case_sensitive = NULL,
  dimension = NULL,
  max_sentence_length = NULL,
  storage_ref = NULL,
  name = NULL,
  lang = NULL,
  remote_loc = NULL
)
input_cols |
Input columns. String array. |
output_col |
Output column. String. |
batch_size |
Size of every batch (Default depends on model). |
case_sensitive |
Whether to ignore case in index lookups (Default depends on model) |
dimension |
Number of embedding dimensions (Default depends on model) |
max_sentence_length |
Max sentence length to process (Default: 128) |
storage_ref |
Unique identifier for storage (Default: this.uid) |
x |
A spark_connection, ml_pipeline, or tbl_spark. |
uid |
A character string used to uniquely identify the ML estimator. |
The object returned depends on the class of x.
spark_connection: When x is a spark_connection, the function returns an instance of a ml_estimator object. The object contains a pointer to a Spark Estimator object and can be used to compose Pipeline objects.
ml_pipeline: When x is a ml_pipeline, the function returns a ml_pipeline with the NLP estimator appended to the pipeline.
tbl_spark: When x is a tbl_spark, an estimator is constructed then immediately fit with the input tbl_spark, returning an NLP model.
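The following is a minimal usage sketch, not an excerpt from the package's own examples. It assumes the sparklyr and sparknlp R packages are installed, that a local Spark session with Spark NLP is available, and that the default pretrained model can be downloaded. The column names ("text", "document", "token", "embeddings"), the table name "longformer_demo", and the max_sentence_length value are illustrative choices, and nlp_document_assembler and nlp_tokenizer are assumed to be the upstream annotators that produce the required input columns.

```r
library(sparklyr)
library(sparknlp)

# Assumption: a local Spark installation with Spark NLP on the classpath.
sc <- spark_connect(master = "local")

# Copy a tiny example data frame to Spark (table name is illustrative).
text_tbl <- sdf_copy_to(
  sc,
  data.frame(
    text = "Longformer is designed for long documents.",
    stringsAsFactors = FALSE
  ),
  name = "longformer_demo",
  overwrite = TRUE
)

# Upstream annotators that create the document and token columns.
assembler <- nlp_document_assembler(sc, input_col = "text", output_col = "document")
tokenizer <- nlp_tokenizer(sc, input_cols = c("document"), output_col = "token")

# Load the pretrained embeddings; name, lang and remote_loc are left at
# their defaults. max_sentence_length is raised from the documented
# default of 128 purely for illustration.
embeddings <- nlp_longformer_embeddings_pretrained(
  sc,
  input_cols = c("document", "token"),
  output_col = "embeddings",
  max_sentence_length = 1024
)

# Compose the stages into a pipeline, fit it, and transform the input.
pipeline <- ml_pipeline(assembler, tokenizer, embeddings)
model <- ml_fit(pipeline, text_tbl)
result <- ml_transform(model, text_tbl)
```

Passing sc as the first argument yields a stage that can be composed into a pipeline, which matches the spark_connection case described above. Longer max_sentence_length values increase memory use, so a smaller value may be preferable on a local session.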