View source: R/distilbert-embeddings.R
nlp_distilbert_embeddings_pretrained | R Documentation |
DistilBERT is a small, fast, cheap and light Transformer model trained by distilling BERT base. It has 40% fewer parameters than bert-base-uncased and runs 60% faster, while preserving over 95% of BERT's performance as measured on the GLUE language understanding benchmark. See https://nlp.johnsnowlabs.com/docs/en/transformers#distilbertembeddings
nlp_distilbert_embeddings_pretrained( sc, input_cols, output_col, batch_size = NULL, case_sensitive = NULL, dimension = NULL, max_sentence_length = NULL, storage_ref = NULL, name = NULL, lang = NULL, remote_loc = NULL )
input_cols | Input columns. String array. |
output_col | Output column. String. |
batch_size | Size of every batch (Default depends on model). |
case_sensitive | Whether to ignore case in index lookups (Default depends on model). |
dimension | Number of embedding dimensions (Default depends on model). |
max_sentence_length | Max sentence length to process (Default: 128). |
storage_ref | Unique identifier for storage (Default: this.uid). |
x | A spark_connection, ml_pipeline, or a tbl_spark. |
uid | A character string used to uniquely identify the ML estimator. |
The object returned depends on the class of x.
spark_connection: When x is a spark_connection, the function returns an instance of a ml_estimator object. The object contains a pointer to a Spark Estimator object and can be used to compose Pipeline objects.
ml_pipeline: When x is a ml_pipeline, the function returns a ml_pipeline with the NLP estimator appended to the pipeline.
tbl_spark: When x is a tbl_spark, an estimator is constructed then immediately fit with the input tbl_spark, returning an NLP model.
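As a rough illustration of how the annotator might be composed into a pipeline, the sketch below assumes a local Spark installation with the sparknlp package loaded. It relies on nlp_document_assembler and nlp_tokenizer, which are not described on this page, to produce the document and token columns that DistilBERT embeddings are assumed to need as input. The table name, column names and sample sentence are made up for illustration, and the default pretrained model is downloaded on first use.

```r
library(sparklyr)
library(sparknlp)

# Assumption: a local Spark installation is available and loading sparknlp
# makes the Spark NLP jars available to the connection.
sc <- spark_connect(master = "local")

# Hypothetical sample data with a single text column.
text_tbl <- sdf_copy_to(
  sc,
  data.frame(
    text = "DistilBERT is a distilled version of BERT.",
    stringsAsFactors = FALSE
  ),
  name = "distilbert_example",
  overwrite = TRUE
)

# Upstream stages (assumed): raw text -> document -> token.
assembler <- nlp_document_assembler(sc, input_col = "text", output_col = "document")
tokenizer <- nlp_tokenizer(sc, input_cols = c("document"), output_col = "token")

# Pretrained DistilBERT embeddings; name and lang are left at their defaults.
embeddings <- nlp_distilbert_embeddings_pretrained(
  sc,
  input_cols = c("document", "token"),
  output_col = "embeddings",
  max_sentence_length = 128
)

# Compose the stages, fit on the sample table, then add the embeddings column.
pipeline <- ml_pipeline(assembler, tokenizer, embeddings)
model <- ml_fit(pipeline, text_tbl)
result <- ml_transform(model, text_tbl)

spark_disconnect(sc)
```

Leaving name and lang as NULL lets the wrapper fall back to whatever default pretrained model the underlying Spark NLP release ships. Pass both explicitly if a specific model is required.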