nlp_sentence_embeddings: Spark NLP SentenceEmbeddings

View source: R/sentence-embeddings.R

nlp_sentence_embeddings    R Documentation

Spark NLP SentenceEmbeddings

Description

Spark ML transformer that converts the output of WordEmbeddings or BertEmbeddings into sentence or document embeddings by summing or averaging all the word embeddings in each sentence or document (which of the two depends on input_cols). See https://nlp.johnsnowlabs.com/docs/en/annotators#sentenceembeddings
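The two aggregation modes described above can be illustrated in plain R, without Spark. This is only a minimal sketch of the arithmetic: the toy word vectors and the helper pool_embeddings() are hypothetical and are not part of sparknlp.

```r
# Hypothetical toy data: one sentence of three words, each with a
# 4-dimensional word embedding (one row per word).
word_vectors <- rbind(
  the = c(1, 0, 2, 4),
  cat = c(3, 2, 0, 0),
  sat = c(2, 4, 1, 2)
)

# Illustrative helper (not a sparknlp function): collapse the word vectors
# of a sentence into a single vector, dimension by dimension.
pool_embeddings <- function(vectors, pooling_strategy = c("AVERAGE", "SUM")) {
  pooling_strategy <- match.arg(pooling_strategy)
  switch(pooling_strategy,
    AVERAGE = colMeans(vectors),
    SUM = colSums(vectors)
  )
}

sentence_avg <- pool_embeddings(word_vectors, "AVERAGE")  # 2 2 1 2
sentence_sum <- pool_embeddings(word_vectors, "SUM")      # 6 6 3 6
```

Either way the result has the same dimensionality as the word embeddings; only the scale differs, since SUM grows with the number of words while AVERAGE does not.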

Usage

nlp_sentence_embeddings(
  x,
  input_cols,
  output_col,
  pooling_strategy = NULL,
  storage_ref = NULL,
  uid = random_string("sentence_embeddings_")
)

Arguments

x

A spark_connection, ml_pipeline, or a tbl_spark.

input_cols

Input columns. String array.

output_col

Output column. String.

pooling_strategy

How word embeddings are aggregated into sentence embeddings: "AVERAGE" or "SUM". String.

storage_ref

Storage reference for the embeddings. String.

uid

A character string used to uniquely identify the ML estimator.

Value

The object returned depends on the class of x.

  • spark_connection: When x is a spark_connection, the function returns an ml_estimator object. The object contains a pointer to a Spark Estimator object and can be used to compose Pipeline objects.

  • ml_pipeline: When x is an ml_pipeline, the function returns an ml_pipeline with the NLP estimator appended to the pipeline.

  • tbl_spark: When x is a tbl_spark, an estimator is constructed and then immediately fit with the input tbl_spark, returning an NLP model.
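
The ml_pipeline case might be used roughly as follows. This is a sketch under assumptions, not a tested recipe: it assumes the sparklyr and sparknlp packages are installed, that the input table has a column named "text", and that nlp_word_embeddings_pretrained() falls back to a default pretrained model when no name is given. The wrapper build_sentence_embedding_pipeline() and all column names are hypothetical.

```r
# Nothing here touches Spark until the function is called with a live
# connection `sc` (for example one created by sparklyr::spark_connect()).
build_sentence_embedding_pipeline <- function(sc) {
  # Pretrained word embeddings are created from the connection and added
  # to the pipeline as a ready-made stage. Assumption: omitting `name`
  # selects the package's default pretrained model.
  word_embeddings <- sparknlp::nlp_word_embeddings_pretrained(
    sc,
    input_cols = c("document", "token"),
    output_col = "word_embeddings"
  )

  pipeline <- sparklyr::ml_pipeline(sc)
  pipeline <- sparknlp::nlp_document_assembler(
    pipeline, input_col = "text", output_col = "document"
  )
  pipeline <- sparknlp::nlp_tokenizer(
    pipeline, input_cols = c("document"), output_col = "token"
  )
  pipeline <- sparklyr::ml_add_stage(pipeline, word_embeddings)

  # x is an ml_pipeline, so the stage is appended and the pipeline returned.
  sparknlp::nlp_sentence_embeddings(
    pipeline,
    input_cols = c("document", "word_embeddings"),
    output_col = "sentence_embeddings",
    pooling_strategy = "AVERAGE"
  )
}
```

The returned pipeline would then typically be fit and applied with sparklyr::ml_fit() and sparklyr::ml_transform() on a tbl_spark that contains the "text" column.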


r-spark/sparknlp documentation built on Oct. 15, 2022, 10:50 a.m.