nlp_univ_sent_encoder: Spark NLP UniversalSentenceEncoder

View source: R/univ_sent_encoder.R

nlp_univ_sent_encoder    R Documentation

Spark NLP UniversalSentenceEncoder

Description

Spark ML transformer that encodes text into high-dimensional vectors that can be used for text classification, semantic similarity, clustering, and other natural language tasks. See https://nlp.johnsnowlabs.com/docs/en/annotators#universalsentenceencoder for details.

Usage

nlp_univ_sent_encoder(
  x,
  input_cols,
  output_col,
  dimension = NULL,
  uid = random_string("univ_sent_encoder_")
)

Arguments

x

A spark_connection, ml_pipeline, or a tbl_spark.

input_cols

Input columns. String array.

output_col

Output column. String.

dimension

Dimension to use for the embeddings. Defaults to NULL.

uid

A character string used to uniquely identify the ML estimator.

Value

The object returned depends on the class of x.

  • spark_connection: When x is a spark_connection, the function returns an ml_estimator object. The object contains a pointer to a Spark Estimator object and can be used to compose Pipeline objects.

  • ml_pipeline: When x is an ml_pipeline, the function returns an ml_pipeline with the NLP estimator appended to it.

  • tbl_spark: When x is a tbl_spark, an estimator is constructed and then immediately fit to the input tbl_spark, returning an NLP model.
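
The ml_pipeline case above can be sketched as follows. This is a minimal, hedged illustration rather than an official example. The helper name build_use_pipeline and the column names "text", "document", and "sentence_embeddings" are hypothetical. The upstream nlp_document_assembler() stage and its input_col/output_col arguments are assumed from the same package and are not described on this page. The sketch only composes the pipeline and does not fit or transform data, since producing embeddings may additionally require a loaded model.

```r
# Hypothetical helper: composes (but does not fit) a two-stage pipeline.
# Calls are namespaced, so defining the function needs no attached packages.
build_use_pipeline <- function(sc) {
  # Start from an empty pipeline bound to the Spark connection.
  pipeline <- sparklyr::ml_pipeline(sc)

  # Assumed upstream stage: turns a raw "text" column into "document"
  # annotations. Its arguments are an assumption, not documented here.
  pipeline <- sparknlp::nlp_document_assembler(
    pipeline,
    input_col = "text",
    output_col = "document"
  )

  # x is an ml_pipeline here, so the encoder stage is appended to it.
  # dimension and uid are left at the defaults shown under Usage.
  sparknlp::nlp_univ_sent_encoder(
    pipeline,
    input_cols = c("document"),
    output_col = "sentence_embeddings"
  )
}

# Usage (needs a live Spark session with the Spark NLP jars available):
# sc <- sparklyr::spark_connect(master = "local")
# pipeline <- build_use_pipeline(sc)
```

Passing the connection sc directly as x instead of a pipeline would, per the list above, return the standalone stage object rather than a pipeline.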


r-spark/sparknlp documentation built on Oct. 15, 2022, 10:50 a.m.