nlp_word_embeddings: Spark NLP WordEmbeddings
In r-spark/sparknlp: R Interface to John Snow Labs Spark NLP

Spark NLP WordEmbeddings

Description

Spark ML estimator that maps tokens to vectors. See https://nlp.johnsnowlabs.com/docs/en/annotators#word-embeddings

Usage

nlp_word_embeddings(
  x,
  input_cols,
  output_col,
  storage_path,
  storage_path_format = "TEXT",
  storage_ref = NULL,
  dimension,
  case_sensitive = NULL,
  lazy_annotator = NULL,
  read_cache_size = NULL,
  write_buffer_size = NULL,
  include_storage = FALSE,
  uid = random_string("word_embeddings_")
)

Arguments

`x`	A `spark_connection`, `ml_pipeline`, or a `tbl_spark`.
`input_cols`	Input columns. String array.
`output_col`	Output column. String.
`storage_path`	Path to the word embeddings file.
`storage_path_format`	Format of the word embeddings file. One of: text (usually used by GloVe), binary (usually used by Word2Vec), or spark-nlp (the internal format for already serialized embeddings; use this only when re-saving embeddings with Spark NLP).
`storage_ref`	Reference that binds these embeddings to models (such as NerDLModel) trained with them.
`dimension`	Number of dimensions of the word embeddings.
`case_sensitive`	Whether token case is taken into account when matching tokens against the embeddings.
`lazy_annotator`	Boolean. Whether the annotator acts lazily.
`read_cache_size`	Size of the cache for embeddings read from storage.
`write_buffer_size`	Size of the buffer used when writing embeddings to storage.
`include_storage`	Whether to include the word embeddings when saving this annotator to disk (on its own or within a pipeline).
`uid`	A character string used to uniquely identify the ML estimator.
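The storage_path, storage_path_format and dimension arguments have to agree with one another. As a minimal sketch, assuming the text format is the GloVe-style layout (one token per line, followed by its vector components separated by single spaces), the base R code below writes a small embeddings file and checks that every line carries exactly `dimension` components. The helpers `write_text_embeddings()` and `check_text_embeddings()` are hypothetical and not part of sparknlp; the commented-out call at the end assumes a Spark connection `sc` with Spark NLP available, and that the file path is readable by Spark.

```r
# Hypothetical helper (not part of sparknlp): write a numeric matrix whose
# rownames are tokens as a GloVe-style text file, one token per line followed
# by its vector components separated by single spaces.
write_text_embeddings <- function(vectors, path) {
  stopifnot(is.matrix(vectors), is.numeric(vectors), !is.null(rownames(vectors)))
  lines <- vapply(seq_len(nrow(vectors)), function(i) {
    paste(c(rownames(vectors)[i], sprintf("%.6f", vectors[i, ])), collapse = " ")
  }, character(1))
  writeLines(lines, path)
  invisible(path)
}

# Hypothetical helper: TRUE if every line has a token plus exactly `dimension`
# numeric components, i.e. the file agrees with the `dimension` argument.
check_text_embeddings <- function(path, dimension) {
  fields <- strsplit(readLines(path), " ", fixed = TRUE)
  all(vapply(fields, function(f) {
    length(f) == dimension + 1 && !anyNA(suppressWarnings(as.numeric(f[-1])))
  }, logical(1)))
}

# Two toy 4-dimensional vectors (illustrative values only)
emb <- rbind(
  hello = c(0.1, 0.2, 0.3, 0.4),
  world = c(-0.5, 0.6, -0.7, 0.8)
)
path <- tempfile(fileext = ".txt")
write_text_embeddings(emb, path)
check_text_embeddings(path, dimension = 4)  # TRUE

# Assumes a Spark connection `sc` with Spark NLP loaded (not run here):
# embeddings <- nlp_word_embeddings(
#   sc,
#   input_cols = c("document", "token"),
#   output_col = "embeddings",
#   storage_path = path,
#   storage_path_format = "TEXT",
#   dimension = 4
# )
```

Checking the file before handing it to Spark surfaces a dimension mismatch locally, rather than during fitting.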

Value

The object returned depends on the class of x.

spark_connection: When x is a spark_connection, the function returns an ml_estimator object. The object contains a pointer to a Spark Estimator object and can be used to compose Pipeline objects.
ml_pipeline: When x is an ml_pipeline, the function returns an ml_pipeline with the NLP estimator appended to the pipeline.
tbl_spark: When x is a tbl_spark, an estimator is constructed then immediately fit with the input tbl_spark, returning an NLP model.

When x is a spark_connection, the function returns a WordEmbeddings estimator. When x is an ml_pipeline, it returns the pipeline with the WordEmbeddings estimator added. When x is a tbl_spark, it returns a transformed tbl_spark (the DataFrame passed in must contain the columns named in input_cols).