nlp_ner_converter_internal: Spark NLP NerConverterInternal

View source: R/ner-converter-internal.R


Spark NLP NerConverterInternal

Description

Spark ML transformer that converts the IOB or IOB2 representation of NER output into a user-friendly one, grouping the tokens of each recognized entity into a single chunk with its label.
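
As a purely illustrative sketch in plain R (this is not the Spark implementation, which operates on Spark NLP annotation columns), the conversion merges tokens whose IOB tags belong to the same entity into a single labelled chunk:

merge_iob_chunks <- function(tokens, tags) {
  chunks <- list()
  cur_tokens <- character(0)
  cur_entity <- NA_character_

  flush <- function() {
    if (length(cur_tokens) > 0) {
      chunks[[length(chunks) + 1]] <<- list(
        chunk = paste(cur_tokens, collapse = " "),
        entity = cur_entity
      )
    }
  }

  for (i in seq_along(tokens)) {
    tag <- tags[[i]]
    if (startsWith(tag, "B-")) {
      flush()                                  # close any open chunk
      cur_tokens <- tokens[[i]]
      cur_entity <- sub("^B-", "", tag)
    } else if (startsWith(tag, "I-") && identical(sub("^I-", "", tag), cur_entity)) {
      cur_tokens <- c(cur_tokens, tokens[[i]]) # continue the open chunk
    } else {
      flush()                                  # an "O" tag (or mismatch) ends the chunk
      cur_tokens <- character(0)
      cur_entity <- NA_character_
    }
  }
  flush()
  chunks
}

merge_iob_chunks(
  c("John", "Smith", "visited", "Paris"),
  c("B-PER", "I-PER", "O", "B-LOC")
)
# yields two chunks: "John Smith" labelled PER and "Paris" labelled LOC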

Usage

nlp_ner_converter_internal(
  x,
  input_cols,
  output_col,
  white_list = NULL,
  black_list = NULL,
  preserve_position = NULL,
  lazy_annotator = NULL,
  greedy_mode = NULL,
  threshold = NULL,
  uid = random_string("ner_converter_internal_")
)

Arguments

x

A spark_connection, ml_pipeline, or a tbl_spark.

input_cols

Input columns. String array.

output_col

Output column. String.

white_list

If defined, a list of entities to process; all others are ignored. Do not include the IOB prefix on labels.

black_list

If defined, a list of entities to ignore.

preserve_position

Whether to preserve the original positions of the tokens in the original document or to use the modified tokens.

lazy_annotator

Allows the annotator to stand idle in the pipeline and do nothing. It can be called by other annotators in a RecursivePipeline.

greedy_mode

Whether to ignore B tags for contiguous tokens of the same entity.

threshold

Sets the confidence threshold.

uid

A character string used to uniquely identify the ML transformer.

Value

The object returned depends on the class of x.

  • spark_connection: When x is a spark_connection, the function returns an instance of a ml_transformer object. The object contains a pointer to a Spark Transformer object and can be used to compose Pipeline objects.

  • ml_pipeline: When x is a ml_pipeline, the function returns a ml_pipeline with the NLP transformer/annotator appended to the pipeline.

  • tbl_spark: When x is a tbl_spark, a transformer is constructed then immediately fit with the input tbl_spark, returning the transformed data frame.

In short: when x is a spark_connection, the function returns a NerConverterInternal transformer. When x is a ml_pipeline, it returns the pipeline with the NerConverterInternal transformer added. When x is a tbl_spark, it returns a transformed tbl_spark (note that the data frame passed in must contain the specified input_cols).
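
Examples

The following sketch is illustrative only: the connection setup, the upstream pipeline, and the annotation column names ("document", "token", "ner") are assumptions; any pipeline that produces NER annotations in IOB/IOB2 format could feed this converter.

## Not run: 
library(sparklyr)
library(sparknlp)

sc <- spark_connect(master = "local")

# spark_connection case: returns a transformer that can be composed later
converter <- nlp_ner_converter_internal(
  sc,
  input_cols = c("document", "token", "ner"),
  output_col = "ner_chunk",
  white_list = c("PER", "LOC")  # keep only these entities (no IOB prefix)
)

# ml_pipeline case: the converter is appended to an existing pipeline
pipeline <- ml_pipeline(sc)
pipeline <- nlp_ner_converter_internal(
  pipeline,
  input_cols = c("document", "token", "ner"),
  output_col = "ner_chunk"
)

## End(Not run)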

