nlp_ner_converter_internal: Spark NLP NerConverterInternal

View source: R/ner-converter-internal.R


Spark NLP NerConverterInternal

Description

Spark ML transformer that converts the IOB or IOB2 representation of NER output into a user-friendly one, grouping the tokens of each recognized entity into a single chunk with its label.
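
As a purely illustrative sketch in plain R (this is not the Spark implementation, which operates on Spark NLP annotation columns), the conversion merges tokens whose IOB tags belong to the same entity into a single labelled chunk:

merge_iob_chunks <- function(tokens, tags) {
  chunks <- list()
  cur_tokens <- character(0)
  cur_entity <- NA_character_

  flush <- function() {
    if (length(cur_tokens) > 0) {
      chunks[[length(chunks) + 1]] <<- list(
        chunk = paste(cur_tokens, collapse = " "),
        entity = cur_entity
      )
    }
  }

  for (i in seq_along(tokens)) {
    tag <- tags[[i]]
    if (startsWith(tag, "B-")) {
      flush()                                  # close any open chunk
      cur_tokens <- tokens[[i]]
      cur_entity <- sub("^B-", "", tag)
    } else if (startsWith(tag, "I-") && identical(sub("^I-", "", tag), cur_entity)) {
      cur_tokens <- c(cur_tokens, tokens[[i]]) # continue the open chunk
    } else {
      flush()                                  # an "O" tag (or mismatch) ends the chunk
      cur_tokens <- character(0)
      cur_entity <- NA_character_
    }
  }
  flush()
  chunks
}

merge_iob_chunks(
  c("John", "Smith", "visited", "Paris"),
  c("B-PER", "I-PER", "O", "B-LOC")
)
# yields two chunks: "John Smith" labelled PER and "Paris" labelled LOC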

Usage

nlp_ner_converter_internal(
  x,
  input_cols,
  output_col,
  white_list = NULL,
  black_list = NULL,
  preserve_position = NULL,
  lazy_annotator = NULL,
  greedy_mode = NULL,
  threshold = NULL,
  uid = random_string("ner_converter_internal_")
)

Arguments

x

A spark_connection, ml_pipeline, or a tbl_spark.

input_cols

Input columns. String array.

output_col

Output column. String.

white_list

If defined, a list of entities to process; all others are ignored. Do not include the IOB prefix on labels.

black_list

If defined, a list of entities to ignore.

preserve_position

Whether to preserve the original positions of the tokens in the original document or to use the modified tokens.

lazy_annotator

Allows the annotator to stand idle in the pipeline and do nothing. It can be called by other annotators in a RecursivePipeline.

greedy_mode

Whether to ignore B tags for contiguous tokens of the same entity.

threshold

Sets the confidence threshold.

uid

A character string used to uniquely identify the ML transformer.

Value

The object returned depends on the class of x.

  • spark_connection: When x is a spark_connection, the function returns an instance of a ml_transformer object. The object contains a pointer to a Spark Transformer object and can be used to compose Pipeline objects.

  • ml_pipeline: When x is a ml_pipeline, the function returns a ml_pipeline with the NLP transformer/annotator appended to the pipeline.

  • tbl_spark: When x is a tbl_spark, a transformer is constructed then immediately fit with the input tbl_spark, returning the transformed data frame.

In short: when x is a spark_connection, the function returns a NerConverterInternal transformer. When x is a ml_pipeline, it returns the pipeline with the NerConverterInternal transformer added. When x is a tbl_spark, it returns a transformed tbl_spark (note that the data frame passed in must contain the specified input_cols).
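
Examples

The following sketch is illustrative only: the connection setup, the upstream pipeline, and the annotation column names ("document", "token", "ner") are assumptions; any pipeline that produces NER annotations in IOB/IOB2 format could feed this converter.

## Not run: 
library(sparklyr)
library(sparknlp)

sc <- spark_connect(master = "local")

# spark_connection case: returns a transformer that can be composed later
converter <- nlp_ner_converter_internal(
  sc,
  input_cols = c("document", "token", "ner"),
  output_col = "ner_chunk",
  white_list = c("PER", "LOC")  # keep only these entities (no IOB prefix)
)

# ml_pipeline case: the converter is appended to an existing pipeline
pipeline <- ml_pipeline(sc)
pipeline <- nlp_ner_converter_internal(
  pipeline,
  input_cols = c("document", "token", "ner"),
  output_col = "ner_chunk"
)

## End(Not run)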

