nlp_contextual_parser: Spark NLP ContextualParserApproach

View source: R/contextual_parser.R

nlp_contextual_parser    R Documentation

Spark NLP ContextualParserApproach

Description

Spark ML estimator that provides regex and contextual matching based on rules defined in a JSON file. See https://nlp.johnsnowlabs.com/docs/en/licensed_annotators#contextual-parser

Usage

nlp_contextual_parser(
  x,
  input_cols,
  output_col,
  json_path = NULL,
  dictionary = NULL,
  read_as = "TEXT",
  options = NULL,
  case_sensitive = NULL,
  prefix_and_suffix_match = NULL,
  context_match = NULL,
  update_tokenizer = NULL,
  uid = random_string("contextual_parser_")
)

Arguments

x

A spark_connection, ml_pipeline, or a tbl_spark.

input_cols

Input columns. String array.

output_col

Output column. String.

json_path

Path to the JSON file containing the matching rules.

dictionary

Path to the dictionary file, in TSV or CSV format.

read_as

The format used to read the file. One of TEXT, SPARK, or BINARY.

options

A named list containing additional parameters used when reading the dictionary file.

case_sensitive

Whether matching of values is case sensitive.

prefix_and_suffix_match

Whether both the prefix (before the regex match) AND the suffix (after it) must match for the hit to be annotated.

context_match

Whether to include the preceding and following context when annotating the hit.

update_tokenizer

Whether to update the tokenizer from the pipeline when dictionary values contain multiple words.

uid

A character string used to uniquely identify the ML estimator.
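
The json_path argument expects a rules file on disk. The following is a minimal sketch of writing one from R. The field names (entity, ruleScope, matchScope, regex, prefix, suffix, contextLength) and the Temperature rule itself are assumptions based on the contextual parser page linked in the Description, not something this help page specifies, so check them against the Spark NLP version in use.

```r
# Hypothetical rule definition. The field names are assumptions taken from
# the Spark NLP contextual parser documentation; verify before relying on them.
rules_json <- '{
  "entity": "Temperature",
  "ruleScope": "sentence",
  "matchScope": "token",
  "regex": "[0-9]{2,3}",
  "prefix": ["temperature", "fever"],
  "suffix": ["Fahrenheit", "Celsius"],
  "contextLength": 30
}'

# Write the rules to a temporary file so the path can be passed as json_path.
rules_path <- file.path(tempdir(), "temperature_rules.json")
writeLines(rules_json, rules_path)
```

The resulting rules_path can then be supplied as json_path = rules_path.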

Value

The object returned depends on the class of x.

  • spark_connection: When x is a spark_connection, the function returns an instance of an ml_estimator object. The object contains a pointer to a Spark Estimator object and can be used to compose Pipeline objects.

  • ml_pipeline: When x is an ml_pipeline, the function returns an ml_pipeline with the NLP estimator appended to the pipeline.

  • tbl_spark: When x is a tbl_spark, an estimator is constructed and then immediately fit to the input tbl_spark, returning an NLP model.
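
As an illustration of the ml_pipeline case, the sketch below appends the parser to a pipeline that produces sentence and token annotations. The helper name build_parser_pipeline, the column names, and the choice of upstream annotators are assumptions made for this example. It also assumes that sparklyr and sparknlp are installed and that the licensed Spark NLP library is available on the cluster.

```r
# Hypothetical helper: builds an unfitted pipeline that ends with the
# contextual parser. `sc` is a spark_connection; `rules_path` is the path
# to a JSON rules file.
build_parser_pipeline <- function(sc, rules_path) {
  pipeline <- sparklyr::ml_pipeline(sc)

  # Turn the raw "text" column into document annotations.
  pipeline <- sparknlp::nlp_document_assembler(
    pipeline, input_col = "text", output_col = "document"
  )
  # Split documents into sentences, then sentences into tokens.
  pipeline <- sparknlp::nlp_sentence_detector(
    pipeline, input_cols = c("document"), output_col = "sentence"
  )
  pipeline <- sparknlp::nlp_tokenizer(
    pipeline, input_cols = c("sentence"), output_col = "token"
  )

  # x is an ml_pipeline here, so the estimator is appended and the
  # extended pipeline is returned.
  sparknlp::nlp_contextual_parser(
    pipeline,
    input_cols = c("sentence", "token"),
    output_col = "entity",
    json_path = rules_path,
    case_sensitive = FALSE
  )
}
```

Given a live connection sc and a tbl_spark text_tbl with a text column (both assumed here), the pipeline could be fit with sparklyr::ml_fit(build_parser_pipeline(sc, rules_path), text_tbl) and the fitted model applied with sparklyr::ml_transform().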


r-spark/sparknlp documentation built on Oct. 15, 2022, 10:50 a.m.