nlp_graph_extraction: Spark NLP GraphExtraction

View source: R/graph-extraction.R

nlp_graph_extraction    R Documentation

Spark NLP GraphExtraction

Description

Spark ML transformer that extracts a dependency graph between entities. The GraphExtraction class takes entities, e.g. those extracted by a NerDLModel, and builds a dependency tree that describes how they relate to each other. The result is stored in a triple-store format: nodes represent the entities and edges represent the relations between them. The graph can then be used to find relevant relationships between words.

Usage

nlp_graph_extraction(
  x,
  input_cols,
  output_col,
  delimiter = NULL,
  dependency_parser_model = NULL,
  entity_types = NULL,
  explode_entities = NULL,
  include_edges = NULL,
  max_sentence_size = NULL,
  merge_entities = NULL,
  merge_entities_iob_format = NULL,
  min_sentence_size = NULL,
  pos_model = NULL,
  relationship_types = NULL,
  root_tokens = NULL,
  typed_dependency_parser_model = NULL,
  uid = random_string("graph_extraction_")
)

Arguments

x

A spark_connection, ml_pipeline, or a tbl_spark.

input_cols

Input columns. String array.

output_col

Output column. String.

delimiter

Delimiter symbol used for path output (Default: ",")

dependency_parser_model

Coordinates (name, lang, remoteLoc) to a pretrained Dependency Parser model (Default: Array())

entity_types

Find paths between a pair of entities (Default: Array())

explode_entities

When set to true, find paths between entities (Default: false)

include_edges

Whether to include edges when building paths (Default: true)

max_sentence_size

Maximum sentence size that the annotator will process (Default: 1000).

merge_entities

Merge same neighboring entities as a single token (Default: false)

merge_entities_iob_format

IOB format to apply when merging entities

min_sentence_size

Minimum sentence size that the annotator will process (Default: 2).

pos_model

Coordinates (name, lang, remoteLoc) to a pretrained POS model (Default: Array())

relationship_types

Find paths between a pair of token and entity (Default: Array())

root_tokens

Tokens to be considered as roots from which to start traversing the paths (Default: Array()).

typed_dependency_parser_model

Coordinates (name, lang, remoteLoc) to a pretrained Typed Dependency Parser model (Default: Array())

uid

A character string used to uniquely identify the ML estimator.
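The delimiter argument controls how each extracted path is written out as a single string. As a minimal sketch of reading such a string back in R, the helper below splits it on the delimiter. The name split_path and the sample path string are hypothetical and used only for illustration; real values come from the annotator's output.

```r
# Hypothetical helper: split one delimited path string into its elements.
split_path <- function(path, delimiter = ",") {
  # fixed = TRUE treats the delimiter literally rather than as a regular
  # expression, which matters for delimiters such as "|" or ".".
  strsplit(path, delimiter, fixed = TRUE)[[1]]
}

# Made-up path string, not actual annotator output.
example_path <- "prefer,nsubj,morning,flat,flight"
path_elements <- split_path(example_path)
print(path_elements)
```

If a non-default delimiter was passed to nlp_graph_extraction, the same value would be passed to split_path.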

Details

Both the DependencyParserModel and TypedDependencyParserModel need to be present in the pipeline. There are two ways to set them:

  • Both annotators are already present in the pipeline. The dependencies are then taken implicitly from these two annotators.

  • Setting setMergeEntities to true (the merge_entities argument of this function) downloads the default pretrained models for those two annotators automatically. Specific models can also be set with setDependencyParserModel and setTypedDependencyParserModel (the dependency_parser_model and typed_dependency_parser_model arguments).

See https://nlp.johnsnowlabs.com/docs/en/annotators#graphextraction
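The second option above can be sketched as a small pipeline builder. This is an illustration under stated assumptions, not the package's canonical example: it assumes the sparknlp package exposes nlp_document_assembler, nlp_tokenizer, nlp_word_embeddings_pretrained and nlp_ner_dl_pretrained with the argument names shown, that their default pretrained models are compatible with each other, and that the input column is called "text". The function name build_graph_pipeline is hypothetical.

```r
# Sketch only: nothing here touches Spark until build_graph_pipeline(sc) is
# called with a live sparklyr connection.
build_graph_pipeline <- function(sc) {
  assembler <- sparknlp::nlp_document_assembler(
    sc, input_col = "text", output_col = "document"
  )
  tokenizer <- sparknlp::nlp_tokenizer(
    sc, input_cols = c("document"), output_col = "token"
  )
  # Assumed: the default pretrained embeddings match the default NER model.
  embeddings <- sparknlp::nlp_word_embeddings_pretrained(
    sc, input_cols = c("document", "token"), output_col = "embeddings"
  )
  ner <- sparknlp::nlp_ner_dl_pretrained(
    sc, input_cols = c("document", "token", "embeddings"), output_col = "ner"
  )
  # merge_entities = TRUE: per the Details section, the default dependency
  # parser models are then downloaded automatically.
  graph <- sparknlp::nlp_graph_extraction(
    sc,
    input_cols = c("document", "token", "ner"),
    output_col = "graph",
    merge_entities = TRUE
  )
  # Assemble the stages in order into one pipeline.
  sparklyr::ml_pipeline(assembler, tokenizer, embeddings, ner, graph)
}
```

With a live connection sc and a Spark data frame that has a text column (here called text_tbl, a hypothetical name), a call such as sparklyr::ml_fit_and_transform(build_graph_pipeline(sc), text_tbl) would be expected to add the graph column. Running it requires network access to download the pretrained models.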

Value

The object returned depends on the class of x.

  • spark_connection: When x is a spark_connection, the function returns an instance of an ml_estimator object. The object contains a pointer to a Spark Estimator object and can be used to compose Pipeline objects.

  • ml_pipeline: When x is an ml_pipeline, the function returns an ml_pipeline with the NLP estimator appended to the pipeline.

  • tbl_spark: When x is a tbl_spark, an estimator is constructed then immediately fit with the input tbl_spark, returning an NLP model.


r-spark/sparknlp documentation built on Oct. 15, 2022, 10:50 a.m.