nlp_graph_extraction: Spark NLP GraphExtraction
In r-spark/sparknlp: R Interface to John Snow Labs Spark NLP

nlp_graph_extraction

R Documentation

Spark NLP GraphExtraction

Description

Spark ML transformer that extracts a dependency graph between entities. The GraphExtraction class takes, for example, entities extracted by a NerDLModel and builds a dependency tree that describes how those entities relate to each other. The graph is stored in a triple store format: nodes represent the entities, and edges represent the relations between them. The graph can then be used to find relevant relationships between words.
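To make the node/edge description concrete, the base-R sketch below turns a single path string into (head, relation, dependent) triples. It assumes the path alternates tokens and edge labels, as it would with `include_edges = TRUE`, and that `delimiter` is `","`. The helper `path_to_triples` and the sample path are hypothetical illustrations, not part of sparknlp; real paths depend on the models in the pipeline.

```r
# Hypothetical helper (not part of sparknlp): split an edge-annotated path
# into triples. Assumes the path alternates token, edge, token, ..., token.
path_to_triples <- function(path, delimiter = ",") {
  parts <- strsplit(path, delimiter, fixed = TRUE)[[1]]
  if (length(parts) < 3 || length(parts) %% 2 == 0) {
    stop("expected an odd number (>= 3) of alternating tokens and edges")
  }
  heads <- seq(1, length(parts) - 2, by = 2)  # positions of head tokens
  data.frame(
    head      = parts[heads],
    relation  = parts[heads + 1],
    dependent = parts[heads + 2],
    stringsAsFactors = FALSE
  )
}

# Hypothetical sample path; each row of the result is one node-edge-node triple.
triples <- path_to_triples("prefer,obj,flight,nmod,Denver")
print(triples)

# Triples can then be queried like any table, e.g. all "nmod" relations.
print(triples[triples$relation == "nmod", ])
```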

Usage

nlp_graph_extraction(
  x,
  input_cols,
  output_col,
  delimiter = NULL,
  dependency_parser_model = NULL,
  entity_types = NULL,
  explode_entities = NULL,
  include_edges = NULL,
  max_sentence_size = NULL,
  merge_entities = NULL,
  merge_entities_iob_format = NULL,
  min_sentence_size = NULL,
  pos_model = NULL,
  relationship_types = NULL,
  root_tokens = NULL,
  typed_dependency_parser_model = NULL,
  uid = random_string("graph_extraction_")
)

Arguments

`x`	A `spark_connection`, `ml_pipeline`, or a `tbl_spark`.
`input_cols`	Input columns. String array.
`output_col`	Output column. String.
`delimiter`	Delimiter symbol used for path output (Default: ",")
`dependency_parser_model`	Coordinates (name, lang, remoteLoc) of a pretrained Dependency Parser model (Default: Array())
`entity_types`	Pairs of entities between which to find paths (Default: Array())
`explode_entities`	When set to true, find paths between entities (Default: false)
`include_edges`	Whether to include edges when building paths (Default: true)
`max_sentence_size`	Maximum sentence size that the annotator will process (Default: 1000).
`merge_entities`	Merge neighboring tokens that belong to the same entity into a single token (Default: false)
`merge_entities_iob_format`	IOB format to apply when merging entities
`min_sentence_size`	Minimum sentence size that the annotator will process (Default: 2).
`pos_model`	Coordinates (name, lang, remoteLoc) of a pretrained POS model (Default: Array())
`relationship_types`	Pairs of a token and an entity between which to find paths (Default: Array())
`root_tokens`	Tokens to be considered as roots from which to start traversing the paths (Default: Array()).
`typed_dependency_parser_model`	Coordinates (name, lang, remoteLoc) of a pretrained Typed Dependency Parser model (Default: Array())
`uid`	A character string used to uniquely identify the ML estimator.
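`entity_types` and `relationship_types` take character vectors of pairs. The sketch below assumes the hyphen-joined pair format used in the Spark NLP GraphExtraction examples: an entity pair such as `"PER-LOC"`, or a token-entity pair such as `"prefer-LOC"`. The helper `pair_spec` is hypothetical; it only builds the strings you would pass to those arguments and rejects members that already contain a hyphen, on the assumption that the hyphen is the pair separator.

```r
# Hypothetical helper (not part of sparknlp): build "LEFT-RIGHT" pair strings
# for entity_types / relationship_types, assuming a hyphen-joined format.
pair_spec <- function(left, right) {
  stopifnot(is.character(left), is.character(right))
  if (any(grepl("-", c(left, right), fixed = TRUE))) {
    stop("pair members must not contain '-' (assumed pair separator)")
  }
  paste(left, right, sep = "-")
}

entity_types       <- pair_spec(c("PER", "ORG"), c("LOC", "LOC"))
relationship_types <- pair_spec("prefer", "LOC")
print(entity_types)        # "PER-LOC" "ORG-LOC"
print(relationship_types)  # "prefer-LOC"
```

The resulting vectors would then be passed as, for example, `nlp_graph_extraction(sc, input_cols = c("document", "token", "ner"), output_col = "graph", relationship_types = relationship_types)`; the input column names there are illustrative.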

Details

Both the DependencyParserModel and the TypedDependencyParserModel need to be present in the pipeline. There are two ways to set them:

1. Both annotators are already present in the pipeline. The dependencies are then taken implicitly from these two annotators.
2. Setting merge_entities to true (setMergeEntities in Scala) downloads the default pretrained models for those two annotators automatically. Specific models can also be set with dependency_parser_model and typed_dependency_parser_model (setDependencyParserModel and setTypedDependencyParserModel in Scala).

See https://nlp.johnsnowlabs.com/docs/en/annotators#graphextraction
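The rule in the Details list can be restated as a simple predicate. The base-R sketch below is a hypothetical check, not a sparknlp function: it treats a pipeline as a character vector of stage class names (an assumption made for illustration) and reports whether GraphExtraction would have dependency information, either from both parser stages or from `merge_entities = TRUE`.

```r
# Hypothetical check (not part of sparknlp) encoding the rule above:
# dependency information is available if (a) both parser annotators are
# pipeline stages, or (b) merge_entities = TRUE, which downloads default
# pretrained parsers. Stage names are illustrative class names only.
has_dependency_info <- function(stage_classes, merge_entities = FALSE) {
  required <- c("DependencyParserModel", "TypedDependencyParserModel")
  isTRUE(merge_entities) || all(required %in% stage_classes)
}

print(has_dependency_info(c("DocumentAssembler", "Tokenizer", "NerDLModel")))        # FALSE
print(has_dependency_info(c("DependencyParserModel", "TypedDependencyParserModel"))) # TRUE
print(has_dependency_info(character(0), merge_entities = TRUE))                      # TRUE
```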

Value

The object returned depends on the class of x.

spark_connection: When x is a spark_connection, the function returns an ml_estimator object. The object contains a pointer to a Spark Estimator object and can be used to compose Pipeline objects.
ml_pipeline: When x is an ml_pipeline, the function returns an ml_pipeline with the NLP estimator appended to the pipeline.
tbl_spark: When x is a tbl_spark, an estimator is constructed then immediately fit with the input tbl_spark, returning an NLP model.