nlp_chunk_entity_resolver: Spark NLP ChunkEntityResolverApproach

View source: R/chunk_entity_resolver.R

nlp_chunk_entity_resolver    R Documentation

Spark NLP ChunkEntityResolverApproach

Description

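Spark ML estimator for the licensed Spark NLP ChunkEntityResolverApproach annotator, which resolves entity chunks to normalized entries in a reference dataset. See https://nlp.johnsnowlabs.com/docs/en/licensed_annotators#chunkentityresolver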

Usage

nlp_chunk_entity_resolver(
  x,
  input_cols,
  output_col,
  all_distances_metadata = NULL,
  alternatives = NULL,
  case_sensitive = NULL,
  confidence_function = NULL,
  distance_function = NULL,
  distance_weights = NULL,
  enable_jaccard = NULL,
  enable_jaro_winkler = NULL,
  enable_levenshtein = NULL,
  enable_sorensen_dice = NULL,
  enable_tfidf = NULL,
  enable_wmd = NULL,
  extra_mass_penalty = NULL,
  label_column = NULL,
  miss_as_empty = NULL,
  neighbors = NULL,
  normalized_col = NULL,
  pooling_strategy = NULL,
  threshold = NULL,
  uid = random_string("chunk_entity_resolver_")
)

Arguments

x

A spark_connection, ml_pipeline, or a tbl_spark.

input_cols

Input columns. String array.

output_col

Output column. String.

all_distances_metadata

Whether to return all distance values in the metadata.

alternatives

Number of results to return in the metadata, after sorting by the last distance calculated.

case_sensitive

Whether to treat the entities as case sensitive.

confidence_function

Function used to calculate confidence: INVERSE or SOFTMAX.
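The difference between the two confidence options can be illustrated with a small base R sketch. It assumes INVERSE means normalizing inverse distances (1/d) so they sum to one, and SOFTMAX means a softmax over negated distances; both formulas are assumptions for illustration, not taken from the Spark NLP implementation, and confidence_from_distances is a hypothetical helper.

```r
# Hypothetical helper: turn candidate distances into confidences.
# ASSUMPTION: INVERSE = normalized 1/d, SOFTMAX = softmax(-d).
# Spark NLP's internal formulas may differ.
confidence_from_distances <- function(d, method = c("INVERSE", "SOFTMAX")) {
  method <- match.arg(method)
  if (method == "INVERSE") {
    w <- 1 / pmax(d, .Machine$double.eps)  # guard against division by zero
  } else {
    w <- exp(-(d - min(d)))                # shift by min(d) for numerical stability
  }
  w / sum(w)                               # confidences sum to one
}

d <- c(0.5, 1.0, 2.0)
confidence_from_distances(d, "INVERSE")
confidence_from_distances(d, "SOFTMAX")
```

In both variants the closest candidate receives the highest confidence; SOFTMAX spreads mass according to differences between distances, while INVERSE depends on their ratios.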

distance_function

Distance function used for the KNN query: 'EUCLIDEAN' or 'COSINE'.
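For reference, the textbook definitions of the two distance functions are sketched below in base R. Cosine distance is taken here as 1 minus cosine similarity; whether Spark NLP uses exactly this form is an assumption.

```r
# Euclidean distance between two embedding vectors.
euclidean_distance <- function(a, b) sqrt(sum((a - b)^2))

# Cosine distance = 1 - cosine similarity (ignores vector length).
cosine_distance <- function(a, b) {
  1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

a <- c(1, 0, 0)
b <- c(0, 1, 0)
euclidean_distance(a, b)            # sqrt(2)
cosine_distance(a, b)               # 1, orthogonal vectors
cosine_distance(c(1, 2, 3), c(2, 4, 6))  # ~0, same direction
```

The last line shows the main practical difference: cosine distance treats vectors pointing in the same direction as identical regardless of magnitude, whereas Euclidean distance does not.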

distance_weights

Distance weights to apply before pooling, given in the order (WMD, TFIDF, Jaccard, SorensenDice, JaroWinkler, Levenshtein).

enable_jaccard

Whether to use the Jaccard token distance.

enable_jaro_winkler

Whether to use the Jaro-Winkler character distance.

enable_levenshtein

Whether to use the Levenshtein character distance.

enable_sorensen_dice

Whether to use the Sorensen-Dice token distance.

enable_tfidf

Whether to use the TFIDF token distance.

enable_wmd

Whether to use the Word Mover's Distance (WMD) token distance.
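To make the token and character distances listed above concrete, here is a base R sketch of three of them using their standard definitions. Whitespace tokenization and normalizing Levenshtein by the longer string length are assumptions for illustration; Spark NLP may tokenize and normalize differently. Matching here is case sensitive.

```r
# Split a string into its set of whitespace-separated tokens.
tokens <- function(s) unique(strsplit(s, "\\s+")[[1]])

# Jaccard token distance: 1 - |A intersect B| / |A union B|.
jaccard_distance <- function(s1, s2) {
  a <- tokens(s1); b <- tokens(s2)
  1 - length(intersect(a, b)) / length(union(a, b))
}

# Sorensen-Dice token distance: 1 - 2|A intersect B| / (|A| + |B|).
sorensen_dice_distance <- function(s1, s2) {
  a <- tokens(s1); b <- tokens(s2)
  1 - 2 * length(intersect(a, b)) / (length(a) + length(b))
}

# Levenshtein character distance, normalized by the longer string length.
levenshtein_distance <- function(s1, s2) {
  drop(utils::adist(s1, s2)) / max(nchar(s1), nchar(s2))
}

jaccard_distance("chronic kidney disease", "kidney disease")        # 1/3
sorensen_dice_distance("chronic kidney disease", "kidney disease")  # 0.2
levenshtein_distance("kitten", "sitting")                           # 3/7
```

The token distances ignore word order ("a b" and "b a" are at distance 0), while the character distance is sensitive to spelling variations within words.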

extra_mass_penalty

Penalty for extra words in the knowledge base match during the WMD calculation.

label_column

Column name for the value being resolved.

miss_as_empty

Whether to return an empty annotation for unmatched chunks.

neighbors

Number of neighbours to consider in the KNN query used to calculate WMD.
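A KNN query of the kind controlled by neighbors can be sketched in base R: compute the distance from a query embedding to every knowledge-base row and keep the k closest, sorted from nearest to farthest (the same sorting that alternatives relies on). knn_query and the toy matrix kb are hypothetical, and Euclidean distance is used only as an example.

```r
# Hypothetical sketch: k nearest knowledge-base rows to a query embedding.
knn_query <- function(query, kb, k) {
  d <- sqrt(rowSums(sweep(kb, 2, query)^2))    # Euclidean distance to each row
  idx <- order(d)[seq_len(min(k, nrow(kb)))]   # closest first, at most nrow(kb)
  data.frame(row = idx, distance = d[idx])
}

# Toy knowledge base: one 2-dimensional embedding per row.
kb <- rbind(c(0, 0), c(1, 1), c(3, 4), c(0.5, 0))
knn_query(c(0, 0), kb, k = 2)  # rows 1 and 4
```

A larger k gives the later distance calculations more candidates to rerank, at the cost of computing more distances.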

normalized_col

Column name for the original, normalized description.

pooling_strategy

Pooling strategy used to aggregate distances: AVERAGE or SUM.
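The interaction between distance_weights and pooling_strategy can be sketched as follows. This assumes each distance is multiplied element-wise by its weight before pooling, and that AVERAGE is the plain mean of the weighted distances; both are assumptions for illustration, and pool_distances is a hypothetical helper rather than Spark NLP code.

```r
# Hypothetical sketch: weight each distance, then pool with AVERAGE or SUM.
# ASSUMPTION: AVERAGE = mean of weighted distances (not sum(w*d)/sum(w)).
pool_distances <- function(distances, weights, strategy = c("AVERAGE", "SUM")) {
  strategy <- match.arg(strategy)
  stopifnot(length(distances) == length(weights))
  weighted <- distances * weights
  if (strategy == "SUM") sum(weighted) else mean(weighted)
}

# Order: WMD, TFIDF, Jaccard, SorensenDice, JaroWinkler, Levenshtein
d <- c(0.2, 0.4, 0.1, 0.3, 0.5, 0.5)
w <- c(2, 1, 1, 1, 1, 1)   # double the weight of WMD
pool_distances(d, w, "SUM")      # 2.2
pool_distances(d, w, "AVERAGE")  # 2.2 / 6
```

SUM grows with the number of enabled distances, so a threshold tuned for one set of enabled distances may need adjusting when another is enabled; AVERAGE keeps the pooled value on roughly the same scale.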

threshold

Threshold value for the aggregated distance.
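How threshold and miss_as_empty might combine is sketched below: the best candidate is kept only if its pooled distance is within the threshold. The inclusive comparison (<=) and the use of NA to stand for "no annotation" when miss_as_empty = FALSE are assumptions; resolve_chunk and the candidate labels are hypothetical.

```r
# Hypothetical sketch: pick the closest candidate if it passes the threshold.
# ASSUMPTION: "" models an empty annotation; NA models no annotation at all.
resolve_chunk <- function(candidates, pooled, threshold, miss_as_empty = TRUE) {
  best <- which.min(pooled)
  if (length(best) == 1 && pooled[best] <= threshold) {
    return(candidates[best])
  }
  if (miss_as_empty) "" else NA_character_
}

candidates <- c("code_a", "code_b", "code_c")
pooled <- c(0.6, 0.2, 0.9)
resolve_chunk(candidates, pooled, threshold = 0.5)  # "code_b"
resolve_chunk(candidates, pooled, threshold = 0.1)  # "" (unmatched)
```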

uid

A character string used to uniquely identify the ML estimator.

Value

The object returned depends on the class of x.

  • spark_connection: When x is a spark_connection, the function returns an ml_estimator object. The object contains a pointer to a Spark Estimator object and can be used to compose Pipeline objects.

  • ml_pipeline: When x is an ml_pipeline, the function returns an ml_pipeline with the NLP estimator appended to the pipeline.

  • tbl_spark: When x is a tbl_spark, an estimator is constructed and then immediately fit with the input tbl_spark, returning an NLP model.


r-spark/sparknlp documentation built on Oct. 15, 2022, 10:50 a.m.