nlp_chunk_entity_resolver: Spark NLP ChunkEntityResolverApproach
In r-spark/sparknlp: R Interface to John Snow Labs Spark NLP

nlp_chunk_entity_resolver

R Documentation

Spark NLP ChunkEntityResolverApproach

Description

Spark ML estimator that wraps the Spark NLP ChunkEntityResolverApproach annotator. See https://nlp.johnsnowlabs.com/docs/en/licensed_annotators#chunkentityresolver

Usage

nlp_chunk_entity_resolver(
  x,
  input_cols,
  output_col,
  all_distances_metadata = NULL,
  alternatives = NULL,
  case_sensitive = NULL,
  confidence_function = NULL,
  distance_function = NULL,
  distance_weights = NULL,
  enable_jaccard = NULL,
  enable_jaro_winkler = NULL,
  enable_levenshtein = NULL,
  enable_sorensen_dice = NULL,
  enable_tfidf = NULL,
  enable_wmd = NULL,
  extra_mass_penalty = NULL,
  label_column = NULL,
  miss_as_empty = NULL,
  neighbors = NULL,
  normalized_col = NULL,
  pooling_strategy = NULL,
  threshold = NULL,
  uid = random_string("chunk_entity_resolver_")
)

Arguments

`x`	A `spark_connection`, `ml_pipeline`, or a `tbl_spark`.
`input_cols`	Input columns. String array.
`output_col`	Output column. String.
`all_distances_metadata`	whether or not to return all distance values in the metadata.
`alternatives`	number of results to return in the metadata after sorting by last distance calculated
`case_sensitive`	whether to treat the entities as case sensitive
`confidence_function`	what function to use to calculate confidence: 'INVERSE' or 'SOFTMAX'
`distance_function`	what distance function to use for KNN: 'EUCLIDEAN' or 'COSINE'
`distance_weights`	distance weights to apply before pooling: (WMD, TFIDF, Jaccard, SorensenDice, JaroWinkler, Levenshtein)
`enable_jaccard`	whether or not to use Jaccard token distance.
`enable_jaro_winkler`	whether or not to use Jaro-Winkler character distance.
`enable_levenshtein`	whether or not to use Levenshtein character distance.
`enable_sorensen_dice`	whether or not to use Sorensen-Dice token distance.
`enable_tfidf`	whether or not to use TFIDF token distance.
`enable_wmd`	whether or not to use WMD token distance.
`extra_mass_penalty`	penalty for extra words in the knowledge base match during WMD calculation
`label_column`	column name for the value we are trying to resolve
`miss_as_empty`	whether or not to return an empty annotation on unmatched chunks
`neighbors`	number of neighbours to consider in the KNN query to calculate WMD
`normalized_col`	column name for the original, normalized description
`pooling_strategy`	pooling strategy to aggregate distances: 'AVERAGE' or 'SUM'
`threshold`	threshold value for the aggregated distance
`uid`	A character string used to uniquely identify the ML estimator.
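
To make the distance-related arguments more concrete, the following base R sketch computes two of the named token distances (Jaccard and Sorensen-Dice) and a Levenshtein character distance, then pools the token distances. It is an illustration only, not Spark NLP's implementation: the helper names, the whitespace tokenization, and the pooling rule (multiply each distance by its weight, then average or sum) are all assumptions.

```r
# Illustrative helpers only; names and behavior are assumptions,
# not the code used by ChunkEntityResolverApproach.

# Split a string into a set of lower-cased whitespace-delimited tokens.
tokens <- function(s) unique(strsplit(tolower(s), "[[:space:]]+")[[1]])

# Jaccard token distance: 1 - |A intersect B| / |A union B|.
jaccard_distance <- function(a, b) {
  ta <- tokens(a)
  tb <- tokens(b)
  1 - length(intersect(ta, tb)) / length(union(ta, tb))
}

# Sorensen-Dice token distance: 1 - 2|A intersect B| / (|A| + |B|).
sorensen_dice_distance <- function(a, b) {
  ta <- tokens(a)
  tb <- tokens(b)
  1 - 2 * length(intersect(ta, tb)) / (length(ta) + length(tb))
}

# Levenshtein character distance via base R's utils::adist().
levenshtein_distance <- function(a, b) as.numeric(utils::adist(a, b))

# Assumed pooling rule: weight each distance, then average or sum.
pool_distances <- function(distances, weights,
                           strategy = c("AVERAGE", "SUM")) {
  strategy <- match.arg(strategy)
  weighted <- distances * weights
  if (strategy == "AVERAGE") mean(weighted) else sum(weighted)
}

chunk <- "acute kidney failure"
candidate <- "acute renal failure"

token_distances <- c(
  jaccard = jaccard_distance(chunk, candidate),
  sorensen_dice = sorensen_dice_distance(chunk, candidate)
)
pooled <- pool_distances(token_distances, weights = c(1, 1),
                         strategy = "AVERAGE")
```

In this sketch the raw Levenshtein count is kept out of the pooled value because it is on a different scale from the two set-based distances; how the annotator itself scales and combines distances is not stated on this page.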

Value

The object returned depends on the class of `x`.

spark_connection: When `x` is a spark_connection, the function returns an ml_estimator object. The object contains a pointer to a Spark Estimator object and can be used to compose Pipeline objects.
ml_pipeline: When `x` is an ml_pipeline, the function returns an ml_pipeline with the NLP estimator appended to the pipeline.
tbl_spark: When `x` is a tbl_spark, an estimator is constructed and then immediately fit with the input tbl_spark, returning an NLP model.
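
Examples

A minimal sketch of the ml_pipeline case described above. It assumes that the pipeline passed in already contains upstream stages producing the annotation columns the resolver consumes. The helper name `add_resolver_stage`, the column names (`chunk_token`, `embeddings`, `resolution`, `code`, `description`), and the parameter values are hypothetical choices for illustration, not defaults taken from the package.

```r
# Hypothetical helper: appends a chunk entity resolver stage to an
# existing ml_pipeline. Column names and parameter values are assumed.
add_resolver_stage <- function(pipeline,
                               input_cols = c("chunk_token", "embeddings"),
                               output_col = "resolution") {
  sparknlp::nlp_chunk_entity_resolver(
    pipeline,
    input_cols = input_cols,
    output_col = output_col,
    label_column = "code",            # assumed column holding the target label
    normalized_col = "description",   # assumed column holding the description
    distance_function = "EUCLIDEAN",
    neighbors = 10,
    alternatives = 5,
    enable_wmd = TRUE,
    enable_tfidf = TRUE,
    pooling_strategy = "AVERAGE",
    confidence_function = "INVERSE",
    case_sensitive = FALSE
  )
}

# Usage (requires a Spark connection and the licensed Spark NLP library):
# sc <- sparklyr::spark_connect(master = "local")
# pipeline <- sparklyr::ml_pipeline(sc)   # add upstream annotator stages first
# model <- sparklyr::ml_fit(add_resolver_stage(pipeline), training_tbl)
```

Here `training_tbl` stands for a tbl_spark containing the text and the label and description columns. Passing that tbl_spark directly as `x` instead of a pipeline would fit the estimator immediately, as noted under Value.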