nlp_chunk_entity_resolver_pretrained: Load a pretrained Spark NLP Chunk Entity Resolver model

View source: R/chunk_entity_resolver.R

nlp_chunk_entity_resolver_pretrained    R Documentation

Load a pretrained Spark NLP Chunk Entity Resolver model

Description

Create a pretrained Spark NLP ChunkEntityResolverModel

Usage

nlp_chunk_entity_resolver_pretrained(
  sc,
  input_cols,
  output_col,
  all_distances_metadata = NULL,
  alternatives = NULL,
  case_sensitive = NULL,
  confidence_function = NULL,
  distance_function = NULL,
  distance_weights = NULL,
  enable_jaccard = NULL,
  enable_jaro_winkler = NULL,
  enable_levenshtein = NULL,
  enable_sorensen_dice = NULL,
  enable_tfidf = NULL,
  enable_wmd = NULL,
  extra_mass_penalty = NULL,
  miss_as_empty = NULL,
  neighbors = NULL,
  pooling_strategy = NULL,
  threshold = NULL,
  name,
  lang = NULL,
  remote_loc = NULL
)

Arguments

sc

A Spark connection

input_cols

Input columns. String array.

output_col

Output column. String.

all_distances_metadata

whether or not to return all distance values in the metadata.

alternatives

number of results to return in the metadata after sorting by the last distance calculated

case_sensitive

whether to treat the entities as case sensitive

confidence_function

what function to use to calculate confidence: INVERSE or SOFTMAX

distance_function

what distance function to use for KNN: 'EUCLIDEAN' or 'COSINE'

distance_weights

distance weights to apply before pooling: (WMD, TFIDF, Jaccard, SorensenDice, JaroWinkler, Levenshtein)

enable_jaccard

whether or not to use Jaccard token distance.

enable_jaro_winkler

whether or not to use Jaro-Winkler character distance.

enable_levenshtein

whether or not to use Levenshtein character distance.

enable_sorensen_dice

whether or not to use Sorensen-Dice token distance.

enable_tfidf

whether or not to use TFIDF token distance.

enable_wmd

whether or not to use WMD token distance.

extra_mass_penalty

penalty for extra words in the knowledge base match during WMD calculation

miss_as_empty

whether or not to return an empty annotation on unmatched chunks

neighbors

number of neighbours to consider in the KNN query to calculate WMD

pooling_strategy

pooling strategy to aggregate distances: AVERAGE or SUM

threshold

threshold value for the aggregated distance

name

the name of the model to load. If NULL, the default value is used

lang

the language of the model to be loaded. If NULL, the default value is used

remote_loc

the remote location of the model. If NULL, the default value is used
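
To make the string-distance, weighting and pooling arguments above more concrete, the following base R sketch computes Jaccard, Sorensen-Dice and Levenshtein distances between a chunk and a candidate term, then combines them with weights using AVERAGE or SUM pooling. It is an illustration only: the helper names (tokenize, jaccard_distance, sorensen_dice_distance, levenshtein_distance, pool_distances), the example strings, the scaling of the Levenshtein distance and the weight values are all assumptions, and the resolver's internal computation may differ.

```r
# Illustrative helpers (hypothetical, not part of sparknlp).

# Split a string into a set of unique lower-cased tokens.
tokenize <- function(x) unique(strsplit(tolower(x), "\\s+")[[1]])

# Jaccard token distance: 1 - (shared tokens / all distinct tokens).
jaccard_distance <- function(a, b) {
  ta <- tokenize(a)
  tb <- tokenize(b)
  1 - length(intersect(ta, tb)) / length(union(ta, tb))
}

# Sorensen-Dice token distance: 1 - 2 * shared tokens / (size of A + size of B).
sorensen_dice_distance <- function(a, b) {
  ta <- tokenize(a)
  tb <- tokenize(b)
  1 - 2 * length(intersect(ta, tb)) / (length(ta) + length(tb))
}

# Levenshtein character distance, scaled to [0, 1] by the longer string.
# The scaling is an assumption made for this sketch.
levenshtein_distance <- function(a, b) {
  as.numeric(utils::adist(a, b)) / max(nchar(a), nchar(b))
}

# Apply one weight per distance, then pool with AVERAGE or SUM.
pool_distances <- function(distances, weights,
                           strategy = c("AVERAGE", "SUM")) {
  strategy <- match.arg(strategy)
  weighted <- distances * weights
  if (strategy == "AVERAGE") mean(weighted) else sum(weighted)
}

chunk <- "chronic kidney disease"
candidate <- "chronic renal disease"

distances <- c(
  jaccard = jaccard_distance(chunk, candidate),
  sorensen_dice = sorensen_dice_distance(chunk, candidate),
  levenshtein = levenshtein_distance(chunk, candidate)
)
weights <- c(jaccard = 1, sorensen_dice = 1, levenshtein = 2)

pooled_avg <- pool_distances(distances, weights, "AVERAGE")
pooled_sum <- pool_distances(distances, weights, "SUM")
print(distances)
```

In this sketch a pooled value would then be compared against threshold to decide whether the candidate counts as a match. Which side of the threshold is accepted is not stated on this page.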

Value

The Spark NLP model with the pretrained model loaded
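
Examples

A minimal sketch of how the function might be called, using only the arguments listed under Usage. The wrapper name load_resolver, the input and output column names, and the chosen argument values are assumptions for illustration. The model name is left for the caller to supply, because this page does not list the available pretrained models. The call is wrapped in a function so that nothing runs until a live Spark connection is passed in.

```r
# Hypothetical wrapper around nlp_chunk_entity_resolver_pretrained().
# Requires the sparknlp package and access to the pretrained model.
load_resolver <- function(sc, name, lang = NULL, remote_loc = NULL) {
  sparknlp::nlp_chunk_entity_resolver_pretrained(
    sc,
    # Column names below are assumptions about the upstream pipeline.
    input_cols = c("token", "chunk_embeddings"),
    output_col = "resolution",
    case_sensitive = FALSE,
    distance_function = "COSINE",   # or "EUCLIDEAN"
    pooling_strategy = "AVERAGE",   # or "SUM"
    miss_as_empty = TRUE,
    name = name,
    lang = lang,
    remote_loc = remote_loc
  )
}

# Example call (not run; needs a Spark connection and a valid model name):
# sc <- sparklyr::spark_connect(master = "local")
# resolver <- load_resolver(sc, name = "<pretrained-model-name>", lang = "en")
```

Arguments left out of the call keep their NULL defaults from the Usage signature.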


r-spark/sparknlp documentation built on Oct. 15, 2022, 10:50 a.m.