View source: R/chunk_entity_resolver.R
nlp_chunk_entity_resolver | R Documentation |
Spark ML estimator. See https://nlp.johnsnowlabs.com/docs/en/licensed_annotators#chunkentityresolver
nlp_chunk_entity_resolver(
  x,
  input_cols,
  output_col,
  all_distances_metadata = NULL,
  alternatives = NULL,
  case_sensitive = NULL,
  confidence_function = NULL,
  distance_function = NULL,
  distance_weights = NULL,
  enable_jaccard = NULL,
  enable_jaro_winkler = NULL,
  enable_levenshtein = NULL,
  enable_sorensen_dice = NULL,
  enable_tfidf = NULL,
  enable_wmd = NULL,
  extra_mass_penalty = NULL,
  label_column = NULL,
  miss_as_empty = NULL,
  neighbors = NULL,
  normalized_col = NULL,
  pooling_strategy = NULL,
  threshold = NULL,
  uid = random_string("chunk_entity_resolver_")
)
x |
A spark_connection, ml_pipeline, or tbl_spark. |
input_cols |
Input columns. String array. |
output_col |
Output column. String. |
all_distances_metadata |
whether or not to return all distance values in the metadata. |
alternatives |
number of results to return in the metadata after sorting by last distance calculated |
case_sensitive |
whether to treat the entities as case sensitive |
confidence_function |
what function to use to calculate confidence: INVERSE or SOFTMAX |
distance_function |
what distance function to use for KNN: 'EUCLIDEAN' or 'COSINE' |
distance_weights |
distance weights to apply before pooling: (WMD, TFIDF, Jaccard, SorensenDice, JaroWinkler, Levenshtein) |
enable_jaccard |
whether or not to use Jaccard token distance. |
enable_jaro_winkler |
whether or not to use Jaro-Winkler character distance. |
enable_levenshtein |
whether or not to use Levenshtein character distance. |
enable_sorensen_dice |
whether or not to use Sorensen-Dice token distance. |
enable_tfidf |
whether or not to use TFIDF token distance. |
enable_wmd |
whether or not to use WMD token distance. |
extra_mass_penalty |
penalty for extra words in the knowledge base match during WMD calculation |
label_column |
column name for the value we are trying to resolve |
miss_as_empty |
whether or not to return an empty annotation on unmatched chunks |
neighbors |
number of neighbours to consider in the KNN query to calculate WMD |
normalized_col |
column name for the original, normalized description |
pooling_strategy |
pooling strategy to aggregate distances: AVERAGE or SUM |
threshold |
threshold value for the aggregated distance |
uid |
A character string used to uniquely identify the ML estimator. |
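The arguments above can be combined as in the following minimal sketch. It is not taken from the package's own examples: the column names ("chunk_token", "embeddings", "resolution", "code", "description") and the numeric values are illustrative assumptions, and the helper make_resolver() is hypothetical. Running it requires the sparklyr and sparknlp packages and a licensed Spark NLP for Healthcare installation.

```r
# Hypothetical helper: builds a resolver estimator from a Spark connection.
# Calls are namespaced, so defining the function does not load any package.
make_resolver <- function(sc) {
  sparknlp::nlp_chunk_entity_resolver(
    sc,
    input_cols = c("chunk_token", "embeddings"),  # assumed upstream columns
    output_col = "resolution",                    # assumed output name
    distance_function = "COSINE",       # documented: 'EUCLIDEAN' or 'COSINE'
    confidence_function = "SOFTMAX",    # documented: INVERSE or SOFTMAX
    pooling_strategy = "AVERAGE",       # documented: AVERAGE or SUM
    enable_wmd = TRUE,                  # use WMD token distance
    enable_tfidf = TRUE,                # use TFIDF token distance
    enable_jaccard = FALSE,             # skip Jaccard token distance
    neighbors = 100,                    # illustrative value
    alternatives = 5,                   # illustrative value
    case_sensitive = FALSE,
    label_column = "code",              # hypothetical column name
    normalized_col = "description"      # hypothetical column name
  )
}
```

Arguments left at NULL are simply not passed in this sketch; what the underlying annotator then uses as defaults is described in the linked annotator documentation, not here.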
The object returned depends on the class of x.

spark_connection: When x is a spark_connection, the function returns an instance of a ml_estimator object. The object contains a pointer to a Spark Estimator object and can be used to compose Pipeline objects.

ml_pipeline: When x is a ml_pipeline, the function returns a ml_pipeline with the NLP estimator appended to the pipeline.

tbl_spark: When x is a tbl_spark, an estimator is constructed then immediately fit with the input tbl_spark, returning an NLP model.
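The three return behaviours can be sketched as follows. This is an illustration under assumptions rather than an official example: resolver_usage() is a hypothetical helper, training_tbl stands for a tbl_spark that already contains the assumed input columns ("chunk_token", "embeddings"), and "resolution" is an assumed output column name.

```r
# Hypothetical helper showing how the return value depends on the class of x.
# Nothing runs until the function is called with a live Spark connection.
resolver_usage <- function(sc, training_tbl) {
  in_cols <- c("chunk_token", "embeddings")  # assumed column names
  out_col <- "resolution"                    # assumed column name

  # x is a spark_connection: an unfitted estimator is returned.
  estimator <- sparknlp::nlp_chunk_entity_resolver(
    sc, input_cols = in_cols, output_col = out_col
  )

  # x is a ml_pipeline: the estimator is appended as a new pipeline stage.
  pipeline <- sparknlp::nlp_chunk_entity_resolver(
    sparklyr::ml_pipeline(sc), input_cols = in_cols, output_col = out_col
  )

  # x is a tbl_spark: the estimator is built and fit immediately.
  model <- sparknlp::nlp_chunk_entity_resolver(
    training_tbl, input_cols = in_cols, output_col = out_col
  )

  list(estimator = estimator, pipeline = pipeline, model = model)
}
```

The pipeline form is usually the most convenient when the resolver follows other annotators, since the whole chain can then be fit in one step with sparklyr's ml_fit().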