View source: R/chunk_entity_resolver.R
| nlp_chunk_entity_resolver | R Documentation |
Spark ML estimator for the Spark NLP ChunkEntityResolver annotator. See https://nlp.johnsnowlabs.com/docs/en/licensed_annotators#chunkentityresolver
nlp_chunk_entity_resolver(
  x,
  input_cols,
  output_col,
  all_distances_metadata = NULL,
  alternatives = NULL,
  case_sensitive = NULL,
  confidence_function = NULL,
  distance_function = NULL,
  distance_weights = NULL,
  enable_jaccard = NULL,
  enable_jaro_winkler = NULL,
  enable_levenshtein = NULL,
  enable_sorensen_dice = NULL,
  enable_tfidf = NULL,
  enable_wmd = NULL,
  extra_mass_penalty = NULL,
  label_column = NULL,
  miss_as_empty = NULL,
  neighbors = NULL,
  normalized_col = NULL,
  pooling_strategy = NULL,
  threshold = NULL,
  uid = random_string("chunk_entity_resolver_")
)
x |
A spark_connection, ml_pipeline, or a tbl_spark. |
input_cols |
Input columns. String array. |
output_col |
Output column. String. |
all_distances_metadata |
whether or not to return all distance values in the metadata. |
alternatives |
number of results to return in the metadata after sorting by last distance calculated |
case_sensitive |
whether to treat the entities as case sensitive |
confidence_function |
what function to use to calculate confidence: INVERSE or SOFTMAX |
distance_function |
what distance function to use for KNN: 'EUCLIDEAN' or 'COSINE' |
distance_weights |
distance weights to apply before pooling: (WMD, TFIDF, Jaccard, SorensenDice, JaroWinkler, Levenshtein) |
enable_jaccard |
whether or not to use Jaccard token distance. |
enable_jaro_winkler |
whether or not to use Jaro-Winkler character distance. |
enable_levenshtein |
whether or not to use Levenshtein character distance. |
enable_sorensen_dice |
whether or not to use Sorensen-Dice token distance. |
enable_tfidf |
whether or not to use TFIDF token distance. |
enable_wmd |
whether or not to use WMD token distance. |
extra_mass_penalty |
penalty for extra words in the knowledge base match during WMD calculation |
label_column |
column name for the value we are trying to resolve |
miss_as_empty |
whether or not to return an empty annotation on unmatched chunks |
neighbors |
number of neighbours to consider in the KNN query to calculate WMD |
normalized_col |
column name for the original, normalized description |
pooling_strategy |
pooling strategy to aggregate distances: AVERAGE or SUM |
threshold |
threshold value for the aggregated distance |
uid |
A character string used to uniquely identify the ML estimator. |
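The distance, weighting, pooling, threshold, and confidence arguments above may be easier to follow with a small numeric illustration. The base R sketch below is not the Spark NLP implementation. It assumes textbook definitions of Jaccard and Sorensen-Dice token distance, uses utils::adist for Levenshtein, and assumes that a candidate is accepted when its pooled distance is at or below the threshold. All helper names (token_set, pool_distances, confidence, and so on), the scaling of Levenshtein distance, and the INVERSE and SOFTMAX formulas are assumptions made for illustration only.

```r
# Illustrative only: textbook distances in base R, not the Spark NLP internals.

# Split a string into a set of lower-cased tokens (hypothetical helper).
token_set <- function(s) unique(strsplit(tolower(s), "\\s+")[[1]])

# Jaccard token distance: 1 - (shared tokens / all distinct tokens).
jaccard_distance <- function(a, b) {
  ta <- token_set(a)
  tb <- token_set(b)
  1 - length(intersect(ta, tb)) / length(union(ta, tb))
}

# Sorensen-Dice token distance: 1 - 2 * shared / (size of A + size of B).
sorensen_dice_distance <- function(a, b) {
  ta <- token_set(a)
  tb <- token_set(b)
  1 - 2 * length(intersect(ta, tb)) / (length(ta) + length(tb))
}

# Levenshtein character distance, scaled to [0, 1] by the longer string.
# The scaling is an assumption so that the three distances are comparable.
levenshtein_distance <- function(a, b) {
  as.numeric(utils::adist(a, b)) / max(nchar(a), nchar(b))
}

# Apply one weight per distance, then pool with "AVERAGE" or "SUM".
pool_distances <- function(distances, weights,
                           strategy = c("AVERAGE", "SUM")) {
  strategy <- match.arg(strategy)
  weighted <- distances * weights
  if (strategy == "AVERAGE") mean(weighted) else sum(weighted)
}

# Turn pooled distances for several candidates into confidences that sum
# to 1. Both formulas are assumed, not taken from the library.
confidence <- function(distances, fn = c("INVERSE", "SOFTMAX")) {
  fn <- match.arg(fn)
  scores <- if (fn == "INVERSE") 1 / (distances + 1e-9) else exp(-distances)
  scores / sum(scores)
}

chunk <- "chronic kidney disease"
candidate <- "chronic renal disease"

d <- c(
  jaccard = jaccard_distance(chunk, candidate),
  sorensen_dice = sorensen_dice_distance(chunk, candidate),
  levenshtein = levenshtein_distance(chunk, candidate)
)
pooled <- pool_distances(d, weights = c(1, 1, 1), strategy = "AVERAGE")

# Assumed rule: keep the candidate only if the pooled distance is within
# the threshold.
threshold <- 0.6
is_match <- pooled <= threshold
```

In this sketch, raising a weight makes the corresponding distance count more in the pooled value, and switching the strategy from "AVERAGE" to "SUM" changes the scale on which a threshold would have to be chosen.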
The object returned depends on the class of x.
spark_connection: When x is a spark_connection, the function returns an instance of a ml_estimator object. The object contains a pointer to a Spark Estimator object and can be used to compose Pipeline objects.
ml_pipeline: When x is a ml_pipeline, the function returns a ml_pipeline with the NLP estimator appended to the pipeline.
tbl_spark: When x is a tbl_spark, an estimator is constructed then immediately fit with the input tbl_spark, returning an NLP model.
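The three cases above follow a pattern that can be mimicked with S3 dispatch on the class of x. The sketch below runs without Spark and is not the package's code: resolver_sketch, the stand-in objects, and the fitted_model_sketch class are hypothetical names, and the returned lists only imitate the shape of the real objects.

```r
# Hypothetical generic that mimics the dispatch pattern; not part of sparknlp.
resolver_sketch <- function(x, ...) UseMethod("resolver_sketch")

# Build a stand-in estimator that simply records its parameters.
new_estimator_sketch <- function(...) {
  structure(list(params = list(...)), class = "ml_estimator")
}

# Connection in: an unfitted estimator comes back.
resolver_sketch.spark_connection <- function(x, ...) {
  new_estimator_sketch(...)
}

# Pipeline in: the same pipeline comes back with one more stage appended.
resolver_sketch.ml_pipeline <- function(x, ...) {
  x$stages <- c(x$stages, list(new_estimator_sketch(...)))
  x
}

# Table in: the estimator is built and "fit" right away, giving a model.
resolver_sketch.tbl_spark <- function(x, ...) {
  structure(
    list(estimator = new_estimator_sketch(...), fitted_on = x),
    class = "fitted_model_sketch"
  )
}

# Stand-in objects carrying only a class attribute (no Spark involved).
fake_sc <- structure(list(), class = "spark_connection")
fake_pipeline <- structure(list(stages = list()), class = "ml_pipeline")
fake_tbl <- structure(list(), class = "tbl_spark")

from_sc <- resolver_sketch(fake_sc, threshold = 5)
from_pipeline <- resolver_sketch(fake_pipeline, threshold = 5)
from_tbl <- resolver_sketch(fake_tbl, threshold = 5)
```

Here from_sc is an estimator-like object, from_pipeline keeps its ml_pipeline class and gains one stage, and from_tbl is a model-like object, matching the three return cases described above.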