nlp_ner_crf | R Documentation |
Spark ML estimator that trains a generic named entity recognition model using a CRF (conditional random field) machine learning algorithm. Its training data (train_ner) is either a labeled dataset or an external CoNLL 2003 IOB-based Spark dataset with annotation columns. The user must also provide a word embeddings annotation column. Optionally, an entity dictionary file can be provided for better accuracy. See https://nlp.johnsnowlabs.com/docs/en/annotators#ner-crf
nlp_ner_crf(
  x,
  input_cols,
  output_col,
  label_col = NULL,
  min_epochs = NULL,
  max_epochs = NULL,
  l2 = NULL,
  C0 = NULL,
  loss_eps = NULL,
  min_w = NULL,
  external_features_path = NULL,
  external_features_delimiter = NULL,
  external_features_read_as = "LINE_BY_LINE",
  external_features_options = list(format = "text"),
  entities = NULL,
  verbose = NULL,
  random_seed = NULL,
  uid = random_string("ner_crf_")
)
x |
A spark_connection, ml_pipeline, or tbl_spark. |
input_cols |
Input columns. String array. |
output_col |
Output column. String. |
label_col |
A column of Annotation type containing per-token labels; required when no external dataset path is provided. |
min_epochs |
Minimum number of epochs to train |
max_epochs |
Maximum number of epochs to train |
l2 |
L2 regularization coefficient for CRF |
C0 |
Defines the decay speed for the gradient. |
loss_eps |
If the relative improvement of an epoch is less than this value, training is stopped. |
min_w |
Features with weights lower than this value will be filtered out. |
external_features_path |
Path to a file, or a folder of files, containing line-separated external features. |
external_features_delimiter |
Delimiter used in the external features file, for entries such as Volvo:ORG. |
external_features_read_as |
How the external features file is read: LINE_BY_LINE or SPARK_DATASET. |
external_features_options |
Named list of options passed to the reader of the external features file. |
entities |
Array of entities to recognize |
verbose |
Verbosity level |
random_seed |
Random seed |
uid |
A character string used to uniquely identify the ML estimator. |
The object returned depends on the class of x.

spark_connection: When x is a spark_connection, the function returns an instance of a ml_estimator object. The object contains a pointer to a Spark Estimator object and can be used to compose Pipeline objects.

ml_pipeline: When x is a ml_pipeline, the function returns a ml_pipeline with the NLP estimator appended to the pipeline.

tbl_spark: When x is a tbl_spark, an estimator is constructed then immediately fit with the input tbl_spark, returning an NLP model.
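A minimal usage sketch (not run), assuming a live Spark connection with Spark NLP loaded and a labeled CoNLL-style training tbl_spark named train_data. The input column names (sentence, token, pos, embeddings) and the upstream annotators that would produce them are illustrative assumptions; the parameters used are those documented above.

```r
library(sparklyr)
library(sparknlp)

# Assumes a local Spark connection; adjust `master` for a cluster.
sc <- spark_connect(master = "local")

# Construct the CRF NER estimator against the connection.
# input_cols must include a word embeddings annotation column.
ner_estimator <- nlp_ner_crf(
  sc,
  input_cols = c("sentence", "token", "pos", "embeddings"),
  output_col = "ner",
  label_col  = "label",   # per-token labels in the training data
  min_epochs = 1,
  max_epochs = 10,
  l2         = 1,         # L2 regularization coefficient
  loss_eps   = 1e-3,      # stop when relative improvement falls below this
  verbose    = 2
)

# Fit against the labeled training data to obtain an NER model
# (hypothetical data; ml_fit is sparklyr's generic estimator-fitting function):
# ner_model <- ml_fit(ner_estimator, train_data)
```

Passing a ml_pipeline as x instead of sc would append the estimator to that pipeline, and passing a tbl_spark would fit it immediately, per the dispatch rules described above.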