nlp_ner_crf: Spark NLP NerCrfApproach

View source: R/ner-crf.R

nlp_ner_crf    R Documentation

Spark NLP NerCrfApproach

Description

Spark ML estimator that trains a generic named entity recognition (NER) model using a Conditional Random Fields (CRF) algorithm. The training data (train_ner) is a Spark dataset with Annotation-type columns, either labeled directly or read from an external file in CoNLL 2003 IOB format. The user must also provide a word embeddings annotation column, and can optionally supply an entity dictionary file to improve accuracy. See https://nlp.johnsnowlabs.com/docs/en/annotators#ner-crf

Usage

nlp_ner_crf(
  x,
  input_cols,
  output_col,
  label_col = NULL,
  min_epochs = NULL,
  max_epochs = NULL,
  l2 = NULL,
  C0 = NULL,
  loss_eps = NULL,
  min_w = NULL,
  external_features_path = NULL,
  external_features_delimiter = NULL,
  external_features_read_as = "LINE_BY_LINE",
  external_features_options = list(format = "text"),
  entities = NULL,
  verbose = NULL,
  random_seed = NULL,
  uid = random_string("ner_crf_")
)

Arguments

x

A spark_connection, ml_pipeline, or a tbl_spark.

input_cols

Input columns. String array.

output_col

Output column. String.

label_col

Name of the Annotation-type column that holds the labeled data, one label per token. Required if DatasetPath is not provided.

min_epochs

Minimum number of epochs to train.

max_epochs

Maximum number of epochs to train.

l2

L2 regularization coefficient for the CRF.

C0

Decay speed for the gradient (c0).

loss_eps

If the relative improvement between epochs is less than this value, training is stopped.

min_w

Features with weights lower than this value are filtered out.

external_features_path

Path to a line-separated file, or to a folder of such files, containing the external features.

external_features_delimiter

Delimiter used within each line of the external features file. For example, with ":" as the delimiter, a line looks like Volvo:ORG.

external_features_read_as

How to read the external features file: LINE_BY_LINE or SPARK_DATASET.

external_features_options

Named list of options passed to the Spark reader when external_features_read_as is SPARK_DATASET.

entities

Array of entities to recognize.

verbose

Verbosity level.

random_seed

Random seed.

uid

A character string used to uniquely identify the ML estimator.

Value

The object returned depends on the class of x.

  • spark_connection: When x is a spark_connection, the function returns an ml_estimator object. The object contains a pointer to a Spark Estimator object and can be used to compose Pipeline objects.

  • ml_pipeline: When x is an ml_pipeline, the function returns an ml_pipeline with the NLP estimator appended to the pipeline.

  • tbl_spark: When x is a tbl_spark, an estimator is constructed and then immediately fit to the input tbl_spark, returning an NLP model.
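
The three return behaviours listed above can be sketched roughly as follows. This is a minimal illustration, not a tested recipe. It assumes a local Spark installation that can load the Spark NLP jars. The column names ("sentence", "token", "pos", "embeddings", "label", "ner") are illustrative, and train_tbl is a hypothetical tbl_spark that already contains the input and label columns; it is not created by this snippet.

```r
library(sparklyr)
library(sparknlp)

# Assumption: a local Spark installation that can load the Spark NLP jars.
sc <- spark_connect(master = "local")

# Illustrative column names; they must match the annotation columns produced
# by earlier stages (sentence/document, token, POS tags, word embeddings).
ner_inputs <- c("sentence", "token", "pos", "embeddings")

# 1. x is a spark_connection: returns an unfitted ml_estimator.
ner_estimator <- nlp_ner_crf(
  sc,
  input_cols = ner_inputs,
  output_col = "ner",
  label_col = "label",
  max_epochs = 10  # illustrative value, not a recommendation
)

# 2. x is an ml_pipeline: returns the pipeline with the estimator appended.
ner_pipeline <- nlp_ner_crf(
  ml_pipeline(sc),
  input_cols = ner_inputs,
  output_col = "ner",
  label_col = "label"
)

# 3. x is a tbl_spark: the estimator is constructed and fit immediately.
# `train_tbl` is hypothetical, so this step only runs if it has been defined.
if (exists("train_tbl")) {
  ner_model <- nlp_ner_crf(
    train_tbl,
    input_cols = ner_inputs,
    output_col = "ner",
    label_col = "label"
  )
}

spark_disconnect(sc)
```

The first form is convenient when the estimator will be fit later (for example with sparklyr's ml_fit()), while the second keeps the whole preprocessing chain and the NER stage in a single pipeline object.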


r-spark/sparknlp documentation built on Oct. 15, 2022, 10:50 a.m.