nlp_ner_crf: Spark NLP NerCrfApproach
In r-spark/sparknlp: R Interface to John Snow Labs Spark NLP

Description

Spark ML estimator that trains a generic named entity recognition model using a Conditional Random Fields (CRF) algorithm. The training data (train_ner) is either a labeled Spark dataset or one built from an external CoNLL 2003 IOB file; in both cases it must contain Annotation columns. The user must also provide a word embeddings annotation column. Optionally, an entity dictionary file can be supplied to improve accuracy. See https://nlp.johnsnowlabs.com/docs/en/annotators#ner-crf
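
A minimal sketch of preparing the training data described above. It assumes a local Spark installation, a hypothetical CoNLL 2003 file at `data/eng.train`, and that the companion functions `nlp_conll_read_dataset()` and `nlp_word_embeddings_pretrained()` are available with these arguments in the installed sparknlp version. The column names are assumptions based on what Spark NLP's CoNLL reader usually produces, so check them against your own data.

```r
library(sparklyr)
library(sparknlp)

sc <- spark_connect(master = "local")

# Hypothetical path to a CoNLL 2003 IOB training file.
train_path <- "data/eng.train"

# Assumption: the reader yields annotation columns such as
# document, sentence, token, pos and label.
train_ner <- nlp_conll_read_dataset(sc, train_path)

# The estimator also needs a word embeddings annotation column.
# Assumption: the default pretrained embeddings model is acceptable here.
embeddings <- nlp_word_embeddings_pretrained(
  sc,
  input_cols = c("sentence", "token"),
  output_col = "embeddings"
)
train_ner <- ml_transform(embeddings, train_ner)

# Passing a tbl_spark fits the estimator immediately and returns a model.
ner_model <- nlp_ner_crf(
  train_ner,
  input_cols = c("sentence", "token", "pos", "embeddings"),
  output_col = "ner",
  label_col = "label"
)
```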

Usage

nlp_ner_crf(
  x,
  input_cols,
  output_col,
  label_col = NULL,
  min_epochs = NULL,
  max_epochs = NULL,
  l2 = NULL,
  C0 = NULL,
  loss_eps = NULL,
  min_w = NULL,
  external_features_path = NULL,
  external_features_delimiter = NULL,
  external_features_read_as = "LINE_BY_LINE",
  external_features_options = list(format = "text"),
  entities = NULL,
  verbose = NULL,
  random_seed = NULL,
  uid = random_string("ner_crf_")
)

Arguments

`x`	A `spark_connection`, `ml_pipeline`, or a `tbl_spark`.
`input_cols`	Input columns. String array.
`output_col`	Output column. String.
`label_col`	If no dataset path (DatasetPath) is provided, this Annotation-type column must contain the label for each token.
`min_epochs`	Minimum number of epochs to train
`max_epochs`	Maximum number of epochs to train
`l2`	L2 regularization coefficient for CRF
`C0`	Decay speed for the gradient.
`loss_eps`	If the relative improvement between epochs is less than this value, training stops.
`min_w`	Features with weights below this value are filtered out.
`external_features_path`	Path to a line-separated file, or to a folder of such files.
`external_features_delimiter`	Delimiter between token and label in the external features file, e.g. `:` for an entry such as Volvo:ORG.
`external_features_read_as`	How to read the external features: `LINE_BY_LINE` or `SPARK_DATASET`.
`external_features_options`	Named list of options passed to the Spark reader when `external_features_read_as` is `SPARK_DATASET`.
`entities`	Array of entities to recognize
`verbose`	Verbosity level
`random_seed`	Random seed.
`uid`	A character string used to uniquely identify the ML estimator.
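
The external features arguments can be illustrated with a small dictionary file. The sketch below writes one in the token-delimiter-label form suggested by the `Volvo:ORG` example and collects a set of arguments for `nlp_ner_crf()`. The second dictionary entry, the file name, the column names, and every hyperparameter value are illustrative assumptions, not recommended settings.

```r
# Hypothetical dictionary: one "<token><delimiter><label>" entry per line.
dict_lines <- c("Volvo:ORG", "Stockholm:LOC")
dict_path <- file.path(tempdir(), "ner_dictionary.txt")
writeLines(dict_lines, dict_path)

# Arguments for nlp_ner_crf(); the values are placeholders for illustration.
crf_args <- list(
  input_cols = c("sentence", "token", "pos", "embeddings"),
  output_col = "ner",
  label_col = "label",
  min_epochs = 1,
  max_epochs = 20,
  l2 = 1,
  loss_eps = 1e-3,
  external_features_path = dict_path,
  external_features_delimiter = ":",
  external_features_read_as = "LINE_BY_LINE",
  random_seed = 42
)

# With an open Spark connection `sc`, the estimator could then be built as:
# ner_crf <- do.call(sparknlp::nlp_ner_crf, c(list(sc), crf_args))
```

Keeping the arguments in a list makes it easy to reuse the same settings whether `x` is a connection, a pipeline, or a table.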

Value

The object returned depends on the class of x.

spark_connection: When x is a spark_connection, the function returns an instance of a ml_estimator object. The object contains a pointer to a Spark Estimator object and can be used to compose Pipeline objects.
ml_pipeline: When x is a ml_pipeline, the function returns a ml_pipeline with the NLP estimator appended to the pipeline.
tbl_spark: When x is a tbl_spark, an estimator is constructed then immediately fit with the input tbl_spark, returning an NLP model.
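
The three return types can be sketched as follows. This assumes a local Spark installation; the column names are illustrative, and `train_tbl` is a hypothetical `tbl_spark` that already contains the required annotation columns.

```r
library(sparklyr)
library(sparknlp)

sc <- spark_connect(master = "local")

# Illustrative input column names; adjust to the columns in your data.
cols <- c("sentence", "token", "pos", "embeddings")

# 1. spark_connection: returns an unfitted estimator.
ner_estimator <- nlp_ner_crf(
  sc,
  input_cols = cols,
  output_col = "ner",
  label_col = "label"
)

# 2. ml_pipeline: returns the pipeline with the estimator appended.
pipeline <- ml_pipeline(sc) %>%
  nlp_ner_crf(
    input_cols = cols,
    output_col = "ner",
    label_col = "label"
  )

# 3. tbl_spark: fits immediately and returns a model.
#    `train_tbl` is hypothetical and must already hold the columns above.
# ner_model <- nlp_ner_crf(
#   train_tbl,
#   input_cols = cols,
#   output_col = "ner",
#   label_col = "label"
# )
```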