nlp_norvig_spell_checker: Spark NLP NorvigSweetingApproach

View source: R/norvig-spell-checker.R

nlp_norvig_spell_checker    R Documentation

Spark NLP NorvigSweetingApproach

Description

Spark ML estimator that retrieves tokens and makes corrections automatically when they are not found in an English dictionary. See https://nlp.johnsnowlabs.com/docs/en/annotators#norvig-spellchecker for details.

Usage

nlp_norvig_spell_checker(
  x,
  input_cols,
  output_col,
  dictionary_path = NULL,
  dictionary_token_pattern = "\\S+",
  dictionary_read_as = "TEXT",
  dictionary_options = list(format = "text"),
  case_sensitive = NULL,
  double_variants = NULL,
  short_circuit = NULL,
  word_size_ignore = NULL,
  dups_limit = NULL,
  reduct_limit = NULL,
  intersections = NULL,
  vowel_swap_limit = NULL,
  uid = random_string("norvig_spell_checker_")
)

Arguments

x

A spark_connection, ml_pipeline, or a tbl_spark.

input_cols

Input columns. String array.

output_col

Output column. String.

dictionary_path

Path to a file containing properly spelled words, used as the dictionary.

dictionary_token_pattern

Regex pattern used to identify tokens (words) in the dictionary file. Defaults to "\\S+".

dictionary_read_as

How to read the dictionary: TEXT or SPARK_DATASET.

dictionary_options

Options passed to the Spark reader when loading the dictionary.

case_sensitive

Whether matching is case sensitive. Defaults to FALSE; changing it might affect accuracy.

double_variants

Enables an extra check for word combinations, improving accuracy at the cost of performance.

short_circuit

Enables a faster but less accurate mode.

word_size_ignore

Minimum size of word before moving on. Defaults to 3.

dups_limit

Maximum duplicate of characters to account for. Defaults to 2.

reduct_limit

Word reduction limit. Defaults to 3.

intersections

Hamming intersections to attempt. Defaults to 10.

vowel_swap_limit

Vowel swap attempts. Defaults to 6.

uid

A character string used to uniquely identify the ML estimator.

Value

The object returned depends on the class of x.

  • spark_connection: When x is a spark_connection, the function returns an instance of an ml_estimator object. The object contains a pointer to a Spark Estimator object and can be used to compose Pipeline objects.

  • ml_pipeline: When x is an ml_pipeline, the function returns an ml_pipeline with the NLP estimator appended to the pipeline.

  • tbl_spark: When x is a tbl_spark, an estimator is constructed and then immediately fit on the input tbl_spark, returning an NLP model.
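
Examples

A minimal sketch of composing the spell checker into a pipeline with sparklyr. The nlp_document_assembler() and nlp_tokenizer() stages, the text_tbl table (a tbl_spark with a character column named "text"), and the dictionary path are illustrative assumptions, not part of this page.

## Not run: 
library(sparklyr)
library(sparknlp)

# Connection setup (including the Spark NLP jars) is assumed to be configured
sc <- spark_connect(master = "local")

# text_tbl is assumed to be a tbl_spark with a character column named "text"
pipeline <- ml_pipeline(sc) %>%
  nlp_document_assembler(input_col = "text", output_col = "document") %>%
  nlp_tokenizer(input_cols = c("document"), output_col = "token") %>%
  nlp_norvig_spell_checker(
    input_cols = c("token"),
    output_col = "checked",
    dictionary_path = "/path/to/words.txt"  # hypothetical dictionary file
  )

# Fit the pipeline (which trains the spell checker), then apply it
model <- ml_fit(pipeline, text_tbl)
checked_tbl <- ml_transform(model, text_tbl)
## End(Not run)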

