nlp_sentence_detector: Spark NLP SentenceDetector - sentence boundary detector

View source: R/sentence-detector.R


Spark NLP SentenceDetector - sentence boundary detector

Description

Spark ML Transformer that finds sentence bounds in raw text. Applies rules from the Pragmatic Segmenter. See https://nlp.johnsnowlabs.com/docs/en/annotators#sentencedetector

Usage

nlp_sentence_detector(
  x,
  input_cols,
  output_col,
  custom_bounds = NULL,
  use_custom_only = NULL,
  use_abbreviations = NULL,
  explode_sentences = NULL,
  detect_lists = NULL,
  min_length = NULL,
  max_length = NULL,
  split_length = NULL,
  uid = random_string("sentence_detector_")
)

Arguments

x

A spark_connection, ml_pipeline, or a tbl_spark.

input_cols

Input columns. String array.

output_col

Output column. String.

custom_bounds

Custom sentence separator text. Optional.

use_custom_only

Use only the custom bounds, without considering those of the Pragmatic Segmenter. Defaults to false. Requires custom_bounds to be set.

use_abbreviations

Whether to consider abbreviation strategies for better accuracy but slower performance. Defaults to true.

explode_sentences

Whether to split sentences into different Dataset rows. Useful for higher parallelism with fat rows. Defaults to false.

detect_lists

Whether to take lists into consideration during sentence detection.

min_length

Set the minimum allowed length for each sentence.

max_length

Set the maximum allowed length for each sentence.

split_length

Length at which sentences will be forcibly split.

uid

A character string used to uniquely identify the ML estimator.

Value

The object returned depends on the class of x.

  • spark_connection: When x is a spark_connection, the function returns an instance of an ml_estimator object. The object contains a pointer to a Spark Estimator object and can be used to compose Pipeline objects.

  • ml_pipeline: When x is an ml_pipeline, the function returns an ml_pipeline with the NLP estimator appended to the pipeline.

  • tbl_spark: When x is a tbl_spark, an estimator is constructed then immediately fit with the input tbl_spark, returning an NLP model.
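
Examples

A minimal sketch of using the detector inside a pipeline. It assumes a local Spark connection and that nlp_document_assembler (the usual upstream annotator in this package) produces the "document" column; argument names follow the signatures above.

## Not run:
library(sparklyr)
library(sparknlp)

# Assumption: a local Spark master is available
sc <- spark_connect(master = "local")

text_tbl <- sdf_copy_to(
  sc,
  data.frame(text = "This is one sentence. This is another."),
  overwrite = TRUE
)

# Document assembler feeds the sentence detector
pipeline <- ml_pipeline(sc) %>%
  nlp_document_assembler(input_col = "text", output_col = "document") %>%
  nlp_sentence_detector(input_cols = c("document"), output_col = "sentence")

# Fit the pipeline and apply it to the input table
model  <- ml_fit(pipeline, text_tbl)
result <- ml_transform(model, text_tbl)
## End(Not run)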


r-spark/sparknlp documentation built on Oct. 15, 2022, 10:50 a.m.