nlp_pos: Read a part of speech tagging training file into a dataset
In r-spark/sparknlp: R Interface to John Snow Labs Spark NLP

Description

Training a part-of-speech (POS) tagger annotator requires the corpus data as a Spark DataFrame. This function reads a plain text file and transforms it into a Spark dataset that is ready for training a POS tagger. See the Scala API docs for the default parameter values (https://nlp.johnsnowlabs.com/api/index.html#com.johnsnowlabs.nlp.training.POS).

Usage

nlp_pos(
  sc,
  file_path,
  delimiter = NULL,
  output_pos_col = NULL,
  output_document_col = NULL,
  output_text_col = NULL
)

Arguments

`sc`	a Spark connection
`file_path`	path to the text file with the training data
`delimiter`	the delimiter that separates each token from its tag in the training data
`output_pos_col`	the name of the POS tag column in the output data frame
`output_document_col`	the name of the document column in the output data frame
`output_text_col`	the name of the text column in the output data frame
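
Examples

The following is a minimal sketch of how the function might be called, not an official example from the package. It assumes that the training file holds one sentence per line, with space-separated tokens written as `word|tag`, and that a local Spark installation with the Spark NLP dependencies is available. The helper `read_pos_corpus`, the sample sentences, and the column names `"tags"`, `"document"` and `"text"` are illustrative choices; check the Scala API docs linked above for the actual defaults.

```r
# Write a tiny sample corpus to a temporary file.
# Assumed layout: one sentence per line, tokens separated by spaces,
# each token written as word<delimiter>tag.
corpus_path <- tempfile(fileext = ".txt")
writeLines(
  c(
    "Pierre|NNP Vinken|NNP will|MD join|VB the|DT board|NN .|.",
    "The|DT board|NN meets|VBZ today|NN .|."
  ),
  corpus_path
)

# Hypothetical helper: nothing touches Spark until this function is called.
read_pos_corpus <- function(path, master = "local") {
  # Load sparknlp before connecting so that its Spark dependencies can be
  # registered with sparklyr (assumed to happen when the package loads).
  loadNamespace("sparknlp")

  sc <- sparklyr::spark_connect(master = master)
  on.exit(sparklyr::spark_disconnect(sc), add = TRUE)

  pos_data <- sparknlp::nlp_pos(
    sc,
    file_path = path,
    delimiter = "|",                  # separates each token from its tag
    output_pos_col = "tags",          # illustrative column names
    output_document_col = "document",
    output_text_col = "text"
  )

  # Inspect the result while the connection is still open.
  print(pos_data)
  invisible(NULL)
}
```

With Spark available, calling `read_pos_corpus(corpus_path)` should print the dataset that would then be passed to a POS tagger for training. Arguments left as `NULL` in `nlp_pos()` are not set explicitly here, so the Scala-side defaults presumably apply.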