nlp_pos: Read a part of speech tagging training file into a dataset

View source: R/perceptron.R

nlp_pos R Documentation

Read a part of speech tagging training file into a dataset

Description

Training a part-of-speech (POS) tagger annotator requires corpus data as a Spark DataFrame. This function reads a plain text file and transforms it into a Spark dataset that is ready for training a POS tagger. Arguments left as NULL fall back to the defaults of the underlying Scala API; see the Scala API docs for those values (https://nlp.johnsnowlabs.com/api/index.html#com.johnsnowlabs.nlp.training.POS).

Usage

nlp_pos(
  sc,
  file_path,
  delimiter = NULL,
  output_pos_col = NULL,
  output_document_col = NULL,
  output_text_col = NULL
)

Arguments

sc

Spark connection

file_path

path to the text file with the training data

delimiter

the delimiter that separates each token from its tag in the training data

output_pos_col

the pos column name for the output data frame

output_document_col

the document column name for the output data frame

output_text_col

the text column name for the output data frame

Value

A Spark DataFrame containing the training data, with the POS, document, and text columns named by the output_* arguments
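
Examples

The following is a minimal sketch of a typical call, not an excerpt from the package's own examples. It assumes a local Spark installation reachable through sparklyr, and that loading sparknlp before connecting makes the Spark NLP jars available to the session. The training file, its two sentences, and the column names passed to the output_* arguments are hypothetical and chosen only for illustration.

```r
library(sparklyr)
library(sparknlp)

# Hypothetical training file: one sentence per line, with each token
# joined to its tag by the delimiter ("|" here).
train_file <- tempfile(fileext = ".txt")
writeLines(
  c("The|DT cat|NN sat|VBD", "Dogs|NNS bark|VBP"),
  train_file
)

# Assumption: Spark is installed locally and can read the temporary path.
sc <- spark_connect(master = "local")

# Read the file into a Spark DataFrame. The column names are illustrative
# choices; leave the arguments as NULL to use the Scala API defaults.
pos_data <- nlp_pos(
  sc,
  file_path = train_file,
  delimiter = "|",
  output_pos_col = "tags",
  output_document_col = "document",
  output_text_col = "text"
)

# Inspect the resulting columns before passing the data to a POS tagger.
print(colnames(pos_data))

spark_disconnect(sc)
```

Naming the output columns explicitly makes it easier to match them to the input columns of whichever POS tagger approach is trained on the result.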


r-spark/sparknlp documentation built on Oct. 15, 2022, 10:50 a.m.