nlp_conllu_read_dataset: Transform CoNLLU format text file to Spark dataframe

nlp_conllu_read_dataset    R Documentation

Transform CoNLLU format text file to Spark dataframe

Description

To train a Lemmatizer annotator, CoNLLU-format training data must first be loaded into a Spark dataframe. This function does that: it reads a plain-text CoNLLU file and transforms it into a Spark dataset. See https://nlp.johnsnowlabs.com/docs/en/annotators#conllu-dataset for details. The optional arguments read_as and explode_sentences default to NULL, in which case the Spark NLP defaults apply; see https://nlp.johnsnowlabs.com/api/index.html#com.johnsnowlabs.nlp.training.CoNLLU for those defaults.

Usage

nlp_conllu_read_dataset(sc, path, read_as = NULL, explode_sentences = NULL)

Arguments

sc

a Spark connection

path

path to the file to read

read_as

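One of LINE_BY_LINE or SPARK_DATASET (default LINE_BY_LINE); SPARK_DATASET accepts additional options.

explode_sentences

whether to split each sentence into its own row of the resulting dataframe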
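Examples

A minimal sketch of reading a CoNLLU file, assuming the sparklyr and sparknlp R packages are installed, that a local Spark installation is available, and that the function returns a sparklyr Spark dataframe (tbl_spark). The one-sentence training file and the names conllu_lines, conllu_path, train_data and n_rows are hypothetical, made up for illustration. Only sc and path are passed, so read_as and explode_sentences keep their defaults.

```r
# Sketch only: assumes sparklyr, sparknlp and a local Spark installation.
library(sparklyr)
library(sparknlp)

# Hypothetical one-sentence CoNLLU file: 10 tab-separated columns per token
# (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC);
# a blank line ends the sentence.
conllu_lines <- c(
  "# sent_id = 1",
  "# text = The cats sleep.",
  "1\tThe\tthe\tDET\tDT\tDefinite=Def|PronType=Art\t2\tdet\t_\t_",
  "2\tcats\tcat\tNOUN\tNNS\tNumber=Plur\t3\tnsubj\t_\t_",
  "3\tsleep\tsleep\tVERB\tVBP\tMood=Ind|Tense=Pres\t0\troot\t_\tSpaceAfter=No",
  "4\t.\t.\tPUNCT\t.\t_\t3\tpunct\t_\t_",
  ""
)
conllu_path <- tempfile(fileext = ".conllu")
writeLines(conllu_lines, conllu_path)

sc <- spark_connect(master = "local")

# Read the file with the default read_as and explode_sentences settings.
train_data <- nlp_conllu_read_dataset(sc, conllu_path)

# Inspect the resulting columns and count the rows
# (assumes train_data is a tbl_spark).
print(sdf_schema(train_data))
n_rows <- sdf_nrow(train_data)
```

The connection is left open so train_data can be passed on to a Lemmatizer pipeline; call spark_disconnect(sc) when finished.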


r-spark/sparknlp documentation built on Oct. 15, 2022, 10:50 a.m.