nlp_conllu_read_dataset: Transform CoNLLU format text file to Spark dataframe
In r-spark/sparknlp: R Interface to John Snow Labs Spark NLP

nlp_conllu_read_dataset

R Documentation


To train a Lemmatizer annotator, the training data must be available in CoNLL-U format as a Spark dataframe. This function provides the component that does this: it reads a plain text file in CoNLL-U format and transforms it into a Spark dataset. See https://nlp.johnsnowlabs.com/docs/en/annotators#conllu-dataset. The optional arguments `read_as` and `explode_sentences` default to NULL. For the defaults used by the underlying Spark NLP class, see https://nlp.johnsnowlabs.com/api/index.html#com.johnsnowlabs.nlp.training.CoNLLU.
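A CoNLL-U file is plain text. Each token is on its own line with ten tab-separated fields (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines start with `#`, and a blank line ends each sentence. The following minimal sketch assumes a local Spark installation with the sparknlp package loaded. It writes a one-sentence sample file with base R and then passes that file as `path`. The helper `write_sample_conllu`, the sample sentence, and the `spark_connect(master = "local")` setup are illustrative assumptions, not part of this function's documentation. The Spark step is wrapped in `interactive()` because it needs a live connection.

```r
# Hypothetical helper: writes a one-sentence CoNLL-U sample to a file.
write_sample_conllu <- function(path = tempfile(fileext = ".conllu")) {
  # Each token line carries ten tab-separated fields:
  # ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
  tokens <- list(
    c("1", "Dogs", "dog",  "NOUN",  "NNS", "Number=Plur", "2", "nsubj", "_", "_"),
    c("2", "bark", "bark", "VERB",  "VBP", "_",           "0", "root",  "_", "SpaceAfter=No"),
    c("3", ".",    ".",    "PUNCT", ".",   "_",           "2", "punct", "_", "_")
  )
  lines <- c(
    "# sent_id = 1",                 # comment lines start with "#"
    "# text = Dogs bark.",
    vapply(tokens, paste, character(1), collapse = "\t"),
    ""                               # blank line terminates the sentence
  )
  writeLines(lines, path)
  path
}

conllu_path <- write_sample_conllu()

# Sanity check: every token line (starting with a digit) has ten fields.
token_lines <- grep("^[0-9]", readLines(conllu_path), value = TRUE)
stopifnot(all(lengths(strsplit(token_lines, "\t")) == 10))

# Reading into Spark needs a running connection with Spark NLP available,
# so this step only runs in an interactive session.
if (interactive()) {
  library(sparklyr)
  library(sparknlp)
  sc <- spark_connect(master = "local")  # assumed local setup
  conllu_df <- nlp_conllu_read_dataset(sc, path = conllu_path)
  print(conllu_df)
}
```

Because `read_as` and `explode_sentences` are left unset here, the call relies on the defaults of the underlying Spark NLP class.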

nlp_conllu_read_dataset(sc, path, read_as = NULL, explode_sentences = NULL)

`sc`	a Spark connection
`path`	path to the file to read
`read_as`	Either LINE_BY_LINE or SPARK_DATASET; options can be passed when the latter is used (default: LINE_BY_LINE)