| nlp_conll_read_dataset | R Documentation | 
To train a Named Entity Recognition DL annotator, we need CoNLL-format data as a Spark dataframe. There is a component that does this for us: it reads a plain text file and transforms it into a Spark dataset. See https://nlp.johnsnowlabs.com/docs/en/annotators#conll-dataset. All of the function arguments have defaults; see https://nlp.johnsnowlabs.com/api/index.html#com.johnsnowlabs.nlp.training.CoNLL for their values.
nlp_conll_read_dataset(
  sc,
  path,
  read_as = NULL,
  document_col = NULL,
  sentence_col = NULL,
  token_col = NULL,
  pos_col = NULL,
  conll_label_index = NULL,
  conll_pos_index = NULL,
  conll_text_col = NULL,
  label_col = NULL,
  explode_sentences = NULL,
  delimiter = NULL,
  parallelism = NULL,
  storage_level = NULL
)
| sc | a Spark connection | 
| path | path to the file to read | 
| read_as | Can be LINE_BY_LINE or SPARK_DATASET; if the latter is used, options can be supplied (default LINE_BY_LINE) | 
| document_col | name to use for the document column | 
| sentence_col | name to use for the sentence column | 
| token_col | name to use for the token column | 
| pos_col | name to use for the part of speech column | 
| conll_label_index | index position in the file of the NER label | 
| conll_pos_index | index position in the file of the part of speech label | 
| conll_text_col | name to use for the text column | 
| label_col | name to use for the label column | 
| explode_sentences | logical; whether the sentences should be exploded into separate rows | 
| delimiter | delimiter used to separate columns inside the CoNLL file | 
| parallelism | integer; the level of parallelism to use when reading the dataset | 
| storage_level | specifies the storage level to use for the dataset. Must be a string value from org.apache.spark.storage.StorageLevel (e.g. "DISK_ONLY"). See https://spark.apache.org/docs/latest/api/java/org/apache/spark/storage/StorageLevel.html | 
A Spark dataframe containing the imported data.
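As a minimal sketch of typical use, assuming a local Spark connection via sparklyr and a CoNLL 2003-format training file (the file path "eng.train" is illustrative, not part of this documentation):

```r
library(sparklyr)
library(sparknlp)

# Connect to a local Spark instance (adjust master/config for your cluster)
sc <- spark_connect(master = "local")

# Read a CoNLL-format file into a Spark dataframe. All optional arguments
# are left NULL here, so the CoNLL class defaults apply (e.g. exploded
# sentences, tab-delimited columns).
train_data <- nlp_conll_read_dataset(sc, path = "eng.train")

# Inspect the resulting columns (text, document, sentence, token, pos, label)
sdf_schema(train_data)

spark_disconnect(sc)
```

The resulting dataframe can be passed directly as training data to an NER DL annotator's fit method.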