read.conll: Read CoNLL-type data
In martigso/tonR: Tools for operating the Talk of Norway dataset in R


Reads the CoNLL-formatted annotation data used in the Talk of Norway (ToN) project, which covers legislative speeches in the Norwegian parliament (1998-2016). The function works as a wrapper for read.csv.

read.conll(tonFolder, id, keep = "all", rmStopwords = TRUE, rmLength = 1000)

`tonFolder`	Character vector specifying either absolute or relative path to the talk-of-norway repository folder
`id`	Character string specifying id of the file to read into R.
`keep`	Character string specifying which parts of speech to return. Possible values are "all" (the default shown in the usage line) or one of the tags "subst", "verb", "sbu", "prep", "det", "adj", "clb", "adv", "pron", "<komma>", "konj".
`rmWords`	Character vector of either "no" or of words to remove.

A data frame of the ToN speech specified by id, with sentence and token boundaries, tokens, lemmatized tokens, part-of-speech tags, and morphological features.
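
To illustrate the kind of data frame described above, here is a minimal sketch that reads a tab-separated CoNLL-style file with read.csv and filters it by part of speech. It is not the package's implementation: the toy lines, the temporary file, and all column names except `part_of_speech` (which appears in the examples) are assumptions made for illustration.

```r
# Toy CoNLL-style lines (tab-separated). The content and column layout are
# made up for this sketch and need not match the real ToN annotation files.
conll_lines <- c(
  "1\t1\tVi\tvi\tpron\tpers",
  "1\t2\thar\tha\tverb\tpres",
  "1\t3\tgode\tgod\tadj\tpos",
  "1\t4\tforslag\tforslag\tsubst\tappell"
)
tmp <- tempfile(fileext = ".tsv")
writeLines(conll_lines, tmp)

# Hypothetical column names; only `part_of_speech` is taken from the examples.
cols <- c("sentence", "index", "token", "lemma", "part_of_speech", "morph")

# read.csv with a tab separator, no header row, and quoting disabled
speech <- read.csv(tmp, sep = "\t", header = FALSE, quote = "",
                   col.names = cols, stringsAsFactors = FALSE)

# Keep only selected parts of speech, similar in spirit to the `keep` argument
keep <- c("adj", "verb")
subset_speech <- speech[speech$part_of_speech %in% keep, ]
print(subset_speech)
```

Disabling quoting (`quote = ""`) is a common precaution for token-per-line data, since tokens may themselves be quotation marks.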

read.csv

# If the 'talk-of-norway' repository is placed in the folder below the tonR-package
# Assumes the id matches the annotation file name (tale100001.tsv)
speech <- read.conll(tonFolder = "../talk-of-norway/", id = "tale100001")
barplot(table(speech$part_of_speech))

# Loading a subset of speeches using several ToN id tags
data("tonDemo")

texts <- lapply(tonDemo$id[1:10], function(x){
  read.conll(tonFolder = "../talk-of-norway/", id = x, keep = "adj")
})
lapply(texts, head)
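
A list of per-speech data frames like `texts` is often easier to work with once it is stacked into a single data frame. The sketch below shows one way to do that in base R. It uses toy data so that it runs without the talk-of-norway repository; the ids, tokens, and the `token` column name are made up for illustration.

```r
# Toy stand-in for a list of per-speech data frames; ids and tokens are made up.
toy_texts <- list(
  tale000001 = data.frame(token = c("gode", "nye"),
                          part_of_speech = c("adj", "adj"),
                          stringsAsFactors = FALSE),
  tale000002 = data.frame(token = "viktig",
                          part_of_speech = "adj",
                          stringsAsFactors = FALSE)
)

# Tag each data frame with its id so rows stay traceable after stacking
toy_texts <- Map(function(df, id) {
  df$id <- id
  df
}, toy_texts, names(toy_texts))

# Stack all speeches into one data frame and reset the row names
all_speeches <- do.call(rbind, toy_texts)
rownames(all_speeches) <- NULL

# Number of retained tokens per speech
print(table(all_speeches$id))
```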