A function for reading the CoNLL data used in the Talk of Norway project for legislative speeches
in the Norwegian parliament (1998-2016). It works as a wrapper for read.csv.
read.conll(tonFolder, id, keep = "all", rmStopwords = TRUE, rmLength = 1000)
tonFolder
Character vector specifying either an absolute or a relative path to the talk-of-norway repository folder.
id
Character string specifying the id of the file to read into R.
keep
Character string that specifies which parts-of-speech should be returned. Possible values are: "subst", "verb", "sbu", "prep", "det", "adj", "clb", "adv", "pron", "<komma>", "konj".
rmWords
Character vector of either "no" or of words to remove.
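As a rough illustration of the keep argument, the sketch below filters a toy data frame by part-of-speech tag. It assumes that keep selects rows by tag and that "all" means no filtering; filter_pos and the token column are hypothetical names, and only part_of_speech is taken from the Examples. It is not the package's implementation.

```r
# Toy stand-in for a parsed speech. Only `part_of_speech` is a column name
# taken from the Examples; `token` is a hypothetical column.
speech <- data.frame(
  token = c("Stortinget", "vedtok", "et", "nytt", "budsjett"),
  part_of_speech = c("subst", "verb", "det", "adj", "subst"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only rows whose tag is listed in `keep`,
# treating "all" as "no filtering" (an assumption about the default).
filter_pos <- function(df, keep = "all") {
  if (identical(keep, "all")) return(df)
  df[df$part_of_speech %in% keep, , drop = FALSE]
}

nouns <- filter_pos(speech, keep = "subst")
```

Passing a vector such as c("subst", "adj") to this helper would keep both tags, since %in% accepts several values.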
A data frame of the ToN speech specified by id, with sentence and token boundaries, tokens, lemmatized
tokens, parts-of-speech tagging, and morpheme features.
# If the 'talk-of-norway' repository is placed in the folder below the tonR-package
# Assumes the id is the annotation file name without its extension
speech <- read.conll(tonFolder = "../talk-of-norway/", id = "tale100001")
barplot(table(speech$part_of_speech))
# Loading a subset of speeches using several ToN id tags
data("tonDemo")
texts <- lapply(tonDemo$id[1:10], function(x){
read.conll(tonFolder = "../talk-of-norway/", id = x, keep = "adj")
})
lapply(texts, head)
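A list of per-speech data frames, like the one produced by lapply above, is often easier to work with once stacked into a single data frame. The sketch below shows one way to do that with toy stand-ins for the read.conll output; the ids and the token column are hypothetical, and only part_of_speech is taken from the examples above.

```r
# Toy stand-ins for the list returned by lapply(); the `token` column and
# the ids used as list names are hypothetical.
texts <- list(
  data.frame(token = c("god", "ny"), part_of_speech = c("adj", "adj"),
             stringsAsFactors = FALSE),
  data.frame(token = "viktig", part_of_speech = "adj",
             stringsAsFactors = FALSE)
)
names(texts) <- c("tale100001", "tale100002")

# Tag each data frame with its id, then stack them row-wise.
tagged <- Map(function(df, id) {
  df$id <- id
  df
}, texts, names(texts))
all_speeches <- do.call(rbind, tagged)
rownames(all_speeches) <- NULL

# Number of returned tokens per speech
table(all_speeches$id)
```

Keeping the id as a column makes it possible to group or merge on it later, for example against the speech metadata.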