read.conll: Read CoNLL-type data

Description

Reads the CoNLL-format annotation data used in the Talk of Norway (ToN) project, which covers legislative speeches in the Norwegian parliament (1998-2016). The function works as a wrapper for read.csv.
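Because the function is described as a wrapper for read.csv, the underlying read can be pictured roughly as follows. This is only a sketch under assumptions: the toy file contents, the tab separator, and every column name except part_of_speech (which appears in the Examples) are illustrative and not taken from the tonR source.

```r
# Hypothetical CoNLL-style content; real ToN annotation files may differ.
toy <- c("sentence\tindex\ttoken\tlemma\tpart_of_speech",
         "1\t1\tVi\tvi\tpron",
         "1\t2\thar\tha\tverb",
         "1\t3\ttid\ttid\tsubst")
path <- tempfile(fileext = ".tsv")
writeLines(toy, path)

# read.csv with a tab separator; quote = "" keeps stray quote
# characters inside tokens from being treated as field delimiters.
conll <- read.csv(path, sep = "\t", quote = "", stringsAsFactors = FALSE)
str(conll)
```

Setting quote = "" is a common precaution for token-per-line data, since a lone quotation mark is itself a valid token.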

Usage

read.conll(tonFolder, id, keep = "all", rmStopwords = TRUE, rmLength = 1000)

Arguments

tonFolder

Character vector specifying the absolute or relative path to the talk-of-norway repository folder.

id

Character string specifying the id of the file to read into R.

keep

Character string specifying which part of speech should be returned. The default is "all"; other possible values are: "subst", "verb", "sbu", "prep", "det", "adj", "clb", "adv", "pron", "<komma>", "konj".
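The effect of keep can be illustrated with a small filter on a toy data frame. This is a sketch, not the package's implementation: keep_pos is a hypothetical helper, the token and lemma columns are assumed names, and accepting several tags at once is an assumption beyond the single string documented here.

```r
# Toy annotation table; 'part_of_speech' matches the column used in the
# Examples, the other column names are assumptions.
speech <- data.frame(
  token = c("Vi", "har", "god", "tid"),
  lemma = c("vi", "ha", "god", "tid"),
  part_of_speech = c("pron", "verb", "adj", "subst"),
  stringsAsFactors = FALSE
)

# Hypothetical helper mimicking the effect of 'keep'
keep_pos <- function(df, keep = "all") {
  if (identical(keep, "all")) return(df)           # no filtering
  df[df$part_of_speech %in% keep, , drop = FALSE]  # rows with a matching tag
}

keep_pos(speech, keep = "adj")$token
keep_pos(speech, keep = c("subst", "verb"))$token
```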

rmWords

Character vector: either "no" or the words to remove.

Value

A data frame of the ToN speech specified by id, with sentence and token boundaries, tokens, lemmatized tokens, part-of-speech tags, and morpheme features.

See Also

read.csv

Examples

# If the 'talk-of-norway' repository is placed in the folder below the tonR-package
speech <- read.conll(tonFolder = "../talk-of-norway/", id = "tale100001")
barplot(table(speech$part_of_speech))

# Loading a subset of speeches using several ToN id tags
data("tonDemo")

texts <- lapply(tonDemo$id[1:10], function(x){
  read.conll(tonFolder = "../talk-of-norway/", id = x, keep = "adj")
})
lapply(texts, head)
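
A list of per-speech data frames like texts is often easier to work with once stacked into a single data frame. The sketch below uses stand-in data rather than real ToN output; the ids and the token column are assumptions made for illustration.

```r
# Stand-in for the 'texts' list above: two small data frames with assumed columns
texts <- list(
  data.frame(token = c("god", "ny"), part_of_speech = c("adj", "adj"),
             stringsAsFactors = FALSE),
  data.frame(token = "stor", part_of_speech = "adj", stringsAsFactors = FALSE)
)
names(texts) <- c("tale100001", "tale100002")  # hypothetical ids

# Tag each data frame with its id, then stack them row-wise
tagged <- Map(function(df, id) { df$id <- id; df }, texts, names(texts))
all_speeches <- do.call(rbind, tagged)
rownames(all_speeches) <- NULL

# Number of retained tokens per speech
table(all_speeches$id)
```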

martigso/tonR documentation built on May 21, 2019, 12:38 p.m.