GermaParl-package: GermaParl R Data Package.


Description

GermaParl is a corpus of parliamentary debates in the German Bundestag. The package offers a convenient mechanism for disseminating the GermaParl corpus. The corpus has been linguistically annotated and indexed using the data format of the Corpus Workbench (CWB). To make full use of this data format, we recommend working with GermaParl in combination with the polmineR package.

Details

Initially, the GermaParl package includes only a subset of the GermaParl corpus, which serves as a sample corpus ("GERMAPARLMINI"). To download the full corpus from the open science repository Zenodo, use the function germaparl_download_corpus().
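The sample corpus can be explored with polmineR, as recommended above. The following is a minimal sketch, assuming that polmineR is installed alongside GermaParl: use() activates the corpora shipped with a package, corpus() creates a corpus object, size() returns the number of tokens and count() counts matches for a query. The query term "Integration" is only an illustrative choice, and details may differ between polmineR versions.

```r
# Sketch: querying the bundled sample corpus with polmineR.
# Assumes the polmineR and GermaParl packages are installed; if either is
# missing, the block is skipped and n_tokens stays NA.
n_tokens <- NA_integer_
if (requireNamespace("polmineR", quietly = TRUE) &&
    requireNamespace("GermaParl", quietly = TRUE)) {
  library(polmineR)
  use("GermaParl")                    # activate the corpora shipped with GermaParl
  mini <- corpus("GERMAPARLMINI")     # sample corpus included in the package
  n_tokens <- size(mini)              # total number of tokens in the corpus
  print(n_tokens)
  # Number of matches for an illustrative query term
  print(count(mini, query = "Integration"))
}
```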

The GermaParl R package and the GermaParl corpus are two different pieces of research data: the package offers a mechanism to ship, easily install and augment the data, whereas the indexed corpus is the actual data. Package and corpus have different version numbers, and both should be cited in publications. We recommend following the instructions shown when calling citation(package = "GermaParl"). To ensure that the recommended citation matches the corpus you use, the citation for the corpus is available only after a version of the GermaParl corpus has been downloaded and installed.
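The recommended citations can be retrieved with base R's citation(), and toBibtex() converts them into BibTeX entries for reference managers. A minimal sketch, assuming GermaParl is installed; as noted above, the entry for the corpus only appears once a corpus version has been downloaded and installed.

```r
# Sketch: retrieving the recommended citations for package and corpus.
# Assumes the GermaParl package is installed; otherwise cit stays NULL.
cit <- NULL
if (requireNamespace("GermaParl", quietly = TRUE)) {
  cit <- citation(package = "GermaParl")  # citation information of the package
  print(cit)                              # human-readable citation text
  print(toBibtex(cit))                    # the same entries formatted as BibTeX
}
```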

Author(s)

Andreas Blaette <andreas.blaette@uni-due.de>

References

Blaette, Andreas (2018): "Using Data Packages to Ship Annotated Corpora of Parliamentary Protocols: The GermaParl R Package". ISBN 979-10-95546-02-3. Available online at http://lrec-conf.org/workshops/lrec2018/W2/pdf/15_W2.pdf.

Examples

# This example uses the GERMAPARLSAMPLE corpus rather than the full GERMAPARL
# corpus to reduce the time required for testing the code. To run the example
# on GERMAPARL instead, set the variable 'samplemode' to FALSE, or simply omit
# the argument 'sample'.

library(GermaParl)

samplemode <- TRUE
corpus_id <- "GERMAPARLSAMPLE" # to get full corpus: corpus_id <- "GERMAPARL"

# This example assumes that the directories used by the CWB do not yet exist, so
# temporary directories are created.
cwb_dirs <- cwbtools::create_cwb_directories(prefix = tempdir(), ask = interactive())
registry_tmp <- cwb_dirs[["registry_dir"]]

# Download corpus from Zenodo
germaparl_download_corpus(
  registry_dir = registry_tmp,
  corpus_dir = cwb_dirs[["corpus_dir"]],
  verbose = FALSE,
  sample = samplemode
)

# Check availability of the corpus
germaparl_is_installed(sample = samplemode) # TRUE now
germaparl_get_version(sample = samplemode) # get version of indexed corpus
germaparl_get_doi(sample = samplemode) # get the Digital Object Identifier (DOI) of the corpus

GermaParl documentation built on Oct. 23, 2020, 8:27 p.m.