GermaParl is a corpus of parliamentary debates in the German Bundestag. The package offers a convenient dissemination mechanism for the GermaParl corpus. The corpus has been linguistically annotated and indexed using the data format of the Corpus Workbench (CWB). To make full use of this data format, we recommend working with GermaParl in combination with the polmineR package.
The GermaParl package initially includes only a subset of the GermaParl corpus, which serves as a sample corpus ("GERMAPARLMINI"). To download the full corpus from the open science repository Zenodo, use the germaparl_download_corpus() function.
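A minimal sketch of exploring the sample corpus with polmineR, assuming that both the polmineR and GermaParl packages are installed and that GERMAPARLMINI is available as described above. The calls use(), size() and count() are polmineR functions; the query term "Integration" is only an illustrative choice, and the block is skipped if either package is missing.

```r
# Sketch: explore the sample corpus GERMAPARLMINI with polmineR.
# Assumes the polmineR and GermaParl packages are installed; otherwise
# the whole block is skipped.
if (requireNamespace("polmineR", quietly = TRUE) &&
    requireNamespace("GermaParl", quietly = TRUE)) {
  library(polmineR)

  # Make the corpora shipped with the GermaParl package available
  use(pkg = "GermaParl")

  # Number of tokens in the sample corpus
  n_tokens <- size("GERMAPARLMINI")
  print(n_tokens)

  # Frequency of an illustrative query term in the sample corpus
  print(count("GERMAPARLMINI", query = "Integration"))
}
```

For the full corpus, the same calls should apply once it has been downloaded and the corpus ID "GERMAPARL" is used instead.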
The GermaParl R package and the GermaParl corpus are two distinct pieces of research data: the package offers a mechanism to ship, install and augment the data easily, whereas the indexed corpus is the actual data. Package and corpus have different version numbers, and both should be cited together in publications. We recommend following the instructions shown when calling citation(package = "GermaParl"). To ensure that the recommended citation matches the corpus you use, the citation for the corpus is available only once a version of GermaParl has been downloaded and installed.
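For instance, the recommended citations can be printed and exported as BibTeX with citation() and toBibtex() from R's utils package. This sketch assumes that GermaParl is installed; as noted above, the corpus citation should only appear after a corpus version has been downloaded and installed.

```r
# Sketch: retrieve the recommended citation(s) for GermaParl.
# Skipped if the GermaParl package is not installed.
if (requireNamespace("GermaParl", quietly = TRUE)) {
  germaparl_cit <- citation(package = "GermaParl")

  # Human-readable citation text
  print(germaparl_cit)

  # BibTeX entries, e.g. for use with a LaTeX reference manager
  print(toBibtex(germaparl_cit))
}
```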
Andreas Blaette <andreas.blaette@uni-due.de>
Blaette, Andreas (2018): "Using Data Packages to Ship Annotated Corpora of Parliamentary Protocols: The GermaParl R Package". ISBN 979-10-95546-02-3. Available online at http://lrec-conf.org/workshops/lrec2018/W2/pdf/15_W2.pdf.
library(GermaParl)

# This example uses the GERMAPARLSAMPLE corpus rather than the full GERMAPARL
# corpus in order to reduce the time required for testing the code. To apply
# everything to GERMAPARL rather than GERMAPARLSAMPLE, set the variable
# 'samplemode' to FALSE, or simply omit the argument 'sample'.
samplemode <- TRUE
corpus_id <- "GERMAPARLSAMPLE" # to get the full corpus: corpus_id <- "GERMAPARL"

# This example assumes that the directories used by the CWB do not yet exist, so
# temporary directories are created.
cwb_dirs <- cwbtools::create_cwb_directories(prefix = tempdir(), ask = interactive())
registry_tmp <- cwb_dirs[["registry_dir"]]

# Download the corpus from Zenodo
germaparl_download_corpus(
  registry_dir = registry_tmp,
  corpus_dir = cwb_dirs[["corpus_dir"]],
  verbose = FALSE,
  sample = samplemode
)

# Check availability of the corpus
germaparl_is_installed(sample = samplemode) # TRUE now
germaparl_get_version(sample = samplemode) # version of the indexed corpus
germaparl_get_doi(sample = samplemode) # digital object identifier (DOI) of the corpus