The GermaParl Corpus has been prepared in the PolMine Project and comprises all protocols of plenary sessions of the German Bundestag (1996–2013). The corpus includes the speeches that were actually given in the German Bundestag. Speeches that were included in the printed protocol but not delivered are not yet part of the corpus.
This version of the corpus is based on plain text documents issued by the German Bundestag. For a period between 2008 and 2010, plain text files are not available; to fill this gap, PDF documents were processed.
As part of the corpus preparation pipeline, the data has been linguistically annotated (using TreeTagger) and imported into the Corpus Workbench (CWB).
See the GermaParl documentation website for further information.
The data is released under a CLARIN PUB+BY+NC+SA license; for an explanation of the license, see CLARIN licenses.
If you work with the GermaParl package, please include the following reference in your bibliography to attribute the language resource:
Blaette, Andreas (2017): GermaParl. Corpus of Plenary Protocols of the German Bundestag. R Data Package (v1.0.4). http://polmine.sowi.uni-due.de/packages/src/contrib/GermaParl_1.0.4.tar.gz.
We hope that GermaParl will be fruitful for your research. We would be glad to learn what you do with the data and to make your blog entries or publications visible here. Please bring any issues you come across to our attention!