polmineR-package: polmineR-package


Description

Machinery for mining CWB corpora

Details

The package provides functions for basic text statistics on corpora managed by the Corpus Workbench (CWB). A core feature is generating subcorpora/partitions based on metadata. The package is also meant to serve as an interface between the CWB and R packages that implement more sophisticated statistical procedures (e.g. lsa, lda, topicmodels) or provide further functionality for text mining (e.g. tm).

Any analysis using this package will usually start with setting up a subcorpus/partition (with partition). A set of partitions can be generated with partitionBundle. Once a partition or a set of partitions has been set up, core functions are context and compare. Based on a partition bundle, a term-document matrix (class 'TermDocumentMatrix' from the tm package) can be generated (with as.TermDocumentMatrix). This opens the door to the wealth of statistical methods implemented in R.

When the package is loaded and attached, it first looks for a file named 'polmineR.conf' in the directory defined by the environment variable 'POLMINER_DIR' and takes general settings for polmineR from that file. Second, templates are restored.
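
To check in advance whether a configuration file would be found, the lookup described above can be reproduced in base R. This is a minimal sketch and not polmineR's own code: the helper name locate_polminer_conf is hypothetical, and it only tests whether 'polmineR.conf' exists in the directory named by 'POLMINER_DIR'. It does not read or parse the file.

```r
# Hypothetical helper (not part of polmineR): return the path of
# 'polmineR.conf' inside the directory given by POLMINER_DIR,
# or NA if the variable is unset/empty or the file does not exist.
locate_polminer_conf <- function(dir = Sys.getenv("POLMINER_DIR")) {
  if (!nzchar(dir)) return(NA_character_)  # Sys.getenv() yields "" when unset
  conf <- file.path(dir, "polmineR.conf")
  if (file.exists(conf)) conf else NA_character_
}

conf_path <- locate_polminer_conf()
if (is.na(conf_path)) {
  message("No polmineR.conf found via POLMINER_DIR")
} else {
  message("Configuration file: ", conf_path)
}
```

Running this before library(polmineR) may help diagnose cases where the expected general settings are not picked up.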

Author(s)

Andreas Blaette (andreas.blaette@uni-due.de)

References

http://polmine.sowi.uni-due.de

Examples

## Not run: 
# examples in the manual rely on a sample corpus that can be installed as follows:
drat::addRepo("PolMine", alturl="https://polmine.github.io/drat/")
install.packages("polmineR.sampleCorpus")

## End(Not run)

nrauscher/corpus documentation built on May 23, 2019, 9:34 p.m.