
rusecqp: An R package for corpus linguistic analysis with Corpus Workbench corpora

3.4.2018, Thilo Wiertz

Description

The open-source IMS Corpus Workbench (CWB; Evert & Hardie, 2011) provides a data model and a corpus query processor (CQP) for linguistic analysis. The existing rcqp package (Desgraupes & Loiseau, 2018) lets users query Corpus Workbench corpora from R and access token-level information, but it does not provide high-level functions for corpus linguistic analysis. rusecqp wraps and builds on rcqp to provide functions for tasks such as frequency distributions, n-grams, and keyword and collocation analysis.

Prerequisites

The package requires rcqp and a working installation of the IMS Corpus Workbench on the system. For installation instructions, see the websites of both dependencies. Before loading rcqp, set the path to the Corpus Workbench registry directory with `Sys.setenv(CORPUS_REGISTRY = path_to_registry_dir)`. To check that all dependencies are met and corpora are available, run `rcqp::cqi_list_corpora()`, which should print a list of the corpora available on the system.
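The check described above can be sketched as follows. This is a minimal illustration, not part of the package: the registry path is a hypothetical placeholder, and the guard around the rcqp call is an added precaution so that the snippet also runs on a system where the prerequisites are not yet in place.

```r
# Hypothetical registry location (an assumption); replace it with the
# path to the Corpus Workbench registry directory on your system.
registry_dir <- "/usr/local/share/cwb/registry"

# Set the environment variable before rcqp is loaded, as required above.
Sys.setenv(CORPUS_REGISTRY = registry_dir)

# Only query CQP if the registry directory exists and rcqp is installed.
rcqp_ready <- dir.exists(registry_dir) &&
  requireNamespace("rcqp", quietly = TRUE)

if (rcqp_ready) {
  # Should print the corpora registered on the system.
  print(rcqp::cqi_list_corpora())
} else {
  message("Registry directory or rcqp not found; check the installation.")
}
```

If the message is shown instead of a corpus list, verify the registry path and the rcqp installation before installing rusecqp.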

Installation

To install rusecqp, first install the devtools package and then use it to install rusecqp from GitHub. This also installs the dependency packages data.table and stringr as required:

```
install.packages("devtools")
devtools::install_github("wiertz/rusecqp")
```

Load the package with

```
library("rusecqp")
```

Overview of functions

See the help pages of the individual functions for details on their usage. Currently implemented functions are:

Accessing and subsetting corpora

Corpus based analysis

Query processing

References

Contact

Thilo Wiertz, thilo.wiertz@geographie.uni-freiburg.de


