#' Rcpp Bindings for the Corpus Workbench (CWB).
#'
#' @description
#' The \code{RcppCWB} package is a wrapper library that exposes core functions
#' of the \code{Open Corpus Workbench} (CWB). This includes the low-level
#' functionality of the \code{Corpus Library} (CL) as well as the ability to
#' use the query syntax of the \code{Corpus Query Processor} (CQP).
#'
#' @section The Idea Behind RcppCWB:
#'
#' The \code{Open Corpus Workbench} (CWB) is an indexing and querying engine
#' popular in corpus-assisted research. Its core aim is to support working
#' efficiently with large, structurally and linguistically annotated corpora.
#' First of all, the CWB includes tools to index and compress corpora. Second,
#' the \code{Corpus Library} (CL) offers low-level functionality to retrieve
#' information from CWB indexed corpora. Third, the \code{Corpus Query
#' Processor} (CQP) offers a syntax for performing anything from simple to
#' complex queries, using different annotation layers of corpora.
#'
#' The CWB is a classic tool that has inspired a range of later developments.
#' A persisting advantage of the CWB is its mature, open source code base that
#' is actively maintained by a community of developers. It is used as a robust
#' and efficient backend for widely used tools such as TXM
#' (\url{https://txm.gitpages.huma-num.fr/textometrie/}) or CQPweb
#' (\url{https://cwb.sourceforge.io/cqpweb.php}). Its C implementation is
#' fast and, at the same time, well suited for integration with R.
#'
#' The package \code{RcppCWB} is a follow-up on the \code{rcqp} package, which
#' pioneered exposing CWB functionality from within R. Indeed, the
#' \code{rcqp} package, published on CRAN in 2015, offers robust access to CWB
#' functionality. However, the "pure C" implementation of the \code{rcqp}
#' package makes it difficult to port the package to Windows. The
#' primary purpose of the \code{RcppCWB} package is to reimplement a wrapper
#' library for the CWB using a design that makes it easier to achieve
#' cross-platform portability.
#'
#' Even though \code{RcppCWB} functions may be used directly, the package is
#' designed to serve as an interface to CWB indexed corpora in packages with
#' higher-level functionality. In this regard, \code{RcppCWB} is the backend
#' of the \code{polmineR} package, but it is deliberately open to use in other
#' contexts. The package may encourage the use of linguistically annotated,
#' indexed and compressed corpora on all platforms, and the paradigm of
#' working with text as linguistic data may benefit from \code{RcppCWB}.
#'
#' @section Implementation:
#'
#' When building the package, the first step is to compile the relevant parts
#' of the CWB on Linux and macOS machines. On Windows, cross-compiled binaries
#' are downloaded from a GitHub repository of the PolMine Project
#' (\url{https://github.com/PolMine/libcl}). Second, \code{Rcpp} wrappers are
#' compiled and make the relevant functions of the Corpus Library and CQP
#' accessible. In addition to genuine CWB functions, \code{RcppCWB} offers a
#' set of higher-level functions implemented using \code{Rcpp} for common
#' performance-critical tasks.
#'
#'
#' @section Getting Started with RcppCWB:
#'
#' To understand the data storage model of the CWB, in particular the notions
#' of positional and structural attributes (p- and s-attributes), the vignette
#' of the \code{rcqp} package is a very good starting point (see references).
#'
#' The CWB 'Corpus Encoding Tutorial' explains how to create your own corpus;
#' the 'CQP Query Language Tutorial' introduces the syntax of CQP (see
#' references).
#'
#' The \code{RcppCWB} package includes a sample corpus (REUTERS, the data that
#' is also included in the \code{tm} package). The examples in the
#' documentation of the functions are a good starting point to understand how
#' to use \code{RcppCWB}.
#'
#' @section Digging Deeper:
#'
#' The original paper of Christ (1994) explains the design choices of the CWB.
#' The indexing and compression techniques of the CWB (Huffman coding) are
#' explained in Witten et al. (1999).
#'
#' @section Acknowledgements:
#'
#' The work of all developers of the CWB is gratefully acknowledged. There
#' is a particular intellectual debt to Bernard Desgraupes and Sylvain
#' Loiseau, and the \code{rcqp} package they developed as the original R
#' wrapper to expose the functionality of the CWB.
#'
#' @references
#' Christ, O. 1994. "A modular and flexible architecture for an integrated
#' corpus query system", in: Proceedings of COMPLEX '94, pp. 23-32. Budapest.
#' Available online at \url{https://cwb.sourceforge.io/files/Christ1994.pdf}
#'
#' Desgraupes, B.; Loiseau, S. 2012. Introduction to the rcqp package.
#' Vignette of the rcqp package. Available at the CRAN archive at
#' \url{https://cran.r-project.org/src/contrib/Archive/rcqp/}
#'
#' Evert, S. 2005. The CQP Query Language Tutorial. Available online at
#' \url{https://cwb.sourceforge.io/files/CQP_Tutorial.pdf}
#'
#' Evert, S. 2005. The IMS Open Corpus Workbench (CWB). Corpus Encoding
#' Tutorial. Available online at
#' \url{https://cwb.sourceforge.io/files/CWB_Encoding_Tutorial.pdf}
#'
#' Open Corpus Workbench (\url{https://cwb.sourceforge.io})
#'
#' Witten, I.H.; Moffat, A.; Bell, T.C. 1999. Managing Gigabytes. Morgan
#' Kaufmann Publishing, San Francisco, 2nd edition.
#'
#'
#' @keywords package
#' @docType package
#' @rdname RcppCWB-package
#' @aliases RcppCWB RcppCWB-package
#' @name RcppCWB-package
#' @useDynLib RcppCWB, .registration = TRUE
#' @importFrom Rcpp evalCpp
#' @exportPattern "^[[:alpha:]]+"
#' @author Andreas Blaette (andreas.blaette@@uni-due.de)
#' @examples
#' # functions of the corpus library (starting with cl) expose the low-level
#' # access to the CWB corpus library (CL)
#'
#' ids <- cl_cpos2id("REUTERS", cpos = 1:20, p_attribute = "word", registry = get_tmp_registry())
#' tokens <- cl_id2str("REUTERS", id = ids, p_attribute = "word", registry = get_tmp_registry())
#' print(paste(tokens, collapse = " "))
#'
#' # To use the corpus query processor (CQP) and its syntax, CQP needs to be
#' # initialized first (example: get concordances of 'oil')
#'
#' if (!cqp_is_initialized()) cqp_initialize(registry = get_tmp_registry())
#' cqp_query("REUTERS", query = '[]{5} "oil" []{5}')
#' cpos_matrix <- cqp_dump_subcorpus("REUTERS")
#' concordances_oil <- apply(
#'   cpos_matrix, 1,
#'   function(row){
#'     ids <- cl_cpos2id(
#'       "REUTERS", p_attribute = "word", cpos = row[1]:row[2],
#'       registry = get_tmp_registry()
#'     )
#'     tokens <- cl_id2str(
#'       "REUTERS", p_attribute = "word", id = ids,
#'       registry = get_tmp_registry()
#'     )
#'     paste(tokens, collapse = " ")
#'   }
#' )
NULL