cl_charset_name: Get charset of a corpus.

View source: R/cl.R

Get charset of a corpus.

Description

The encoding of a corpus is declared in its registry file (corpus property "charset"). Once a corpus has been loaded, this information is available without parsing the registry file again. cl_charset_name offers quick access to it.
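For illustration, the following base-R sketch shows roughly what reading the property straight from disk involves. That is the parsing step cl_charset_name makes unnecessary once a corpus is loaded. The helper read_registry_charset() is hypothetical and not part of RcppCWB. It assumes the property appears on a single line of the form ##:: charset = "utf8", and it returns NA when no such line is found.

```r
# Hypothetical helper (not part of RcppCWB): read the "charset" corpus
# property directly from a registry file, i.e. the parsing step that
# cl_charset_name() avoids once a corpus is loaded.
read_registry_charset <- function(registry_file) {
  lines <- readLines(registry_file, warn = FALSE)
  # Assumed declaration format: ##:: charset = "utf8"
  pattern <- '^##::[[:space:]]*charset[[:space:]]*=[[:space:]]*"([^"]*)"'
  hits <- grep(pattern, lines, value = TRUE)
  # No declaration found: signal this with NA rather than guessing a default
  if (length(hits) == 0L) return(NA_character_)
  # Keep only the quoted value from the first matching line
  sub(paste0(pattern, ".*$"), "\\1", hits[[1L]])
}
```

For the sample corpus shipped with RcppCWB, this could be called as read_registry_charset(file.path(system.file(package = "RcppCWB", "extdata", "cwb", "registry"), "reuters")), assuming the registry file is named after the lower-case corpus ID, as CWB convention has it.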

Usage

cl_charset_name(corpus, registry = Sys.getenv("CORPUS_REGISTRY"))

Arguments

corpus

Name of a CWB corpus (upper case).

registry

Path to the registry directory; defaults to the value of the environment variable CORPUS_REGISTRY.

Examples

library(RcppCWB)

# Query the charset of the sample corpus shipped with RcppCWB
cl_charset_name(
  corpus = "REUTERS",
  registry = system.file(package = "RcppCWB", "extdata", "cwb", "registry")
)

RcppCWB documentation built on July 9, 2023, 7:40 p.m.