registry_info: Get information from registry file

corpus_data_dir    R Documentation

Get information from registry file

Description

Extract information from the internal C representation of registry data.

Usage

corpus_data_dir(corpus, registry = Sys.getenv("CORPUS_REGISTRY"))

corpus_info_file(corpus, registry = Sys.getenv("CORPUS_REGISTRY"))

corpus_full_name(corpus, registry = Sys.getenv("CORPUS_REGISTRY"))

corpus_p_attributes(corpus, registry = Sys.getenv("CORPUS_REGISTRY"))

corpus_s_attributes(corpus, registry = Sys.getenv("CORPUS_REGISTRY"))

corpus_properties(corpus, registry = Sys.getenv("CORPUS_REGISTRY"))

corpus_property(corpus, registry = Sys.getenv("CORPUS_REGISTRY"), property)

corpus_registry_dir(corpus)

Arguments

corpus

A length-one character vector with the corpus ID.

registry

A length-one character vector with the registry directory.

property

A length-one character vector with the name of a corpus property defined in the registry file.

Details

corpus_data_dir() returns the data directory (an fs_path object) where the binary files of a corpus are kept. This directory is also known as the 'home' directory.

corpus_info_file() returns the path to the info file of a corpus (an fs_path object). If the info file does not exist or the INFO line is missing in the registry file, NA is returned.
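Because the result may be NA, callers usually need a guard before reading the file. The following is a minimal sketch in base R, assuming only the behaviour described above. The helper read_info_file() is hypothetical and not part of RcppCWB, and the inputs are simulated stand-ins for what corpus_info_file() might return.

```r
# Hypothetical helper (not part of RcppCWB): read an info file if one is
# available, otherwise return an empty character vector.
read_info_file <- function(path) {
  path <- as.character(path)  # drop the fs_path class, if present
  if (length(path) != 1L || is.na(path) || !file.exists(path)) {
    return(character(0))
  }
  readLines(path, warn = FALSE)
}

# Simulated inputs standing in for the return value of corpus_info_file().
tmp <- tempfile(fileext = ".md")
writeLines("A toy info file.", tmp)

info_present <- read_info_file(tmp)            # file exists: lines are read
info_missing <- read_info_file(NA_character_)  # NA: empty result, no error
```

Returning an empty vector rather than raising an error is a design choice for this sketch; code that requires an info file may prefer stop().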

corpus_full_name() returns the full name of the corpus as defined in the registry file.

corpus_p_attributes() returns a character vector with the positional attributes of a corpus.

corpus_s_attributes() returns a character vector with the structural attributes of a corpus.

corpus_properties() returns a character vector with the corpus properties defined in the registry file. If the corpus cannot be located, NA is returned.

corpus_property() returns the value of a corpus property defined in the registry file, or NA if the corpus does not exist, is not loaded, or if the requested property is undefined.
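To make the registry fields behind these functions concrete, the sketch below parses a toy registry file with base R. It is an illustration only: the functions documented here query the internal C representation rather than parsing text, the helpers registry_field() and registry_props() are hypothetical, and all paths and values in the toy file are made up.

```r
# Toy registry file content; paths and values are invented for illustration.
registry_lines <- c(
  'NAME "Reuters Sample Corpus"',
  'ID   reuters',
  'HOME /tmp/cwb/indexed_corpora/reuters',
  'INFO /tmp/cwb/indexed_corpora/reuters/info.md',
  '##:: charset = "latin1"',
  '##:: language = "en"',
  'ATTRIBUTE word',
  'STRUCTURE id',
  'STRUCTURE places'
)

# Hypothetical helper: remainder of every line that starts with `key`.
registry_field <- function(lines, key) {
  pattern <- paste0("^", key, "[[:space:]]+")
  sub(pattern, "", grep(pattern, lines, value = TRUE))
}

# Hypothetical helper: parse `##:: key = "value"` lines into a named vector.
registry_props <- function(lines) {
  hits <- grep("^##::", lines, value = TRUE)
  keys <- sub('^##::[[:space:]]*([^[:space:]=]+)[[:space:]]*=.*$', "\\1", hits)
  vals <- sub('^##::[^=]*=[[:space:]]*"?([^"]*)"?[[:space:]]*$', "\\1", hits)
  setNames(vals, keys)
}

home      <- registry_field(registry_lines, "HOME")       # data directory
full_name <- gsub('"', "", registry_field(registry_lines, "NAME"))
p_attrs   <- registry_field(registry_lines, "ATTRIBUTE")  # positional attributes
s_attrs   <- registry_field(registry_lines, "STRUCTURE")  # structural attributes
props     <- registry_props(registry_lines)               # corpus properties
```

In this toy file, names(props) plays the role of the corpus_properties() result and props[["language"]] that of corpus_property() with property = "language".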

corpus_registry_dir() extracts, from the internal C representation of loaded corpora, the registry directory that holds the registry file defining a corpus. The returned character vector may have length > 1 if several corpora with the same ID are defined in registry files in different registry directories. If the corpus is not found, NA is returned.
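Since the registry lookup can yield NA, one directory, or several, downstream code may want to reduce it to a single value. The sketch below shows one way to do so in base R. The helper pick_registry_dir() is hypothetical and not part of RcppCWB, and the inputs are simulated return values rather than actual output.

```r
# Hypothetical helper (not part of RcppCWB): reduce a registry lookup result
# to a single directory, or NA if the corpus was not found.
pick_registry_dir <- function(dirs) {
  dirs <- as.character(dirs)
  dirs <- dirs[!is.na(dirs)]
  if (length(dirs) == 0L) {
    return(NA_character_)
  }
  if (length(dirs) > 1L) {
    warning("corpus defined in ", length(dirs), " registries; using the first")
  }
  dirs[[1L]]
}

# Simulated return values covering the three cases described above.
not_found <- pick_registry_dir(NA)
unique_dir <- pick_registry_dir("/tmp/registry_a")
ambiguous <- suppressWarnings(
  pick_registry_dir(c("/tmp/registry_a", "/tmp/registry_b"))
)
```

Taking the first directory is an arbitrary policy chosen for this sketch; stopping with an error on ambiguity may be safer when the choice of registry matters.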

Examples

corpus_data_dir("REUTERS", registry = get_tmp_registry())
corpus_info_file("REUTERS", registry = get_tmp_registry())
corpus_full_name("REUTERS", registry = get_tmp_registry())
corpus_p_attributes("REUTERS", registry = get_tmp_registry())
corpus_s_attributes("REUTERS", registry = get_tmp_registry())
corpus_properties("REUTERS", registry = get_tmp_registry())
corpus_property(
  "REUTERS",
  registry = get_tmp_registry(),
  property = "language"
)
corpus_registry_dir("REUTERS")
corpus_registry_dir("FOO") # NA returned

RcppCWB documentation built on July 9, 2023, 7:40 p.m.