registry_get_name | R Documentation |
Functions to extract information from a registry file describing a corpus. Several of these operations could also be accomplished with the 'cwb-regedit' tool; the functions defined here ensure that manipulating the registry is possible without a full installation of the CWB.
registry_get_name(corpus, registry = Sys.getenv("CORPUS_REGISTRY"))
registry_get_id(corpus, registry = Sys.getenv("CORPUS_REGISTRY"))
registry_get_home(corpus, registry = Sys.getenv("CORPUS_REGISTRY"))
registry_get_info(corpus, registry = Sys.getenv("CORPUS_REGISTRY"))
registry_get_encoding(corpus, registry = Sys.getenv("CORPUS_REGISTRY"))
registry_get_p_attributes(corpus, registry = Sys.getenv("CORPUS_REGISTRY"))
registry_get_s_attributes(corpus, registry = Sys.getenv("CORPUS_REGISTRY"))
registry_get_properties(corpus, registry = Sys.getenv("CORPUS_REGISTRY"))
corpus |
name of the CWB corpus |
registry |
directory of the registry (defaults to the CORPUS_REGISTRY environment variable) |
An appendix to the 'Corpus Encoding Tutorial' (https://cwb.sourceforge.io/files/CWB_Encoding_Tutorial.pdf) includes an explanation of the registry file format.
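To make the file format more concrete, the following is a minimal sketch in base R of how fields such as NAME, HOME, ATTRIBUTE and STRUCTURE could be read from a registry file. The registry content, the temporary directory and the helper `sketch_get_field()` are hypothetical and serve illustration only; this is not the implementation used by the functions documented here.

```r
# Hypothetical registry content for illustration; not a real corpus.
registry_lines <- c(
  "NAME \"Reuters Sample Corpus\"",
  "ID   reuters",
  "HOME /path/to/data",
  "INFO /path/to/data/.info",
  "##:: charset  = \"latin1\"",
  "##:: language = \"en\"",
  "ATTRIBUTE word",
  "ATTRIBUTE pos",
  "STRUCTURE places"
)

# Registry files are named after the lowercase corpus ID.
registry_dir <- tempfile("registry")
dir.create(registry_dir)
writeLines(registry_lines, file.path(registry_dir, "reuters"))

# Hypothetical helper: return the values of all lines that start with `key`.
sketch_get_field <- function(corpus, key, registry) {
  lines <- readLines(file.path(registry, tolower(corpus)))
  prefix <- paste0("^", key, "\\s+")
  hits <- grep(prefix, lines, value = TRUE)
  values <- trimws(sub(prefix, "", hits))
  # Strip surrounding double quotes, as used for the NAME field.
  gsub("^\"|\"$", "", values)
}

corpus_name <- sketch_get_field("REUTERS", "NAME", registry_dir)
corpus_home <- sketch_get_field("REUTERS", "HOME", registry_dir)
p_attrs <- sketch_get_field("REUTERS", "ATTRIBUTE", registry_dir)
s_attrs <- sketch_get_field("REUTERS", "STRUCTURE", registry_dir)
```

With a real registry directory, the corresponding calls would be registry_get_name(), registry_get_home(), registry_get_p_attributes() and registry_get_s_attributes(), each taking the corpus name and the registry directory.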
registry_get_encoding will parse the registry file for a corpus and return the encoding that is defined (corpus property "charset"). If parsing the registry does not yield a result (corpus property "charset" not defined), the CWB standard encoding ("latin1") is assigned to prevent errors. Note that RcppCWB::cl_charset_name is equivalent but faster, as it uses the internal C representation of a corpus rather than parsing the registry file.
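The fallback behaviour described above can be sketched in base R as follows. The helper `sketch_get_encoding()`, the two temporary registry files and their content are assumptions made for illustration; the package's actual parsing logic may differ.

```r
# Hypothetical helper: read the "charset" corpus property from a registry
# file and fall back to "latin1" if the property is not defined.
sketch_get_encoding <- function(registry_file) {
  lines <- readLines(registry_file)
  # Corpus properties are written as: ##:: charset = "utf8"
  pattern <- "^##::\\s*charset\\s*=\\s*\"([^\"]+)\".*$"
  hits <- grep(pattern, lines, value = TRUE)
  if (length(hits) == 0L) {
    return("latin1")  # CWB standard encoding, assigned to prevent errors
  }
  sub(pattern, "\\1", hits[1])
}

# Two hypothetical registry files: one with and one without a charset.
with_charset <- tempfile()
writeLines(c("ID demo", "##:: charset = \"utf8\""), with_charset)

without_charset <- tempfile()
writeLines(c("ID demo", "ATTRIBUTE word"), without_charset)

enc_defined <- sketch_get_encoding(with_charset)
enc_fallback <- sketch_get_encoding(without_charset)
```

Here `enc_defined` holds the declared charset, while `enc_fallback` holds the default because the second file declares no charset property.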