registry_file: Parse and create registry files.

registry_file_parse    R Documentation

Parse and create registry files.

Description

A set of functions to parse, create and write registry files.

Usage

registry_file_parse(corpus, registry_dir = Sys.getenv("CORPUS_REGISTRY"))

registry_file_compose(x)

registry_data(
  name,
  id,
  home,
  info = file.path(home, ".info", fsep = "/"),
  properties = c(charset = "utf-8"),
  p_attributes,
  s_attributes = character()
)

registry_file_write(
  data,
  corpus,
  registry_dir = Sys.getenv("CORPUS_REGISTRY"),
  ...
)

Arguments

corpus

A CWB corpus indicated by a length-one character vector.

registry_dir

Directory with registry files.

x

An object of class registry_data.

name

Long descriptive name of corpus (character vector).

id

Short name of corpus (character vector).

home

Path to the data directory of the indexed corpus.

info

A character vector containing the path of the info file.

properties

Named character vector with corpus properties; it should include at least 'charset'.

p_attributes

A character vector with positional attributes to declare.

s_attributes

A character vector with structural attributes to declare.

data

A registry_data object.

...

Further parameters.

Details

registry_file_parse() parses the registry file of a corpus and returns an object of class registry_data.

See the appendix to the 'Corpus Encoding Tutorial' (https://cwb.sourceforge.io/files/CWB_Encoding_Tutorial.pdf), which includes an explanation of the registry file format.
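As a rough orientation, the layout of a minimal registry file can be sketched in base R. The helper `sketch_registry_lines()` below is hypothetical and not part of cwbtools; it only illustrates the usual arrangement of header fields, corpus properties, and attribute declarations. The tutorial remains the authoritative description of the format.

```r
# Hypothetical helper (NOT a cwbtools function): build the lines of a
# minimal registry file from plain values. The argument names mirror
# registry_data() for readability only.
sketch_registry_lines <- function(name, id, home,
                                  info = file.path(home, ".info", fsep = "/"),
                                  properties = c(charset = "utf-8"),
                                  p_attributes,
                                  s_attributes = character()) {
  c(
    sprintf('NAME "%s"', name),   # long descriptive name
    sprintf("ID   %s", id),       # short corpus id
    sprintf("HOME %s", home),     # data directory
    sprintf("INFO %s", info),     # info file
    "",
    # one '##::' line per corpus property
    sprintf('##:: %s = "%s"', names(properties), properties),
    "",
    sprintf("ATTRIBUTE %s", p_attributes),  # positional attributes
    "",
    sprintf("STRUCTURE %s", s_attributes)   # structural attributes
  )
}

# Assumed demo values; the path is a placeholder and need not exist.
lines <- sketch_registry_lines(
  name = "Demo Corpus",
  id = "demo",
  home = "/tmp/demo",
  p_attributes = c("word", "pos"),
  s_attributes = "text"
)
writeLines(lines)
```

The output of registry_file_compose() may differ in comments and spacing; this sketch only shows which kinds of declarations a registry file carries.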

registry_file_compose() turns a registry_data object into a character vector holding the lines of a registry file, which can then be written to disk.

registry_file_write() composes a registry file from data and writes it to disk.
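The three functions can be combined to declare a corpus from scratch. The following is a minimal sketch using only the signatures shown under Usage; the corpus name, id, attribute names, and temporary directories are assumptions made for illustration. The snippet is guarded so that it is skipped where cwbtools is not installed.

```r
if (requireNamespace("cwbtools", quietly = TRUE)) {
  library(cwbtools)

  # Assumed scratch directories; a real corpus would use its actual
  # data directory and the registry given by CORPUS_REGISTRY.
  home_dir <- file.path(tempdir(), "demo_data")
  reg_dir <- file.path(tempdir(), "demo_registry")
  dir.create(home_dir, showWarnings = FALSE)
  dir.create(reg_dir, showWarnings = FALSE)

  # Describe the corpus (all values are illustrative).
  regdata <- registry_data(
    name = "Demo Corpus",
    id = "demo",
    home = home_dir,
    properties = c(charset = "utf-8"),
    p_attributes = c("word", "pos"),
    s_attributes = "text"
  )

  # Inspect the registry file as a character vector ...
  regfile <- registry_file_compose(x = regdata)
  writeLines(regfile)

  # ... or compose and write it in one step.
  registry_file_write(data = regdata, corpus = "demo", registry_dir = reg_dir)
  list.files(reg_dir)
}
```

Writing to a temporary registry directory first makes it possible to check the result before touching the registry that CWB tools actually use.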

Examples

regdata <- registry_file_parse(
  corpus = "REUTERS",
  registry_dir = system.file(package = "RcppCWB", "extdata", "cwb", "registry")
  )
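The parsed object can be inspected and turned back into registry file lines. This is a short sketch that repeats the call above so that it runs on its own; it assumes that both cwbtools and RcppCWB (which ships the REUTERS sample corpus) are installed and is skipped otherwise.

```r
if (requireNamespace("cwbtools", quietly = TRUE) &&
    requireNamespace("RcppCWB", quietly = TRUE)) {
  library(cwbtools)

  regdata <- registry_file_parse(
    corpus = "REUTERS",
    registry_dir = system.file(package = "RcppCWB", "extdata", "cwb", "registry")
  )

  # Look at what was parsed without assuming particular field names.
  class(regdata)
  str(regdata)

  # Re-compose the registry file and print it.
  regfile <- registry_file_compose(x = regdata)
  writeLines(regfile)
}
```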

cwbtools documentation built on May 15, 2022, 1:06 a.m.