parsing_sources: Parsing sources
In Matherion/rock: Reproducible Open Coding Kit

These functions parse one (`parse_source`) or more (`parse_sources`) sources and the identifiers, sections, and codes they contain.

parse_source(text, file, codeRegexes = c(code =
  "\\[\\[([a-zA-Z0-9._>-]+)\\]\\]"), idRegexes = c(caseId =
  "\\[\\[cid=([a-zA-Z0-9._-]+)\\]\\]", stanzaId =
  "\\[\\[sid=([a-zA-Z0-9._-]+)\\]\\]"),
  sectionRegexes = c(paragraphs = "---paragraph-break---", secondary =
  "---<[a-zA-Z0-9]?>---"), autoGenerateIds = c("stanzaId"),
  persistentIds = c("caseId"), inductiveCodingHierarchyMarker = ">",
  metadataContainers = c("metadata"), codesContainers = c("codes",
  "dct"), delimiterRegEx = "^---$", ignoreRegex = "^#",
  ignoreOddDelimiters = FALSE, encoding = "UTF-8", silent = FALSE)

## S3 method for class 'rockParsedSource'
print(x, prefix = "### ", ...)

parse_sources(path, extension = "rock|dct", regex, recursive = TRUE,
  codeRegexes = c(code = "\\[\\[([a-zA-Z0-9._>-]+)\\]\\]"),
  idRegexes = c(caseId = "\\[\\[cid=([a-zA-Z0-9._-]+)\\]\\]",
  stanzaId = "\\[\\[sid=([a-zA-Z0-9._-]+)\\]\\]"),
  autoGenerateIds = c("stanzaId"), sectionRegexes = c(paragraphs =
  "---paragraph-break---", secondary = "---<[a-zA-Z0-9]?>---"),
  inductiveCodingHierarchyMarker = ">", delimiterRegEx = "^---$",
  metadataContainers = c("metadata"), codesContainers = c("codes",
  "dct"), ignoreRegex = "^#", ignoreOddDelimiters = FALSE,
  encoding = "UTF-8", silent = TRUE)

## S3 method for class 'rockParsedSources'
print(x, prefix = "### ", ...)

## S3 method for class 'rockParsedSources'
plot(x, ...)

`text, file`	Either `text` or `file` can be used to specify a file, which is then read with `base::readLines()` using the encoding specified in `encoding`. If the argument is named `text`, it is first checked whether it is the path to an existing file; if it is, that file is read. If the argument is named `file` and it does not point to an existing file, an error is produced (useful when calling from other functions). A `text` should be a character vector where every element is a line of the original source (as returned by `base::readLines()`); however, if a character vector of one element that contains at least one newline character (`\n`) is provided as `text`, it is split at the newline characters using `base::strsplit()`. In short, the first argument can be either a character vector or the path to a file; if you are specifying a file and want to be certain that an error is thrown when it does not exist, name the argument `file`.
`codeRegexes, idRegexes, sectionRegexes`	These are named character vectors with one or more regular expressions. For `codeRegexes`, these specify how to extract the codes (that were used to code the sources). For `idRegexes`, these specify how to extract the different types of identifiers. For `sectionRegexes`, these specify how to extract the different types of sections. The `codeRegexes` and `idRegexes` must each contain one capturing group to capture the codes and identifiers, respectively.
`autoGenerateIds`	The names of the `idRegexes` that, if missing, should receive autogenerated identifiers (which consist of 'autogenerated_' followed by an incrementing number).
`persistentIds`	The names of the `idRegexes` for the identifiers which, once attached to an utterance, should be attached to all following utterances as well (until a new identifier with the same name is encountered, after which that new identifier is attached to all following utterances, and so on).
`inductiveCodingHierarchyMarker`	For inductive coding, this marker is used to indicate hierarchical relationships between codes. The code on the left-hand side of this marker is considered the parent code of the code on the right-hand side. More than two levels can be specified in one code (for example, if the `inductiveCodingHierarchyMarker` is `>`, the code `grandparent>child>grandchild` indicates codes at three levels).
`metadataContainers`	The names of the YAML fragments containing metadata (i.e., attributes of cases).
`codesContainers`	The names of the YAML fragments containing (parts of) deductive coding trees.
`delimiterRegEx`	The regular expression matching the lines that delimit the YAML fragments (by default, lines consisting solely of `---`).
`ignoreRegex`	The regular expression that is used to delete lines before any other processing. This can be used to enable adding comments to sources, which are then ignored during analysis.
`ignoreOddDelimiters`	Whether an odd number of YAML delimiters should be silently ignored (`TRUE`) or result in an error (`FALSE`).
`encoding`	The encoding of the file to read (in `file`).
`silent`	Whether to provide (`FALSE`) or suppress (`TRUE`) more detailed progress updates.
`x`	The object to print or plot.
`prefix`	The prefix to use before the 'headings' of the printed result.
`...`	Any additional arguments are passed on to the default print method.
`path`	The path containing the files to read.
`extension`	The extension of the files to read; files with other extensions will be ignored. Multiple extensions can be separated by a pipe (`|`).
`regex`	Instead of specifying an extension, it is also possible to specify a regular expression; only files matching this regular expression are read. If specified, `regex` takes precedence over `extension`.
`recursive`	Whether to also process subdirectories (`TRUE`) or not (`FALSE`).
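The newline-splitting rule described for `text` can be sketched in base R. `split_into_lines()` is a hypothetical helper written only for illustration (it is not exported by rock); it mirrors the documented behaviour that a single string containing `\n` is split with `base::strsplit()`, while a vector that already holds one line per element is left alone.

```r
# Hypothetical helper (not part of rock): mirrors the documented rule that a
# one-element character vector containing "\n" is split into lines.
split_into_lines <- function(text) {
  if (length(text) == 1 && grepl("\n", text, fixed = TRUE)) {
    # strsplit() returns a list with one element per input string
    text <- strsplit(text, "\n", fixed = TRUE)[[1]]
  }
  text
}

oneString <- "[[cid=alice]]\nI like apples [[fruit]]\nAnd pears"
lines <- split_into_lines(oneString)
print(lines)

# A vector that already has one element per line is returned unchanged
alreadySplit <- split_into_lines(c("first line", "second line"))
```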
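To see what the default `codeRegexes` and `idRegexes` capture, the patterns from the usage section can be applied to a single line with base R's `gregexpr()` and `regmatches()`. This is a minimal sketch that relies on each pattern having exactly one capturing group, as required above; `extract_captures()` is a hypothetical helper, and rock's own extraction code may work differently.

```r
# Default patterns copied from the usage section above
codeRegex   <- "\\[\\[([a-zA-Z0-9._>-]+)\\]\\]"
caseIdRegex <- "\\[\\[cid=([a-zA-Z0-9._-]+)\\]\\]"

# Hypothetical helper: return the first capturing group of every match
extract_captures <- function(line, regex) {
  fullMatches <- regmatches(line, gregexpr(regex, line))[[1]]
  sub(regex, "\\1", fullMatches)
}

line <- "[[cid=alice]] I like apples [[fruit>apple]] and snacks [[snack]]"
codes   <- extract_captures(line, codeRegex)
caseIds <- extract_captures(line, caseIdRegex)
print(codes)
print(caseIds)
```

Because `=` is not in the default code pattern's character class, `[[cid=alice]]` is picked up only as a case identifier, not as a code.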
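The `autoGenerateIds` rule (missing identifiers receive `autogenerated_` followed by an incrementing number) can be illustrated with a hypothetical helper. The exact numbering scheme used here, starting at 1 and counting only the utterances that lack an identifier, is an assumption of this sketch rather than documented rock behaviour.

```r
# Hypothetical helper: fill NA identifiers with "autogenerated_" followed by
# an incrementing number (the numbering scheme is an assumption).
fill_autogenerated <- function(ids, prefix = "autogenerated_") {
  missingIds <- is.na(ids)
  if (any(missingIds)) {
    ids[missingIds] <- paste0(prefix, seq_len(sum(missingIds)))
  }
  ids
}

stanzaIds <- c(NA, "intro", NA, NA)
filled <- fill_autogenerated(stanzaIds)
print(filled)
```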
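The behaviour of `persistentIds` amounts to a forward fill: once an identifier such as a case identifier appears, it is attached to every following utterance until a new identifier of the same type replaces it. The hypothetical `carry_forward()` below sketches this on a vector that uses `NA` for utterances without an identifier of their own; leaving utterances before the first identifier as `NA` is an assumption of this sketch.

```r
# Hypothetical helper: carry the most recent identifier forward
carry_forward <- function(ids) {
  current <- NA_character_
  for (i in seq_along(ids)) {
    if (!is.na(ids[i])) {
      current <- ids[i]  # a new identifier replaces the previous one
    } else {
      ids[i] <- current  # otherwise attach the last identifier seen
    }
  }
  ids
}

caseIds <- c(NA, "alice", NA, NA, "bob", NA)
persistent <- carry_forward(caseIds)
print(persistent)
```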
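The parent-child reading of `inductiveCodingHierarchyMarker` can be sketched by splitting a code on the marker and pairing adjacent levels. The edge-list `data.frame` is only an illustrative representation chosen for this sketch; it is not necessarily how rock stores inductive code trees.

```r
marker <- ">"
code <- "grandparent>child>grandchild"

# fixed = TRUE treats the marker literally rather than as a regex
codeLevels <- strsplit(code, marker, fixed = TRUE)[[1]]

# Each adjacent pair of levels is a parent-child relationship
edges <- data.frame(parent = head(codeLevels, -1),
                    child  = tail(codeLevels, -1),
                    stringsAsFactors = FALSE)
print(edges)
```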
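With the default `ignoreRegex` of `^#`, any line starting with `#` is treated as a comment and removed before other processing. A minimal base R sketch of that filtering step, using made-up source lines:

```r
ignoreRegex <- "^#"
src <- c("# Interviewer note: audio unclear in this part",
         "I like apples [[fruit]]",
         "#todo: check this code",
         "And pears")

# Drop every line matching the pattern before any further processing
kept <- src[!grepl(ignoreRegex, src)]
print(kept)
```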
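The roles of `delimiterRegEx` and `ignoreOddDelimiters` can be sketched by pairing delimiter lines and returning the lines between each pair. `extract_yaml_fragments()` is hypothetical, and dropping the unmatched final delimiter when `ignoreOddDelimiters = TRUE` is an assumption about what "silently ignored" means. Note that the default section break `---paragraph-break---` does not match `^---$`, so it is not mistaken for a delimiter.

```r
# Hypothetical helper: return the lines between consecutive pairs of
# delimiter lines, optionally tolerating an unmatched final delimiter.
extract_yaml_fragments <- function(lines, delimiterRegEx = "^---$",
                                   ignoreOddDelimiters = FALSE) {
  delimiters <- grep(delimiterRegEx, lines)
  if (length(delimiters) %% 2 != 0) {
    if (!ignoreOddDelimiters) {
      stop("Odd number of YAML delimiters encountered.")
    }
    # Assumption: the unmatched last delimiter is simply dropped
    delimiters <- head(delimiters, -1)
  }
  if (length(delimiters) == 0) return(list())
  starts <- delimiters[c(TRUE, FALSE)]  # odd positions open a fragment
  ends   <- delimiters[c(FALSE, TRUE)]  # even positions close it
  Map(function(s, e) lines[s + seq_len(e - s - 1)], starts, ends)
}

src <- c("---", "codes:", "  - id: fruit", "---",
         "I like apples [[fruit]]")
fragments <- extract_yaml_fragments(src)
print(fragments)

odd <- c(src, "---")
oddResult <- tryCatch(extract_yaml_fragments(odd),
                      error = function(e) conditionMessage(e))
tolerated <- extract_yaml_fragments(odd, ignoreOddDelimiters = TRUE)
```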
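One plausible way the `extension`, `regex`, and `recursive` arguments of `parse_sources` could combine when selecting files is sketched below with `base::list.files()`, whose `pattern` is matched against file names. `build_file_regex()` and the exact pattern it builds from `extension` are assumptions made for illustration, not rock's documented internals.

```r
# Hypothetical helper: turn an extension such as "rock|dct" into a
# file-name pattern; an explicit `regex` takes precedence.
build_file_regex <- function(extension = "rock|dct", regex) {
  if (!missing(regex)) return(regex)
  paste0("^(.*)\\.(", extension, ")$")
}

# Build a small throwaway directory tree to search
root <- file.path(tempdir(), "rock_demo")
dir.create(file.path(root, "sub"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(root, c("a.rock", "notes.txt")),
                      file.path(root, "sub", "b.dct")))

pattern <- build_file_regex()
found   <- list.files(root, pattern = pattern, recursive = TRUE)
topOnly <- list.files(root, pattern = pattern, recursive = FALSE)
txtOnly <- list.files(root, pattern = build_file_regex(regex = "\\.txt$"))
print(found)
print(topOnly)
```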