parsing_sources: Parsing sources

Description

These functions parse one (parse_source) or more (parse_sources) sources, along with the identifiers, sections, and codes they contain.

Usage

parse_source(text, file, codeRegexes = c(code =
  "\\[\\[([a-zA-Z0-9._>-]+)\\]\\]"), idRegexes = c(caseId =
  "\\[\\[cid=([a-zA-Z0-9._-]+)\\]\\]", stanzaId =
  "\\[\\[sid=([a-zA-Z0-9._-]+)\\]\\]"),
  sectionRegexes = c(paragraphs = "---paragraph-break---", secondary =
  "---<[a-zA-Z0-9]?>---"), autoGenerateIds = c("stanzaId"),
  persistentIds = c("caseId"), inductiveCodingHierarchyMarker = ">",
  metadataContainers = c("metadata"), codesContainers = c("codes",
  "dct"), delimiterRegEx = "^---$", ignoreRegex = "^#",
  ignoreOddDelimiters = FALSE, encoding = "UTF-8", silent = FALSE)

## S3 method for class 'rockParsedSource'
print(x, prefix = "### ", ...)

parse_sources(path, extension = "rock|dct", regex, recursive = TRUE,
  codeRegexes = c(code = "\\[\\[([a-zA-Z0-9._>-]+)\\]\\]"),
  idRegexes = c(caseId = "\\[\\[cid=([a-zA-Z0-9._-]+)\\]\\]",
  stanzaId = "\\[\\[sid=([a-zA-Z0-9._-]+)\\]\\]"),
  autoGenerateIds = c("stanzaId"), sectionRegexes = c(paragraphs =
  "---paragraph-break---", secondary = "---<[a-zA-Z0-9]?>---"),
  inductiveCodingHierarchyMarker = ">", delimiterRegEx = "^---$",
  metadataContainers = c("metadata"), codesContainers = c("codes",
  "dct"), ignoreRegex = "^#", ignoreOddDelimiters = FALSE,
  encoding = "UTF-8", silent = TRUE)

## S3 method for class 'rockParsedSources'
print(x, prefix = "### ", ...)

## S3 method for class 'rockParsedSources'
plot(x, ...)

Arguments

text, file

The source to parse, given either as text or as a file. If the argument is named file, it must point to an existing file, which is read with base::readLines() using the encoding given in encoding; if the file does not exist, an error is produced (useful when calling from other functions). If the argument is named text, it is first checked whether it is the path to an existing file, and if so, that file is read. Otherwise, text should be a character vector in which every element is one line of the original source (as returned by base::readLines()). As an exception, a character vector with a single element that contains at least one newline character (\n) is split at the newline characters using base::strsplit(). In short, the first argument can be either a character vector or the path to a file; to be certain that an error is thrown when a file does not exist, name the argument file.
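The handling of a single string with embedded newlines can be illustrated with base R alone. This is a minimal sketch of the documented behavior, not the package's actual code; the helper name as_source_lines is hypothetical.

```r
# Hypothetical helper: normalise `text` input to one element per line.
# This only mirrors the behavior described above; it is not rock's code.
as_source_lines <- function(text) {
  if (length(text) == 1 && grepl("\n", text, fixed = TRUE)) {
    # A single string containing newlines is split at those newlines
    strsplit(text, "\n", fixed = TRUE)[[1]]
  } else {
    # Anything else is assumed to already hold one line per element
    text
  }
}

lines <- as_source_lines(
  "First utterance\nSecond utterance [[code1]]\nThird utterance"
)
length(lines)
# 3
```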

codeRegexes, idRegexes, sectionRegexes

These are named character vectors with one or more regular expressions. For codeRegexes, these specify how to extract the codes (that were used to code the sources). For idRegexes, these specify how to extract the different types of identifiers. For sectionRegexes, these specify how to extract the different types of sections. The codeRegexes and idRegexes must each contain one capturing group to capture the codes and identifiers, respectively.
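To see what the default regular expressions capture, they can be applied to a sample line using base R. The sketch below is only an illustration under that assumption; the helper extract_captures and the sample line are hypothetical, and the package may extract matches differently.

```r
# Default patterns copied from the Usage section
codeRegex   <- "\\[\\[([a-zA-Z0-9._>-]+)\\]\\]"
caseIdRegex <- "\\[\\[cid=([a-zA-Z0-9._-]+)\\]\\]"

# Hypothetical sample line containing one case identifier and two codes
line <- "I really liked it [[cid=participant1]] [[enjoyment]] [[mood>positive]]"

# Hypothetical helper: return the content of the single capturing group
# for every match of `pattern` in `x`
extract_captures <- function(x, pattern) {
  matches <- regmatches(x, gregexpr(pattern, x))[[1]]
  sub(pattern, "\\1", matches)
}

codes   <- extract_captures(line, codeRegex)
caseIds <- extract_captures(line, caseIdRegex)

codes
# "enjoyment" "mood>positive"
caseIds
# "participant1"
```

The code pattern does not pick up the identifier because its character class does not include the equals sign.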

autoGenerateIds

The names of the idRegexes that, if missing, should receive autogenerated identifiers (which consist of 'autogenerated_' followed by an incrementing number).

persistentIds

The names of the idRegexes for the identifiers which, once attached to an utterance, should be attached to all following utterances as well (until a new identifier with the same name is encountered, after which that identifier will be attached to all following utterances, etc).
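The carry-forward behavior of persistent identifiers can be pictured as filling gaps with the most recent value. The following is a minimal sketch, assuming one identifier slot per utterance with NA where none was specified; the function carry_forward is hypothetical and not part of the package.

```r
# One entry per utterance; NA means no caseId was given on that utterance
caseIds <- c("alice", NA, NA, "bob", NA)

# Hypothetical helper: propagate the last seen identifier forward
carry_forward <- function(ids) {
  for (i in seq_along(ids)) {
    if (is.na(ids[i]) && i > 1) {
      ids[i] <- ids[i - 1]
    }
  }
  ids
}

persisted <- carry_forward(caseIds)
persisted
# "alice" "alice" "alice" "bob" "bob"
```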

inductiveCodingHierarchyMarker

For inductive coding, this marker is used to indicate hierarchical relationships between codes. The code on the left-hand side of this marker is considered the parent of the code on the right-hand side. More than two levels can be specified in one code (for example, if the inductiveCodingHierarchyMarker is '>', the code grandparent>child>grandchild would indicate codes at three levels).
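As an illustration of how such a code decomposes, it can be split at the marker and turned into parent-child pairs with base R. This is a sketch of the idea only; the variable names are hypothetical and the package may represent the hierarchy differently.

```r
marker <- ">"
code   <- "grandparent>child>grandchild"

# Split the code at the marker (fixed = TRUE treats the marker literally)
codeLevels <- strsplit(code, marker, fixed = TRUE)[[1]]

# Pair every level with the level directly below it
edges <- data.frame(
  parent = head(codeLevels, -1),
  child  = tail(codeLevels, -1),
  stringsAsFactors = FALSE
)

edges
#        parent      child
# 1 grandparent      child
# 2       child grandchild
```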

metadataContainers

The names of the YAML fragments containing metadata (i.e. attributes of cases).

codesContainers

The names of the YAML fragments containing (parts of) deductive coding trees.

delimiterRegEx

The regular expression that is used to extract the YAML fragments.
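One way to picture the extraction is to locate the lines matching the delimiter and take whatever lies between each pair. The sketch below uses base R only and assumes delimiters come in opening and closing pairs; the sample source and variable names are hypothetical, and this is not the package's implementation.

```r
# Hypothetical source with one YAML fragment between two delimiter lines
lines <- c(
  "Some utterance",
  "---",
  "metadata:",
  "  caseId: participant1",
  "---",
  "Another utterance"
)

delimiterRegEx <- "^---$"
delimiterIdx   <- grep(delimiterRegEx, lines)

# An odd number of delimiters cannot be paired up
if (length(delimiterIdx) %% 2 != 0) {
  stop("Odd number of YAML delimiters encountered.")
}

# Odd positions open a fragment, even positions close it
starts <- delimiterIdx[c(TRUE, FALSE)]
ends   <- delimiterIdx[c(FALSE, TRUE)]

# Collect the lines strictly between each pair of delimiters
fragments <- Map(
  function(s, e) lines[seq(s + 1, length.out = e - s - 1)],
  starts, ends
)

fragments[[1]]
# "metadata:" "  caseId: participant1"
```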

ignoreRegex

The regular expression that is used to delete lines before any other processing. This can be used to enable adding comments to sources, which are then ignored during analysis.
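With the default value "^#", every line starting with a hash sign is dropped. The effect can be reproduced with base R as a minimal sketch; the sample lines are hypothetical.

```r
lines <- c(
  "# A comment for the coder",
  "First utterance [[code1]]",
  "#another comment",
  "Second utterance"
)

ignoreRegex <- "^#"

# Keep only the lines that do not match the ignore pattern
kept <- lines[!grepl(ignoreRegex, lines)]

kept
# "First utterance [[code1]]" "Second utterance"
```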

ignoreOddDelimiters

If an odd number of YAML delimiters is encountered, whether this should result in an error (FALSE) or just be silently ignored (TRUE).

encoding

The encoding of the file to read (in file).

silent

Whether to provide (FALSE) or suppress (TRUE) more detailed progress updates.

x

The object to print.

prefix

The prefix to use before the 'headings' of the printed result.

...

Any additional arguments are passed on to the default print method.

path

The path containing the files to read.

extension

The extension of the files to read; files with other extensions will be ignored. Multiple extensions can be separated by a pipe (|).

regex

Instead of specifying an extension, it's also possible to specify a regular expression; only files matching this regular expression are read. If specified, regex takes precedence over extension.

recursive

Whether to also process subdirectories (TRUE) or not (FALSE).
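The interplay of path, extension, and recursive resembles a call to base::list.files() with a pattern built from the extensions. The sketch below works under that assumption and is not taken from the package source; the directory, file names, and the way the pattern is constructed are all hypothetical.

```r
# Create a throwaway directory with a few empty example files
sourceDir <- file.path(tempdir(), "rock-example-sources")
dir.create(file.path(sourceDir, "sub"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(sourceDir, c("a.rock", "b.dct", "notes.txt")))
file.create(file.path(sourceDir, "sub", "c.rock"))

# Assumed translation of the pipe-separated extensions into a file name pattern
extension <- "rock|dct"
pattern   <- paste0("\\.(", extension, ")$")

# recursive = TRUE also descends into subdirectories
found <- list.files(sourceDir, pattern = pattern, recursive = TRUE)

sort(basename(found))
# "a.rock" "b.dct" "c.rock"
```

The file notes.txt is skipped because its extension matches neither alternative.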


Matherion/rock documentation built on May 19, 2019, 6:20 p.m.