hierarchical_coverage_regex: Hierarchical Coverage of Regexes
In trinker/termco: Counts of Terms and Substrings

View source: R/hierarchical_coverage_regex.R

Computes the unique coverage of a text vector by each regex after partitioning out the elements already matched by previous regexes.
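The partitioning idea can be sketched in base R. This is only an illustration of the concept, not the package's implementation: the toy `text` vector, the `regexes` vector, and the use of `grepl()` are all assumptions made for this example.

```r
# Hypothetical toy data; this is not the package's `sam_i_am` data set.
text <- c("I am Sam", "Sam I am", "I do not like them", "Would you like them")
regexes <- c(am = "(?i)\\bam\\b", I = "^I", like = "like")

# Elements not yet claimed by an earlier regex
remaining <- rep(TRUE, length(text))
unique_n <- integer(length(regexes))

for (i in seq_along(regexes)) {
    # Count only matches among elements that no previous regex matched
    hit <- grepl(regexes[[i]], text, perl = TRUE) & remaining
    unique_n[i] <- sum(hit)
    # Partition the newly matched elements out before the next step
    remaining <- remaining & !hit
}

names(unique_n) <- names(regexes)
unique_n
```

Because matched elements are removed at each step, the order of the regexes matters: `"^I"` also matches the first element, but that element was already claimed by the `am` regex, so it does not count toward the unique coverage of `I`.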

hierarchical_coverage_regex(
  text.var,
  term.list,
  ignore.case = TRUE,
  sort = FALSE,
  verbose = TRUE,
  ...
)

`text.var`	A text vector (vector of strings).
`term.list`	A list of named character vectors of regexes to match against `text.var`.
`ignore.case`	logical. Should case be ignored when matching the regexes in `term.list` against `text.var`?
`sort`	logical. If `TRUE` the output is sorted by highest unique gain. If `FALSE` the order of the `term.list` input is retained.
`verbose`	logical. If `TRUE` each iteration of the `for` loop prints `i of n`.
`...`	ignored.

Returns a data.frame with 7 columns:

step: the order in which the regex was searched for
name: the human readable name of the bound regex group
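unique_prop: the proportion of text elements matched by the regex and by no earlier regex
unique_n: the number of text elements matched by the regex and by no earlier regex
cum_prop: the cumulative proportion of text elements covered up to and including this regex
cum_n: the cumulative number of text elements covered up to and including this regex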
regex: the bound (|) regex that corresponds to name
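As a rough sketch of how these columns relate to one another, the following builds a data.frame of the same shape by hand. All inputs (`term_list`, `unique_n`, `n_total`) are made-up values for illustration, and binding each group with `paste(collapse = "|")` is an assumption about what "bound (|)" means here rather than a confirmed detail of the package.

```r
# Hypothetical inputs, for illustration only.
term_list <- list(
    am      = c("(?i)sam", "(?i)\\bam"),
    I       = "^I",
    "won't" = "(?i)(do|will) not"
)
unique_n <- c(4L, 3L, 1L)   # made-up unique match counts per step
n_total  <- 10L             # made-up length of the text vector

# Unique matches are disjoint, so cumulative counts are a running sum
cum_n <- cumsum(unique_n)

out <- data.frame(
    step        = seq_along(term_list),
    name        = names(term_list),
    unique_prop = unique_n / n_total,
    unique_n    = unique_n,
    cum_prop    = cum_n / n_total,
    cum_n       = cum_n,
    # Assumed binding: each group's regexes joined with `|`
    regex       = vapply(term_list, paste, character(1), collapse = "|",
                         USE.NAMES = FALSE),
    stringsAsFactors = FALSE
)
out
```

Under these assumptions the last value of `cum_prop` gives the share of the text vector covered by the whole term list.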

Other hierarchical_coverage functions: hierarchical_coverage_term()

# One name per regex group; names and list elements must align
regs <- setNames(
    list(c('(?i)sam', "(?i)\\bam"), '^I', '(?i)(do|will) not', '(?i)(do|will)'),
    c('am', 'I', "won't", 'do')
)
(out <- hierarchical_coverage_regex(sam_i_am, regs, ignore.case=FALSE))
summary(out)
plot(out)
plot(out, mark.one = TRUE)

# Use unnamed vectors for `term.list` too
hierarchical_coverage_regex(sam_i_am, unlist(regs, use.names = FALSE), ignore.case=FALSE)