hierarchical_coverage_regex: Hierarchical Coverage of Regexes


View source: R/hierarchical_coverage_regex.R

Description

Computes the unique coverage of a text vector by each regex after partitioning out the elements matched by previous regexes.
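The partitioning idea can be illustrated with a minimal base R sketch. This is not the package's implementation; the toy text vector, the regex names, and the loop below are assumptions made up for illustration.

```r
# Toy text vector and ordered regexes (both invented for this sketch)
txt <- c("I am Sam", "Sam I am", "I do not like them",
         "I will not eat them", "Would you?")
regexes <- c(sam = "sam", negation = "(do|will) not", i = "^I")

remaining <- rep(TRUE, length(txt))    # elements not yet claimed by an earlier regex
unique_n <- integer(length(regexes))   # unique hits per regex, in search order

for (i in seq_along(regexes)) {
    hit <- grepl(regexes[[i]], txt, ignore.case = TRUE, perl = TRUE)
    unique_n[i] <- sum(hit & remaining)  # count only elements still unclaimed
    remaining <- remaining & !hit        # partition out everything this regex matched
}

names(unique_n) <- names(regexes)
unique_n
```

In this toy data `^I` matches three elements on its own, but all of them were already claimed by the earlier regexes, so its unique coverage is zero. The order of the regexes therefore changes the result.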

Usage

hierarchical_coverage_regex(
  text.var,
  term.list,
  ignore.case = TRUE,
  sort = FALSE,
  verbose = TRUE,
  ...
)

Arguments

text.var

A text vector (vector of strings).

term.list

A list of named character vectors to match against text.var.

ignore.case

logical. Should case be ignored in matching the terms against text.var?

sort

logical. If TRUE, the output is sorted by highest unique gain. If FALSE, the order of the term input is retained.

verbose

logical. If TRUE, each iteration of the for loop prints i of n.

...

ignored.
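As a rough illustration of how a term.list element with several patterns can act as one regex, and of what ignore.case changes, here is a base R sketch. The term list and text are hypothetical, and joining with `paste()` is an assumption based on the "bound (|) regex" wording in the Value section, not a confirmed detail of the package internals.

```r
# Hypothetical term list: each element is a group of alternative patterns
term_list <- list(am = c("sam", "\\bam"), I = "^I")

# Join each group's patterns with "|" so a single regex represents the group
bound <- vapply(term_list, paste, character(1), collapse = "|")
bound

txt <- c("Sam I am", "i do not", "That Sam")

# Case-insensitive matching lets "^I" match the lowercase "i"
insensitive <- grepl(bound[["I"]], txt, ignore.case = TRUE, perl = TRUE)

# Case-sensitive matching does not
sensitive <- grepl(bound[["I"]], txt, ignore.case = FALSE, perl = TRUE)
```

Inline flags such as `(?i)` inside a pattern still turn on case-insensitive matching for that pattern even when `ignore.case = FALSE`, which is how the regexes in the Examples section are written.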

Value

Returns a data.frame with 7 columns:

step

the order in which the regex was searched for

name

the human readable name of the bound regex group

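unique_prop

the proportion of text elements uniquely covered by the regex

unique_n

the number of text elements uniquely covered by the regex

cum_prop

the cumulative proportion of text elements covered up to and including the regex

cum_n

the cumulative number of text elements covered up to and including the regex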

regex

the bound (|) regex that corresponds to name
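The relationship between the unique and cumulative columns can be sketched in base R. The counts below are invented, and treating the cumulative columns as running sums of the unique ones is an assumption drawn from the column descriptions, not from the package source.

```r
# Invented numbers: 10 text elements, three regexes searched in order
n_total  <- 10L
unique_n <- c(4L, 3L, 1L)          # elements newly covered at each step

unique_prop <- unique_n / n_total  # share of all elements newly covered
cum_n       <- cumsum(unique_n)    # running total of covered elements
cum_prop    <- cum_n / n_total     # running share of covered elements

sketch <- data.frame(
    step = seq_along(unique_n),
    unique_prop, unique_n, cum_prop, cum_n
)
sketch
```

Under this reading, the final cum_prop is the share of the text vector matched by at least one regex, and one minus that value is the share left uncovered.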

See Also

Other hierarchical_coverage functions: hierarchical_coverage_term()

Examples

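library(termco)

# One name per regex group; 'do/will' is an illustrative label for the last group
regs <- setNames(
    list(c('(?i)sam', "(?i)\\bam"), '^I', '(?i)(do|will) not', '(?i)(do|will)'),
    c('am', 'I', "won't", 'do/will')
)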
(out <- hierarchical_coverage_regex(sam_i_am, regs, ignore.case=FALSE))
summary(out)
plot(out)
plot(out, mark.one = TRUE)

# Use unnamed vectors for `term.list` too
hierarchical_coverage_regex(sam_i_am, unlist(regs, use.names = FALSE), ignore.case=FALSE)

trinker/termco documentation built on Jan. 7, 2022, 3:32 a.m.