object2id: Match quanteda objects against token types
In quanteda: Quantitative Analysis of Textual Data

object2id

R Documentation

Description

Developer function to match patterns in quanteda objects against token types.

Usage

object2id(
  x,
  types,
  valuetype = c("glob", "fixed", "regex"),
  case_insensitive = TRUE,
  concatenator = "_",
  levels = 1,
  match_pattern = c("any", "single", "multi"),
  keep_nomatch = FALSE
)

object2fixed(
  x,
  types,
  valuetype = c("glob", "fixed", "regex"),
  case_insensitive = TRUE,
  concatenator = "_",
  levels = 1,
  match_pattern = c("any", "single", "multi"),
  keep_nomatch = FALSE
)

Arguments

`x`	a list of character vectors, a dictionary, or a collocations object
`types`	token types against which patterns are matched
`valuetype`	the type of pattern matching: `"glob"` for "glob"-style wildcard expressions; `"regex"` for regular expressions; or `"fixed"` for exact matching. See valuetype for details.
`case_insensitive`	logical; if `TRUE`, ignore case when matching a `pattern` or dictionary values
`concatenator`	the concatenation character that joins multi-word expressions in `types`
`levels`	integers specifying the levels of entries in a hierarchical dictionary that will be applied. The top level is 1, and subsequent levels describe lower nesting levels. Values may be combined, even if these levels are not contiguous, e.g. `levels = c(1, 3)` will collapse the second level into the first, but record the third level (if present) collapsed below the first.
`match_pattern`	selects which patterns are matched: `"single"` for single-word patterns only, `"multi"` for multi-word patterns only, or `"any"` for both single-word and multi-word patterns.
`keep_nomatch`	logical; if `TRUE`, keep patterns that did not match

Value

`object2fixed()` returns a list of character vectors of matched types. `object2id()` returns a list of indices of matched types with attributes. The `"pattern"` attribute records the indices of the matched patterns in `x`; the `"key"` attribute records the keys of the matched patterns when `x` is a dictionary.
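
The difference between the two return values can be seen by running both functions on the same input. The following is a minimal sketch, assuming the quanteda package is installed; the object names `ids` and `fixed` are illustrative only, and the shortened dictionary is made up for this example.

```r
library("quanteda")

types <- c("A", "AA", "B", "BB", "B_B", "C", "C-C")
dict <- dictionary(list(A = c("a", "aa"),
                        B = c("BB", "B B")))

# object2id(): a list of integer indices into `types`
ids <- object2id(dict, types)
print(ids)

# attributes described in the Value section
attr(ids, "pattern")  # indices of the matched patterns in `dict`
attr(ids, "key")      # dictionary keys of the matched patterns

# translate the indices back into the types they point to
lapply(ids, function(i) types[i])

# object2fixed(): a list of character vectors of matched types
fixed <- object2fixed(dict, types)
print(fixed)
```

Indexing `types` with the elements of `ids` is a convenient way to check by eye which types each pattern matched.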

Examples

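library("quanteda")

# token types to match against; "B_B" is a multi-word type joined by the concatenator
types <- c("A", "AA", "B", "BB", "B_B", "C", "C-C")

# dictionary
dict <- dictionary(list(A = c("a", "aa"),
                        B = c("BB", "B B"),
                        C = c("C", "C-C")))
object2fixed(dict, types)
# restrict matching to single-word or to multi-word patterns
object2fixed(dict, types, match_pattern = "single")
object2fixed(dict, types, match_pattern = "multi")

# phrase
pats <- phrase(c("a", "aa", "zz", "bb", "b b"))
object2fixed(pats, types)
# "zz" matches no type; keep_nomatch = TRUE retains it in the result
object2fixed(pats, types, keep_nomatch = TRUE)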
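
The examples above all use the default `valuetype = "glob"` with case-insensitive matching. As a further sketch, assuming the quanteda package is installed, the same patterns can be matched under each `valuetype`; the wildcard patterns and the names `glob`, `regex` and `exact` are made up for illustration.

```r
library("quanteda")

types <- c("A", "AA", "B", "BB", "B_B", "C", "C-C")

# patterns containing characters that are special in glob and regex syntax
pats <- phrase(c("a*", "b?"))

# glob: "*" and "?" act as wildcards
glob <- object2fixed(pats, types, valuetype = "glob")
print(glob)

# regex: the same strings are interpreted as regular expressions
regex <- object2fixed(pats, types, valuetype = "regex")
print(regex)

# fixed: the strings must equal a type exactly, here also respecting case
exact <- object2fixed(pats, types, valuetype = "fixed",
                      case_insensitive = FALSE)
print(exact)
```

Comparing the three printed results shows how much the interpretation of the same pattern string depends on `valuetype` and `case_insensitive`.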

quanteda documentation built on June 8, 2025, 9:41 p.m.