tokens_context: Get the tokens of contexts surrounding user-defined patterns

View source: R/tokens_context.R

tokens_context  R Documentation

Get the tokens of contexts surrounding user-defined patterns

Description

This function uses quanteda's kwic() function to find the contexts around user-defined patterns (i.e. target words/phrases) and returns a tokens object with the tokenized contexts and their corresponding document variables.

Usage

tokens_context(
  x,
  pattern,
  window = 6L,
  valuetype = c("glob", "regex", "fixed"),
  case_insensitive = TRUE,
  hard_cut = FALSE,
  rm_keyword = TRUE,
  verbose = TRUE
)

Arguments

x

a (quanteda) tokens-class object

pattern

a character vector, list of character vectors, dictionary, or collocations object. See pattern for details.

window

the number of context tokens to be included on either side of the keyword

valuetype

the type of pattern matching: "glob" for "glob"-style wildcard expressions; "regex" for regular expressions; or "fixed" for exact matching. See valuetype for details.

case_insensitive

logical; if TRUE, ignore case when matching a pattern or dictionary values

hard_cut

(logical) if TRUE, a context must contain exactly window x 2 tokens; if FALSE, it may contain window x 2 or fewer (e.g. if a document begins with a target word, its context will have window tokens rather than window x 2)

rm_keyword

(logical) if FALSE, the keyword matching the pattern is included in the tokenized contexts

verbose

(logical) if TRUE, report the total number of instances per pattern found

Value

a (quanteda) tokens-class object. Each document in the output tokens object inherits the document variables (docvars) of the document from which it came, along with a column registering the corresponding pattern used. This information can be retrieved using docvars().

Examples


library(quanteda)

# tokenize corpus
toks <- tokens(cr_sample_corpus)

# build a tokenized corpus of contexts surrounding a target term
immig_toks <- tokens_context(x = toks, pattern = "immigr*", window = 6L)
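# A short follow-up sketch (assuming the example above has run): each
# context inherits the docvars of its source document, plus a "pattern"
# column registering the matched pattern (see Value above).

# inspect the inherited docvars and the added pattern column
head(docvars(immig_toks))

# count the number of contexts extracted
ndoc(immig_toks)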

conText documentation built on Feb. 16, 2023, 7:32 p.m.