View source: R/tokens_context.R
tokens_context | R Documentation
This function uses quanteda's kwic() function to find the contexts around user-defined patterns (i.e. target words/phrases) and returns a tokens object with the tokenized contexts and corresponding document variables.
tokens_context(
  x,
  pattern,
  window = 6L,
  valuetype = c("glob", "regex", "fixed"),
  case_insensitive = TRUE,
  hard_cut = FALSE,
  rm_keyword = TRUE,
  verbose = TRUE
)
x
    a (quanteda) tokens object

pattern
    a character vector, list of character vectors, dictionary, or collocations object. See pattern for details.

window
    the number of context words to be displayed around the keyword

valuetype
    the type of pattern matching: "glob" for glob-style wildcard expressions, "regex" for regular expressions, or "fixed" for exact matching

case_insensitive
    logical; if TRUE, ignore case when matching the pattern

hard_cut
    (logical) if TRUE, a context must contain exactly window * 2 tokens to be kept; if FALSE, contexts with fewer tokens (e.g. when the keyword occurs near the beginning or end of a text) are also kept

rm_keyword
    (logical) if FALSE, the keyword matching the pattern is included in the tokenized contexts

verbose
    (logical) if TRUE, report the total number of instances per pattern found
a (quanteda) tokens-class object. Each document in the output tokens object inherits the document variables (docvars) of the document from which it came, along with a column registering the corresponding pattern used. This information can be retrieved using docvars().
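As a sketch of how the inherited metadata can be inspected (this assumes the conText package, which provides tokens_context() and the cr_sample_corpus dataset used in the example below, and assumes the registered pattern column is named "pattern" — check docvars() output for your installed version):

```r
library(quanteda)
library(conText)

# tokenize the sample corpus shipped with conText
toks <- tokens(cr_sample_corpus)

# extract contexts around the target pattern
immig_toks <- tokens_context(x = toks, pattern = "immigr*", window = 6L)

# each context document carries the source document's variables,
# plus a column recording which pattern was matched
head(docvars(immig_toks))
```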
library(quanteda)

# tokenize corpus
toks <- tokens(cr_sample_corpus)

# build a tokenized corpus of contexts surrounding a target term
immig_toks <- tokens_context(x = toks, pattern = "immigr*", window = 6L)