get_context | R Documentation
A wrapper function for quanteda's kwic() function that subsets documents to those in which the target is present before tokenizing (to speed up processing) and concatenates kwic()'s pre and post variables into a single context column.
get_context(
  x,
  target,
  window = 6L,
  valuetype = "fixed",
  case_insensitive = TRUE,
  hard_cut = FALSE,
  what = "word",
  verbose = TRUE
)
x: (character) vector - the set of documents (corpus) of interest.

target: (character) vector - the target words whose contexts we want to evaluate. This vector may include a single token, a phrase, or multiple tokens and/or phrases (see the sketch after this argument list).

window: (numeric) - defines the size of a context (number of words around the target).

valuetype: the type of pattern matching: "glob" for "glob"-style wildcard expressions, "regex" for regular expressions, or "fixed" for exact matching.

case_insensitive: logical; if TRUE, ignore case when matching the target.

hard_cut: (logical) - if TRUE, a context must have exactly window * 2 tokens; if FALSE, it may have window * 2 or fewer (e.g. if a document begins with a target word, its context will have only window tokens).

what: (character) defines which quanteda tokenizer to use. You will rarely want to change this. For Chinese text you may want to set what = "fastestword".

verbose: (logical) - if TRUE, report the total number of target instances found.
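A minimal sketch of how these arguments combine, reusing the cr_sample_corpus object from the example below; the two-word phrase target shown here is an illustrative assumption, not part of the original documentation:

library(conText)

# contexts around a two-word phrase; hard_cut = TRUE keeps only contexts
# with the full window * 2 tokens (shorter contexts at document edges are dropped)
# NOTE: the phrase "immigration reform" is a hypothetical illustration
context_phrase <- get_context(
  x = cr_sample_corpus,
  target = "immigration reform",
  window = 6L,
  valuetype = "fixed",
  case_insensitive = TRUE,
  hard_cut = TRUE,
  verbose = FALSE
)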
a data.frame with the following columns:

docname: (character) name of the document to which the instance belongs.

target: (character) targets.

context: (character) the pre and post variables of the kwic() output, concatenated.
target in the returned data.frame is equivalent to kwic()'s keyword output variable, so it may not match the user-defined target exactly if valuetype is not "fixed".
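For instance, with a wildcard pattern the target column holds whichever forms actually matched, not the pattern itself; a minimal sketch, again reusing cr_sample_corpus with an assumed pattern "immigr*":

# with valuetype = "glob", the output's target column contains the matched
# tokens rather than the literal pattern "immigr*"
context_wild <- get_context(
  x = cr_sample_corpus,
  target = "immigr*",
  window = 6L,
  valuetype = "glob",
  case_insensitive = TRUE,
  verbose = FALSE
)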
# get context words surrounding the term immigration
context_immigration <- get_context(
  x = cr_sample_corpus,
  target = 'immigration',
  window = 6,
  valuetype = "fixed",
  case_insensitive = FALSE,
  hard_cut = FALSE,
  verbose = FALSE
)
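A short follow-up on inspecting the result, assuming the call above ran as written:

# peek at the docname, target and context columns and count the instances found
head(context_immigration)
nrow(context_immigration)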