| get_context | R Documentation |
A wrapper around quanteda's kwic() function that subsets documents to those where the
target is present before tokenizing (to speed up processing) and concatenates
kwic()'s pre and post variables into a single context column.
get_context(
  x,
  target,
  window = 6L,
  valuetype = "fixed",
  case_insensitive = TRUE,
  hard_cut = FALSE,
  what = "word",
  verbose = TRUE
)
x
(character) vector - the set of documents (corpus) of interest.

target
(character) vector - the target words whose contexts we want to evaluate. This vector may include a single token, a phrase, or multiple tokens and/or phrases.

window
(numeric) - defines the size of a context (number of words around the target).

valuetype
the type of pattern matching: "glob" for "glob"-style wildcard expressions, "regex" for regular expressions, or "fixed" for exact matching.

case_insensitive
logical; if TRUE, ignore case when matching the target.

hard_cut
(logical) - if TRUE, a context must have exactly window * 2 tokens; if FALSE, contexts with fewer tokens (e.g. when the target appears near the beginning or end of a document) are also kept.

what
(character) defines which quanteda tokenizer to use. You will rarely want to change this. For Chinese text you may want to set what = "fastestword".

verbose
(logical) - if TRUE, report the total number of target instances found.
A data.frame with the following columns:

docname
(character) name of the document to which the instance belongs.

target
(character) matched target.

context
(character) kwic()'s pre and post variables concatenated.
Note that target in the returned data.frame is equivalent to kwic()'s keyword output variable,
so it may not match the user-defined target exactly if valuetype is not "fixed".
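Since the matching is done by kwic(), a non-fixed valuetype can return several distinct keywords under the same pattern. A minimal sketch, assuming the conText package (which provides get_context() and the cr_sample_corpus data) is installed:

```r
library(conText)

# glob pattern: matches "immigration", "immigrants", "immigrant", ...
context_immigr <- get_context(x = cr_sample_corpus, target = "immigr*",
                              window = 6, valuetype = "glob",
                              case_insensitive = TRUE, verbose = FALSE)

# the target column records the keyword actually matched, not the pattern
unique(context_immigr$target)
```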
# get context words surrounding the term "immigration"
context_immigration <- get_context(x = cr_sample_corpus, target = 'immigration',
window = 6, valuetype = "fixed", case_insensitive = FALSE,
hard_cut = FALSE, verbose = FALSE)
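Since target also accepts multiple tokens and phrases, one call can collect contexts for several patterns at once - a sketch, again assuming cr_sample_corpus from the conText package:

```r
library(conText)

# single tokens and a phrase in one target vector
context_multi <- get_context(x = cr_sample_corpus,
                             target = c("immigration", "immigration reform"),
                             window = 6L, valuetype = "fixed",
                             case_insensitive = TRUE, verbose = FALSE)

# how many context instances were found for each target
table(context_multi$target)
```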