kwic | R Documentation |
For a text or a collection of texts (in a quanteda corpus object), return a list of a keyword supplied by the user in its immediate context, identifying the source text and the word index number within the source text. (Not the line number, since the text may or may not be segmented using end-of-line delimiters.)
kwic(
x,
pattern,
window = 5,
valuetype = c("glob", "regex", "fixed"),
separator = " ",
case_insensitive = TRUE,
index = NULL,
...
)
is.kwic(x)
## S3 method for class 'kwic'
as.data.frame(x, ...)
x | a character, corpus, or tokens object |
pattern | a character vector, list of character vectors, dictionary, or collocations object. See pattern for details. |
window | the number of context words to be displayed around the keyword |
valuetype | the type of pattern matching: "glob" for glob-style wildcard expressions; "regex" for regular expressions; or "fixed" for exact matching |
separator | a character to separate words in the output |
case_insensitive | logical; if TRUE, ignore case when matching a pattern or dictionary values |
index | an index object to specify keywords |
... | unused |
A kwic classed data.frame, with the document name (docname) and the token index positions (from and to, which will be the same for single-word patterns, or a sequence equal in length to the number of elements for multi-word phrases).
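As a rough illustration of the from and to columns described above, the sketch below runs kwic() on a made-up one-sentence text and converts the result with as.data.frame(). The text and the object names (toks_toy, kw_single, kw_phrase) are assumptions introduced for illustration, not part of the package data.

```r
library(quanteda)

# Toy input, invented for this illustration
toks_toy <- tokens("We must secure peace and guard against war")

# Single-word pattern: each match spans one token, so from equals to
kw_single <- as.data.frame(kwic(toks_toy, pattern = "secure", window = 2))
kw_single[, c("docname", "from", "to", "keyword")]

# Two-word phrase: the match spans two tokens, so to is from + 1
kw_phrase <- as.data.frame(kwic(toks_toy, pattern = phrase("guard against"), window = 2))
kw_phrase[, c("docname", "from", "to", "keyword")]
```

Because from and to are token positions rather than character or line offsets, they can be used to index back into the tokens object.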
pattern will be a keyword pattern or phrase, possibly multiple patterns, that may include punctuation. If a pattern contains whitespace, it is best to wrap it in phrase() to make this explicit. However, if pattern is a collocations (see quanteda.textstats) or dictionary object, then the collocations or multi-word dictionary keys will automatically be considered phrases where each whitespace-separated element matches a token in sequence.
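The difference between wrapping a pattern in phrase() and supplying a dictionary can be sketched as follows. The sentence, the dictionary key "country", and the object names are hypothetical and chosen only for illustration; the default case_insensitive = TRUE is relied on so that "United States" matches the lower-case entry.

```r
library(quanteda)

# Toy input, invented for this illustration
toks_demo <- tokens("The United States and the united kingdom signed the treaty")

# A character pattern containing whitespace is made explicit with phrase()
kw_explicit <- kwic(toks_demo, pattern = phrase("united states"), window = 2)

# Multi-word dictionary entries are treated as phrases automatically,
# so no phrase() wrapper is needed here
dict_demo <- dictionary(list(country = c("united states", "united kingdom")))
kw_dict <- kwic(toks_demo, pattern = dict_demo, window = 2)

as.data.frame(kw_explicit)
as.data.frame(kw_dict)
```

The dictionary form is convenient when many multi-word expressions belong to one category, while phrase() is the lighter option for one-off patterns.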
print-methods
# single token matching
toks <- tokens(data_corpus_inaugural[1:8])
kwic(toks, pattern = "secure*", valuetype = "glob", window = 3)
kwic(toks, pattern = "secur", valuetype = "regex", window = 3)
kwic(toks, pattern = "security", valuetype = "fixed", window = 3)
# phrase matching
kwic(toks, pattern = phrase("secur* against"), window = 2)
kwic(toks, pattern = phrase("war against"), valuetype = "regex", window = 2)
# use index
idx <- index(toks, phrase("secur* against"))
kwic(toks, index = idx, window = 2)
# check whether an object is a kwic
kw <- kwic(tokens(data_corpus_inaugural[1:20]), "provident*")
is.kwic(kw)
is.kwic("Not a kwic")
is.kwic(kw[, c("pre", "post")])
# convert a kwic to a plain data.frame
toks <- tokens(data_corpus_inaugural[1:8])
kw <- kwic(toks, pattern = "secure*", valuetype = "glob", window = 3)
as.data.frame(kw)