Create a kwic (keyword in context) concordance from a vector, list, data.frame or other structure containing a linguistic corpus
kwic(corpus, pattern, left = ifelse(unit == "char", 20, 5),
right = ifelse(unit == "char", 20, 5), unit = "char", fixed = TRUE,
ref = NULL, ...)
## S4 method for signature 'character'
kwic(corpus, pattern, left = 20, right = 20,
unit = "char", fixed = TRUE, ref = names(corpus))
## S4 method for signature 'list'
kwic(corpus, pattern, left = 20, right = 20,
unit = "char", fixed = TRUE, ref = names(corpus))
## S4 method for signature 'VCorpus'
kwic(corpus, pattern, left = 20, right = 20,
unit = "char", fixed = TRUE, ref = names(corpus))
## S4 method for signature 'data.frame'
kwic(corpus, pattern, left = 5, right = 5,
unit = "char", ref = NULL, token.column = "token",
id.column = "doc_id", interlinearize.with = NULL)
corpus |
the corpus (one of several data structures: character vector, list, VCorpus or data.frame) |
pattern |
length-1 character vector : either a regular expression or a fixed string to search for |
left |
length-1 integer vector : number of chars/tokens (see unit) on the left side |
right |
length-1 integer vector : number of chars/tokens (see unit) on the right side |
unit |
length-1 character vector : one of "char" or "token" : defines whether the left and right contexts are measured in characters or in tokens (words) |
fixed |
length-1 logical vector : whether the pattern argument is interpreted as a fixed string (TRUE) or as a regular expression (FALSE) |
ref |
character vector : the names of the different parts of the corpus |
... |
unused arguments |
token.column |
length-1 character vector : the name of the column containing the occurrences. 'token' is the default, according to Text Interchange Formats (see reference). |
id.column |
length-1 character vector : the name of the column defining the textual units that the kwic must not cross. 'doc_id' is the default, as it is expected to exist in every data.frame following the Text Interchange Formats (see reference). |
interlinearize.with |
character vector : the names of other columns with which one can search. |
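To make the interaction between left, right and unit concrete, here is a minimal base R sketch of the two kinds of context window. It is an illustration only and not the package's implementation: the helper names kwic_chars() and kwic_tokens() are hypothetical, the sample sentence is made up for the example, and the returned data frames are a simplification of the KwicLine and KwicToken objects that kwic() actually returns.

```r
# Illustrative sketch only: NOT the kwic package's implementation.
# kwic_chars() and kwic_tokens() are hypothetical helper names.

# unit = "char": left/right count characters around each match.
kwic_chars <- function(text, pattern, left = 20, right = 20, fixed = TRUE) {
  m <- gregexpr(pattern, text, fixed = fixed)[[1]]
  if (m[1] == -1) {
    # No match: return an empty result with the same columns.
    return(data.frame(left = character(0), keyword = character(0),
                      right = character(0), stringsAsFactors = FALSE))
  }
  start <- as.integer(m)
  end <- start + attr(m, "match.length") - 1L
  data.frame(
    left    = substring(text, pmax(1L, start - left), start - 1L),
    keyword = substring(text, start, end),
    right   = substring(text, end + 1L, pmin(nchar(text), end + right)),
    stringsAsFactors = FALSE
  )
}

# unit = "token": left/right count tokens around each matching token.
kwic_tokens <- function(tokens, pattern, left = 5, right = 5) {
  hits <- which(tokens == pattern)
  n <- length(tokens)
  # Paste a token range together, or return "" when the range is empty.
  ctx <- function(from, to) {
    if (from > to) "" else paste(tokens[from:to], collapse = " ")
  }
  data.frame(
    left    = vapply(hits, function(i) ctx(max(1L, i - left), i - 1L), character(1)),
    keyword = tokens[hits],
    right   = vapply(hits, function(i) ctx(i + 1L, min(n, i + right)), character(1)),
    stringsAsFactors = FALSE
  )
}

text <- "It was the best of times, it was the worst of times"
tokens <- strsplit(text, "[ ,]+")[[1]]

kwic_chars(text, "the", left = 7, right = 8)      # context measured in characters
kwic_tokens(tokens, "the", left = 2, right = 2)   # context measured in tokens
```

With character units the context can cut through the middle of a word, whereas token units always keep whole words; this is presumably why the generic's defaults differ between the two units (20 characters versus 5 tokens).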
Returns a KwicLine or KwicToken object (depending on the value of unit)
Text Interchange Formats : https://github.com/ropensci/tif
# Concordance with a vector of untokenized strings
data(dickensv)
kwic(dickensv, "the")
# Concordance with a list of tokens
data(dickensl)
kwic(dickensl, "the")
# Concordance with a tm object
library(tm)
data(acq)
kwic(acq, "stock")
# Concordance with a data frame. Defaults are used for the arguments
# 'token.column' 'id.column' (ie column names 'token' and 'doc_id')
data(dickensdf)
kwic(dickensdf, "the")
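For readers who want to try the data.frame method on their own data, the sketch below builds a small token table by hand with the two column names that the defaults expect ('doc_id' and 'token'). The toy documents and the object names docs and toy_df are invented for illustration; the final call is left commented out because it requires the package to be loaded.

```r
# Hypothetical toy corpus: one character vector of tokens per document.
docs <- list(
  d1 = c("It", "was", "the", "best", "of", "times"),
  d2 = c("it", "was", "the", "worst", "of", "times")
)

# One row per token, with the document identifier repeated for each token.
# Column names match the documented defaults of id.column and token.column.
toy_df <- data.frame(
  doc_id = rep(names(docs), lengths(docs)),
  token  = unlist(docs, use.names = FALSE),
  stringsAsFactors = FALSE
)

head(toy_df)

# With the package loaded, the documented call would then be:
# kwic(toy_df, "the")
```

If your columns are named differently, the token.column and id.column arguments described above let you point the function at them instead of renaming.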