kwic: Create a kwic from vector, list, data.frame or other...


Description

Create a kwic (keyword in context) concordance from a vector, list, data.frame or other structure containing a linguistic corpus.

Usage

kwic(corpus, pattern, left = ifelse(unit == "char", 20, 5),
  right = ifelse(unit == "char", 20, 5), unit = "char", fixed = TRUE,
  ref = NULL, ...)

## S4 method for signature 'character'
kwic(corpus, pattern, left = 20, right = 20,
  unit = "char", fixed = TRUE, ref = names(corpus))

## S4 method for signature 'list'
kwic(corpus, pattern, left = 20, right = 20,
  unit = "char", fixed = TRUE, ref = names(corpus))

## S4 method for signature 'VCorpus'
kwic(corpus, pattern, left = 20, right = 20,
  unit = "char", fixed = TRUE, ref = names(corpus))

## S4 method for signature 'data.frame'
kwic(corpus, pattern, left = 5, right = 5,
  unit = "char", ref = NULL, token.column = "token",
  id.column = "doc_id", interlinearize.with = NULL)

Arguments

corpus

the corpus, in one of the supported data structures (character vector, list, VCorpus or data.frame)

pattern

length-1 character vector: the regular expression or fixed string to search for (see fixed)

left

length-1 integer vector: number of characters or tokens (see unit) of context on the left side

right

length-1 integer vector: number of characters or tokens (see unit) of context on the right side

unit

length-1 character vector: one of "char" or "token"; defines whether the left and right contexts are counted in characters or in tokens (words)

fixed

length-1 logical vector: whether the pattern argument is interpreted as a fixed string (TRUE) or as a regular expression (FALSE)

ref

character vector: the names of the different parts of the corpus

...

unused arguments

token.column

length-1 character vector: the name of the column containing the tokens (occurrences). The default, 'token', follows the Text Interchange Formats (see References).

id.column

length-1 character vector: the name of the column that delimits the textual units the kwic must not cross. The default, 'doc_id', is expected to exist in every data.frame that follows the Text Interchange Formats (see References).

interlinearize.with

character vector: the names of the other columns with which one can search.
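
To make the roles of pattern, left, right and fixed concrete when unit = "char", here is a minimal base-R sketch of a character-window concordance. The helper char_concordance() is hypothetical and written only for illustration; it is not the package's implementation, and it returns a plain data.frame rather than a KwicLine object.

```r
# Hypothetical helper, NOT part of the kwic package: a base-R sketch of what a
# character-window concordance computes for a single string.
char_concordance <- function(text, pattern, left = 20, right = 20, fixed = TRUE) {
  # gregexpr() returns the start position of every match, with the match
  # lengths attached as an attribute; -1 means "no match".
  m <- gregexpr(pattern, text, fixed = fixed)[[1]]
  if (m[1] == -1) {
    return(data.frame(left = character(0), keyword = character(0),
                      right = character(0), stringsAsFactors = FALSE))
  }
  start <- as.integer(m)
  end   <- start + attr(m, "match.length") - 1L
  data.frame(
    # Up to `left` characters before the match, clipped at the string start.
    left    = substring(text, pmax(1L, start - left), start - 1L),
    keyword = substring(text, start, end),
    # Up to `right` characters after the match.
    right   = substring(text, end + 1L, end + right),
    stringsAsFactors = FALSE
  )
}

txt <- "It was the best of times, it was the worst of times."

# fixed = TRUE: "the" is matched literally.
fixed_hits <- char_concordance(txt, "the", left = 7, right = 6)
print(fixed_hits)

# fixed = FALSE: the pattern is a regular expression.
regex_hits <- char_concordance(txt, "[Ii]t", left = 0, right = 4, fixed = FALSE)
print(regex_hits)
```

With fixed = TRUE, characters such as "." or "[" in the pattern are matched literally; with fixed = FALSE they keep their regular-expression meaning.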

Value

a KwicLine or KwicToken object (depending on the value of unit)

References

Text Interchange Formats : https://github.com/ropensci/tif
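
As an illustration of the data.frame input, the sketch below builds a small token table with the 'doc_id' and 'token' columns described by the Text Interchange Formats and computes token windows that stop at document boundaries. The helper token_concordance() and the toy data are hypothetical; this is a base-R sketch under those assumptions, not the package's implementation, and it returns a plain data.frame rather than a KwicToken object.

```r
# Toy corpus in a TIF-style token table: one row per token, with the document
# identifier in 'doc_id' and the token itself in 'token'.
tokens <- data.frame(
  doc_id = c("d1", "d1", "d1", "d2", "d2", "d2"),
  token  = c("the", "cat", "sat", "on", "the", "mat"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, NOT part of the kwic package.
token_concordance <- function(df, pattern, left = 5, right = 5,
                              token.column = "token", id.column = "doc_id") {
  # Process one document at a time so that a window never crosses a boundary.
  # Note that split() orders the documents by sorted identifier.
  per_doc <- lapply(split(df, df[[id.column]]), function(d) {
    tok  <- d[[token.column]]
    hits <- which(tok == pattern)  # exact (fixed) match on whole tokens
    data.frame(
      doc_id  = rep(d[[id.column]][1], length(hits)),
      # Last `left` tokens before the hit, within this document only.
      left    = vapply(hits, function(i)
        paste(tail(head(tok, i - 1), left), collapse = " "), character(1)),
      keyword = tok[hits],
      # First `right` tokens after the hit, within this document only.
      right   = vapply(hits, function(i)
        paste(head(tail(tok, length(tok) - i), right), collapse = " "), character(1)),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, per_doc)
  rownames(out) <- NULL
  out
}

hits <- token_concordance(tokens, "the", left = 2, right = 2)
print(hits)
```

In document d2 the left context of "the" is only "on": the window does not reach back into d1, which is the behaviour the id.column argument describes.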

Examples

# Concordance with a vector of untokenized strings
data(dickensv)
kwic(dickensv, "the")
# Concordance with a list of tokens
data(dickensl)
kwic(dickensl, "the")

# Concordance with a tm object
library(tm)
data(acq)
kwic(acq, "stock")
# Concordance with a data frame. Defaults are used for the arguments
# 'token.column' and 'id.column' (i.e. column names 'token' and 'doc_id')
data(dickensdf)
kwic(dickensdf, "the")

sylvainloiseau/kwic documentation built on May 26, 2019, 5:31 a.m.