kwic-class: S4 kwic class
In polmineR: Verbs and Nouns for Corpus Analysis

kwic-class

R Documentation

S4 kwic class

Description

S4 class for organizing information for kwic/concordance output. A set of standard generics (show, as.character, as.data.frame, length, sample, subset) as well as indexing is implemented to process kwic class objects (see 'Usage'). See section 'Details' for the enrich, view and knit_print methods.

Usage

## S4 method for signature 'kwic'
get_corpus(x)

## S4 method for signature 'kwic'
count(.Object, p_attribute = "word")

## S4 method for signature 'kwic'
as.DocumentTermMatrix(x, p_attribute, verbose = TRUE, ...)

## S4 method for signature 'kwic'
as.TermDocumentMatrix(x, p_attribute, verbose = TRUE, ...)

## S4 method for signature 'kwic'
show(object)

## S4 method for signature 'kwic'
knit_print(x, options = knitr::opts_chunk)

## S4 method for signature 'kwic'
as.character(x, fmt = "<i>%s</i>")

## S4 method for signature 'kwic,ANY,ANY,ANY'
x[i]

## S4 method for signature 'kwic'
subset(x, ...)

## S4 method for signature 'kwic'
as.data.frame(x)

## S4 method for signature 'kwic'
length(x)

## S4 method for signature 'kwic'
sample(x, size)

## S4 method for signature 'kwic_bundle'
merge(x)

## S4 method for signature 'kwic'
enrich(.Object, s_attributes = NULL, extra = NULL, table = FALSE, ...)

## S4 method for signature 'kwic'
format(
  x,
  node_color = "blue",
  align = TRUE,
  extra_color = "grey",
  lineview = getOption("polmineR.lineview")
)

## S4 method for signature 'kwic'
view(.Object)

Arguments

`x`	A `kwic` class object.
`.Object`	A `kwic` class object.
`p_attribute`	A length-one `character` vector supplying a p-attribute.
`verbose`	A `logical` value, whether to output debugging messages.
`...`	Used for backwards compatibility.
`object`	A `kwic` class object.
`options`	Chunk options.
`fmt`	A format string passed into `sprintf` to format the node of a KWIC display.
`i`	Single integer value, the kwic line for which the fulltext shall be inspected.
`size`	An `integer`, subset size for sampling.
`s_attributes`	Character vector of s-attributes with metainformation.
`extra`	An `integer` value, number of extra tokens to the left and to the right of the windows of tokens to the left and right of a query match that are decoded to be displayed in a kwic output to facilitate interpretation.
`table`	Logical, whether to turn cpos `data.table` into `data.table` in slot `stat` for output.
`node_color`	If not `NULL`, the html color of the node. If supplied, the node will be wrapped in respective html tags.
`align`	A `logical` value for preparing kwic output. If `TRUE`, whether the content of the columns 'left', 'node' and 'right' will be wrapped in html div elements that will align the output right, centered and left, respectively.
`extra_color`	If extra context has been generated using `enrich`, the html color of the additional output (defaults to 'grey').
`lineview`	A `logical` value, whether to concatenate left context, node and right context when preparing kwic output.

Details

Applying the count-method on a kwic object will return a count object with the evaluation of the left and right context of the match.

The knit_print() method will be called by knitr to render kwic objects as a DataTable htmlwidget when rendering a R Markdown document as html. It will usually be necessary to explicitly state "render = knit_print" in the chunk options. The option polmineR.pagelength controls the number of lines displayed in the resulting htmlwidget. Note that including htmlwidgets in html documents requires that pandoc is installed. To avoid an error, a formatted data.table is returned by knit_print() if pandoc is not available.

The as.character-method will return a list of character vectors, concatenating the columns "left", "node" and "right" of the data.table in the stat-slot of the input kwic-class object. Optionally, the node can be formatted using a format string that is passed into sprintf.

The subset-method will apply subset to the table in the slot stat, e.g. for filtering query results based on metadata (i.e. s-attributes) that need to be present.

The enrich method is used to generate the actual output for the kwic method. If param table is TRUE, corpus positions will be turned into a data.frame with the concordance lines. If param s_attributes is a character vector with s-attributes, the respective s-attributes will be added as columns to the table with concordance lines.

The format-method will return a data.table that can serve as input for rendering a htmlwidget, for instance using DT::datatable or rhandsontable::rhandsontable. It will include html tags, so ensure that the rendering engine does not obfuscate the html.

Slots

metadata: A character vector with s-attributes of the metadata that are to be displayed.
p_attribute: The p-attribute for which the context has been generated.
left: An integer value, words to the left of the query match.
right: An integer value, words to the right of the query match.
corpus: Length-one character vector, the CWB corpus.
cpos: A data.table with the columns "match_id", "cpos", "position", "word_id", "word" and "direction".
stat: A data.table, a table with columns "left", "node", "right", and metadata, if the object has been enriched.
encoding: A length-one character vector with the encoding of the corpus.
name: A length-one character vector naming the object.
annotation_cols: A character vector designating the columns of the data.table in the slot table that are annotations.

Examples

use("polmineR")
K <- kwic("GERMAPARLMINI", "Integration")
get_corpus(K)
length(K)
K_min <- K[1]
K_min <- K[1:5]

# using kwic_bundle class
queries <- c("oil", "prices", "barrel")
li <- lapply(queries, function(q) kwic("REUTERS", query = q))
kb <- as.bundle(li)

# use count-method on kwic object
coi <- kwic("REUTERS", query = "oil") %>%
  count(p_attribute = "word")

# features vs cooccurrences-method (identical results)
ref <- count("REUTERS", p_attribute = "word") %>%
  subset(word != "oil")
slot(ref, "size") <- slot(ref, "size") - count("REUTERS", "oil")[["count"]]
y_features <- features(coi, ref, method = "ll", included = TRUE)
y_cooc <- cooccurrences("REUTERS", query = "oil")

# extract node and left and right context as character vectors
oil <- kwic("REUTERS", query = "oil")
as.character(oil, fmt = NULL)
as.character(oil) # node wrapped into <i> tag by default
as.character(oil, fmt = "<b>%s</b>")

# subsetting kwic objects
oil <- corpus("REUTERS") %>%
  kwic(query = "oil") %>%
  subset(grepl("prices", right))
saudi_arabia <- corpus("REUTERS") %>%
  kwic(query = "Arabia") %>%
  subset(grepl("Saudi", left))
int_spd <- corpus("GERMAPARLMINI") %>%
  kwic(query = "Integration") %>%
  enrich(s_attribute = "party") %>%
  subset(grepl("SPD", party))

# turn kwic object into data.frame with html tags
int <- corpus("GERMAPARLMINI") %>%
  kwic(query = "Integration")

as.data.frame(int) # Without further metadata

enrich(int, s_attributes = c("date", "speaker", "party")) %>%
  as.data.frame()
  
# merge bundle of kwic objects into one kwic
reuters <- corpus("REUTERS")
queries <- c('"Saudi" "Arabia"', "oil", '"barrel.*"')
comb <- lapply(queries, function(qu) kwic(reuters, query = qu)) %>%
  as.bundle() %>%
  merge()
 
# enrich kwic object
i <- corpus("GERMAPARLMINI") %>%
  kwic(query = "Integration") %>%
  enrich(s_attributes = c("date", "speaker", "party"))

polmineR documentation built on Nov. 2, 2023, 5:52 p.m.