subcorpus-class: The S4 subcorpus class.
In polmineR: Verbs and Nouns for Corpus Analysis

subcorpus

R Documentation

Description

Class to manage subcorpora derived from a CWB corpus.

Usage

## S4 method for signature 'subcorpus'
summary(object)

## S4 replacement method for signature 'subcorpus'
name(x) <- value

## S4 method for signature 'subcorpus'
get_corpus(x)

## S4 method for signature 'subcorpus'
size(x, s_attribute = NULL, ...)

Arguments

`object`	A `subcorpus` object.
`x`	A `subcorpus` object.
`value`	A `character` vector to assign as the name of the `subcorpus` object (stored in slot `name`).
`s_attribute`	A `character` vector with s-attributes (one or more).
`...`	Further arguments passed to the `size` method. Used only to maintain backward compatibility.

Methods (by generic)

summary(subcorpus): Get named list with basic information for subcorpus object.
name(subcorpus) <- value: Assign name to a subcorpus object.
get_corpus(subcorpus): Get the corpus ID from the subcorpus object.
size(subcorpus): Get the size of a subcorpus object from the respective slot of the object.
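
The methods listed above can be tried out as follows. This is a minimal sketch, assuming the REUTERS sample corpus shipped with polmineR is installed and that `places` is one of its s-attributes (as in the Examples section); the variable names are illustrative only.

```r
library(polmineR)
use("polmineR") # activate the sample corpora shipped with the package

# derive a subcorpus from the REUTERS sample corpus
k <- subset(corpus("REUTERS"), grepl("kuwait", places))

# ID of the corpus the subcorpus was derived from
corpus_id <- get_corpus(k)

# total number of tokens in the subcorpus
n_tokens <- size(k)

# sizes split up by the values of an s-attribute
# (assumption: "places" is a suitable s-attribute for this call)
by_place <- size(k, s_attribute = "places")
```

Calling `size()` without `s_attribute` gives a single number, whereas passing one or more s-attributes asks for a breakdown by their values.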

Slots

s_attributes: A named list with the structural attributes defining the subcorpus.
cpos: A matrix with left and right corpus positions defining regions (two column matrix with integer values).
annotations: Object of class list.
size: Total size (number of tokens) of the subcorpus object (a length-one integer vector). The value is accessible by calling the size method on the subcorpus object (see Examples).
metadata: Object of class data.frame, metadata information.
strucs: Object of class integer, the strucs defining the subcorpus.
xml: Object of class character, whether the xml is "flat" or "nested".
s_attribute_strucs: Object of class character, the base node.
user: If the corpus on the server requires authentication, the username.
password: If the corpus on the server requires authentication, the password.
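
To illustrate how the `cpos` and `size` slots relate, here is a small base R sketch. It does not use polmineR: the matrix values are made up, and it assumes that left and right corpus positions are both inclusive, so that a region from `left` to `right` spans `right - left + 1` tokens.

```r
# Hypothetical region matrix: one row per region, the columns hold the
# left and right corpus positions (both assumed to be inclusive).
cpos <- matrix(
  c(  0L,   9L,
     25L,  29L,
    100L, 149L),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("left", "right"))
)

# number of tokens covered by each region
region_sizes <- cpos[, "right"] - cpos[, "left"] + 1L

# total number of tokens across all regions
total_size <- sum(region_sizes)
```

Under these assumptions, the three regions cover 10, 5 and 50 tokens, which adds up to 65. For a real subcorpus, the value returned by `size()` should be used rather than a manual computation.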

Examples

library(polmineR)
use("polmineR") # activate the sample corpora shipped with the package

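# basic example: derive a subcorpus, name it, summarize it, get its size
r <- corpus("REUTERS")
k <- subset(r, grepl("kuwait", places))
name(k) <- "kuwait"
y <- summary(k) # named list with basic information
s <- size(k)    # number of tokens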

# the same with a magrittr pipe
corpus("REUTERS") %>%
  subset(grepl("kuwait", places)) %>%
  summary()
  
# subsetting a subcorpus in a pipe
stone <- corpus("GERMAPARLMINI") %>%
  subset(date == "2009-11-10") %>%
  subset(speaker == "Frank-Walter Steinmeier")

# perform count for subcorpus
n <- corpus("REUTERS") %>%
  subset(grep("kuwait", places)) %>%
  count(p_attribute = "word")
n <- corpus("REUTERS") %>%
  subset(grep("saudi-arabia", places)) %>%
  count('"Saudi" "Arabia"')

# keyword-in-context analysis (kwic)
k <- corpus("REUTERS") %>%
  subset(grep("kuwait", places)) %>%
  kwic("oil")

polmineR documentation built on Nov. 2, 2023, 5:52 p.m.