subcorpus | R Documentation |
Class to manage subcorpora derived from a CWB corpus.
## S4 method for signature 'subcorpus'
summary(object)
## S4 replacement method for signature 'subcorpus'
name(x) <- value
## S4 method for signature 'subcorpus'
get_corpus(x)
## S4 method for signature 'subcorpus'
size(x, s_attribute = NULL, ...)
object |
A |
x |
A |
value |
A |
s_attribute |
A |
... |
Arguments passed into |
summary(subcorpus): Get named list with basic information for subcorpus object.
name(subcorpus) <- value: Assign name to a subcorpus object.
get_corpus(subcorpus): Get the corpus ID from the subcorpus object.
size(subcorpus): Get the size of a subcorpus object from the respective slot of the object.
s_attributes
A named list with the structural attributes defining the subcorpus.
cpos
A matrix with left and right corpus positions defining regions (two-column matrix with integer values).
annotations
Object of class list.
size
Total size (number of tokens) of the subcorpus object (a length-one integer vector). The value is accessible by calling the size method on the subcorpus object (see examples).
metadata
Object of class data.frame, metadata information.
strucs
Object of class integer, the strucs defining the subcorpus.
xml
Object of class character, whether the xml is "flat" or "nested".
s_attribute_strucs
Object of class character, the base node.
user
If the corpus on the server requires authentication, the username.
password
If the corpus on the server requires authentication, the password.
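The relation between the cpos and size slots can be illustrated with a minimal base R sketch. The matrix below is made-up data, not taken from any corpus, and the column names are hypothetical. The sketch assumes that both region boundaries are inclusive, so each region contributes right - left + 1 tokens.

```r
# Hypothetical region matrix in the layout described for the cpos slot:
# one row per region, left and right corpus positions in two integer columns.
cpos <- matrix(
  c(0L, 99L,
    250L, 299L,
    1000L, 1009L),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("cpos_left", "cpos_right"))
)

# Assumption: both boundaries are inclusive, so a region spans
# right - left + 1 tokens.
region_sizes <- cpos[, "cpos_right"] - cpos[, "cpos_left"] + 1L

# Token count over all regions, comparable to the value kept in the size slot.
total_size <- sum(region_sizes)

print(region_sizes)
print(total_size)
```

For an actual subcorpus object, the documented way to obtain this value is the size method rather than recomputing it from the regions.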
Most commonly, a subcorpus is derived from a corpus or a subcorpus using the subset method. See size for detailed documentation on how to use the size method. The subcorpus class shares many features with the partition class, but it is more parsimonious and does not include information on statistical properties of the subcorpus (i.e. a count table). In line with this logic, the subcorpus class inherits from the corpus class, whereas the partition class inherits from the textstat class.
Other classes to manage corpora: corpus-class, phrases-class, ranges-class, regions
use("polmineR")
# basic example
r <- corpus("REUTERS")
k <- subset(r, grepl("kuwait", places))
name(k) <- "kuwait"
y <- summary(k)
s <- size(k)
# the same with a magrittr pipe
corpus("REUTERS") %>%
  subset(grepl("kuwait", places)) %>%
  summary()
# subsetting a subcorpus in a pipe
stone <- corpus("GERMAPARLMINI") %>%
  subset(date == "2009-11-10") %>%
  subset(speaker == "Frank-Walter Steinmeier")
# perform count for subcorpus
n <- corpus("REUTERS") %>% subset(grep("kuwait", places)) %>% count(p_attribute = "word")
n <- corpus("REUTERS") %>% subset(grep("saudi-arabia", places)) %>% count('"Saudi" "Arabia"')
# keyword-in-context analysis (kwic)
k <- corpus("REUTERS") %>% subset(grep("kuwait", places)) %>% kwic("oil")