| partition | R Documentation |
Create a subcorpus and keep it in an object of the partition class. If
defined, counts are performed for the p-attribute defined by the parameter
p_attribute.
partition(.Object, ...)
## S4 method for signature 'corpus'
partition(
.Object,
def = NULL,
name = "",
encoding = NULL,
p_attribute = NULL,
regex = FALSE,
xml = slot(.Object, "xml"),
decode = TRUE,
type = get_type(.Object),
mc = FALSE,
verbose = TRUE,
...
)
## S4 method for signature 'character'
partition(
.Object,
def = NULL,
name = "",
encoding = NULL,
p_attribute = NULL,
regex = FALSE,
decode = TRUE,
type = get_type(.Object),
mc = FALSE,
verbose = TRUE,
...
)
## S4 method for signature 'environment'
partition(.Object, slots = c("name", "corpus", "size", "p_attribute"))
## S4 method for signature 'partition'
partition(
.Object,
def = NULL,
name = "",
regex = FALSE,
p_attribute = NULL,
decode = TRUE,
xml = NULL,
verbose = TRUE,
mc = FALSE,
...
)
## S4 method for signature 'context'
partition(.Object, node = TRUE)
## S4 method for signature 'remote_corpus'
partition(.Object, ...)
## S4 method for signature 'remote_partition'
partition(.Object, ...)
.Object |
A length-one character-vector, the CWB corpus to be used. |
... |
Arguments to define partition (see examples). If |
def |
A named list of character vectors of s-attribute values; the names of the list are the s-attributes (see details and examples). |
name |
A name for the new |
encoding |
The encoding of the corpus (typically "LATIN1" or "UTF-8"). If NULL, the encoding provided in the registry file of the corpus (charset="...") will be used. |
p_attribute |
The p-attribute(s) for which a count is performed. |
regex |
A logical value (defaults to FALSE), whether the s-attribute values are interpreted as regular expressions (see details). |
xml |
Either 'flat' (default) or 'nested'. |
decode |
Logical, whether to turn token ids into strings in the data.table with counts (set to FALSE to minimize object size / memory consumption). |
type |
A length-one character vector specifying the type of corpus / partition (e.g. "plpr") |
mc |
Whether to use multicore (for counting terms). |
verbose |
Logical, whether to be verbose. |
slots |
Object slots that will be reported columns of |
node |
A logical value, whether to include the node (i.e. query matches) in the region matrix
generated when creating a |
The function sets up a partition object based on s-attribute values.
The s-attributes defining the partition can be passed in as a list, e.g.
list(interjection="speech", year = "2013"), or directly (see
examples).
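As a minimal sketch of the two ways of passing s-attributes, the following uses the GERMAPARLMINI sample corpus and s-attribute values that also appear in the examples of this page. The object names p_list and p_direct are illustrative, and the guard with requireNamespace() is only there so that the snippet does nothing if polmineR is not installed.

```r
if (requireNamespace("polmineR", quietly = TRUE)) {
  library(polmineR)
  use("polmineR")  # make the sample corpus available, as in the examples

  # s-attributes passed in as a named list via the def argument
  p_list <- partition(
    "GERMAPARLMINI",
    def = list(interjection = "speech", date = "2009-11-10"),
    verbose = FALSE
  )

  # the same s-attributes passed in directly
  p_direct <- partition(
    "GERMAPARLMINI",
    interjection = "speech", date = "2009-11-10",
    verbose = FALSE
  )
}
```

Both calls are expected to describe the same subcorpus; the list form is convenient when the definition is built programmatically.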
The s-attribute values defining the partition may use regular expressions. To
use regular expressions, set the parameter regex to TRUE. Regular
expressions are passed into grep, i.e. the regex syntax used in R
needs to be used (double backslashes etc.). If regex is FALSE, the
length of the character vectors can be > 1, and matching s-attributes are
identified with the operator '%in%'.
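The difference between the two matching modes can be sketched in base R. This is not the code of the package, only an illustration of the semantics described above; the vectors speakers and dates are made-up toy values, not read from a corpus.

```r
# Toy s-attribute values (hypothetical)
speakers <- c("Angela Dorothea Merkel", "Volker Kauder", "Petra Merkel")

# regex = FALSE: exact matching with %in%, several values are allowed
exact <- speakers[speakers %in% c("Volker Kauder", "Angela Dorothea Merkel")]

# regex = TRUE: the value is treated as a pattern and handed to grep
by_regex <- grep(".*Merkel", speakers, value = TRUE)

# In an R string, a literal dot has to be escaped with a double backslash;
# an unescaped dot would match any character, including the hyphen
dates <- c("2009.11.10", "2009-11-10")
literal_dot <- grep("2009\\.11", dates, value = TRUE)
any_char <- grep("2009.11", dates, value = TRUE)
```

Here exact keeps only the two fully spelled-out names, by_regex keeps both values ending in "Merkel", and literal_dot keeps only the date written with dots.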
The XML imported into the CWB may be "flat" or "nested". This needs to be
indicated with the parameter xml (default is "flat"). If you generate
a partition based on a flat XML structure, some performance gain may be
achieved when ordering the s-attributes with decreasingly restrictive
conditions. If you have a nested XML, it is mandatory that the order of the
s-attributes provided reflects the hierarchy of the XML: the top-level
elements need to be positioned at the beginning of the list with the
s-attributes, the most restrictive elements at the end.
If p_attribute is not NULL, a count of tokens in the corpus will be
performed and kept in the stat-slot of the partition-object. The
length of the p_attribute character vector may be 1 or more. If two or
more p-attributes are provided, the occurrence of combinations will be
counted. A typical scenario is to combine the p-attributes "word" or "lemma"
and "pos".
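What counting combinations of p-attributes means can be illustrated with a small base R sketch. It does not reproduce how the package computes the stat-slot; the data frame tokens is a made-up token stream with the two columns word and pos.

```r
# Hypothetical token stream with two p-attributes
tokens <- data.frame(
  word = c("die", "Rede", "die", "Rede", "rede"),
  pos  = c("ART", "NN", "ART", "NN", "VVFIN"),
  stringsAsFactors = FALSE
)

# Count each combination of word and pos that occurs
counts <- aggregate(
  count ~ word + pos,
  data = transform(tokens, count = 1L),
  FUN = sum
)
```

The result has one row per observed combination, so "Rede" as a noun and "rede" as a verb are counted separately.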
If .Object is a length-one character vector, a subcorpus/partition
for the corpus defined by .Object is generated.
If .Object is an environment (typically .GlobalEnv),
the partition objects present in the environment are listed.
If .Object is a partition object, a subcorpus of the
subcorpus is generated.
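A minimal sketch of generating a subcorpus of a subcorpus, assuming the GERMAPARLMINI sample corpus and the s-attribute values used in the examples of this page. The object names speeches and speeches_day are illustrative, and the requireNamespace() guard only prevents an error if polmineR is not installed.

```r
if (requireNamespace("polmineR", quietly = TRUE)) {
  library(polmineR)
  use("polmineR")  # make the sample corpus available, as in the examples

  # first step: a partition of the corpus, identified by its name
  speeches <- partition("GERMAPARLMINI", interjection = "speech", verbose = FALSE)

  # second step: the partition is the input, so the result is a
  # subcorpus of the subcorpus
  speeches_day <- partition(speeches, date = "2009-11-10", verbose = FALSE)
}
```

The second partition can only narrow down the first one, so its size should not exceed the size of speeches.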
An object of the S4 class partition.
Andreas Blaette
To learn about the methods available for objects of the class
partition, see partition_class.
use("polmineR")
# partitions defined by s-attribute values passed in directly
spd <- partition("GERMAPARLMINI", party = "SPD", interjection = "speech")
kauder <- partition("GERMAPARLMINI", speaker = "Volker Kauder", p_attribute = "word")
# regular expression matching on the s-attribute "speaker"
merkel <- partition("GERMAPARLMINI", speaker = ".*Merkel", p_attribute = "word", regex = TRUE)
s_attributes(merkel, "date")
s_attributes(merkel, "speaker")
# several s-attributes combined, with a count for the p-attribute "word"
merkel <- partition(
  "GERMAPARLMINI", speaker = "Angela Dorothea Merkel",
  date = "2009-11-10", interjection = "speech", p_attribute = "word"
)
# filter out punctuation and German stopwords
merkel <- subset(merkel, !word %in% punctuation)
merkel <- subset(merkel, !word %in% tm::stopwords("de"))
# a certain defined time segment
days <- seq(
  from = as.Date("2009-10-28"),
  to = as.Date("2009-11-11"),
  by = "1 day"
)
period <- partition("GERMAPARLMINI", date = days)