partition | R Documentation |
Create a subcorpus and keep it in an object of the partition class. If defined, counts are performed for the p-attribute defined by the parameter p_attribute.
partition(.Object, ...)

## S4 method for signature 'corpus'
partition(
  .Object,
  def = NULL,
  name = "",
  encoding = NULL,
  p_attribute = NULL,
  regex = FALSE,
  xml = slot(.Object, "xml"),
  decode = TRUE,
  type = get_type(.Object),
  mc = FALSE,
  verbose = TRUE,
  ...
)

## S4 method for signature 'character'
partition(
  .Object,
  def = NULL,
  name = "",
  encoding = NULL,
  p_attribute = NULL,
  regex = FALSE,
  decode = TRUE,
  type = get_type(.Object),
  mc = FALSE,
  verbose = TRUE,
  ...
)

## S4 method for signature 'environment'
partition(.Object, slots = c("name", "corpus", "size", "p_attribute"))

## S4 method for signature 'partition'
partition(
  .Object,
  def = NULL,
  name = "",
  regex = FALSE,
  p_attribute = NULL,
  decode = TRUE,
  xml = NULL,
  verbose = TRUE,
  mc = FALSE,
  ...
)

## S4 method for signature 'context'
partition(.Object, node = TRUE)

## S4 method for signature 'remote_corpus'
partition(.Object, ...)

## S4 method for signature 'remote_partition'
partition(.Object, ...)
.Object |
A length-one character-vector, the CWB corpus to be used. |
... |
Arguments to define partition (see examples). If |
def |
A named list of character vectors of s-attribute values, the names are the s-attributes (see details and examples) |
name |
A name for the new |
encoding |
The encoding of the corpus (typically "LATIN1" or "UTF-8"). If NULL, the encoding provided in the registry file of the corpus (charset="...") will be used. |
p_attribute |
The p-attribute(s) for which a count is performed. |
regex |
A logical value (defaults to FALSE), whether the s-attribute values are interpreted as regular expressions (see details). |
xml |
Either 'flat' (default) or 'nested'. |
decode |
Logical, whether to turn token ids to strings (set FALSE to minimize object size / memory consumption) in data.table with counts. |
type |
A length-one character vector specifying the type of corpus / partition (e.g. "plpr") |
mc |
Whether to use multicore (for counting terms). |
verbose |
Logical, whether to be verbose. |
slots |
Object slots that will be reported columns of |
node |
A logical value, whether to include the node (i.e. query matches) in the region matrix
generated when creating a |
The function sets up a partition object based on s-attribute values.
The s-attributes defining the partition can be passed in as a list, e.g. list(interjection = "speech", year = "2013"), or directly (see examples).
The s-attribute values defining the partition may use regular expressions. To use regular expressions, set the parameter regex to TRUE. Regular expressions are passed into grep, i.e. the regex syntax used in R needs to be used (double backslashes etc.). If regex is FALSE, the length of the character vectors can be > 1, and matching s-attributes are identified with the operator '%in%'.
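The two matching modes can be illustrated in base R. This is a minimal sketch on toy vectors, not the internal code of polmineR; the object names (speakers, literal, by_pattern, dates, november) are made up for illustration.

```r
# Toy s-attribute values; the names are illustrative only.
speakers <- c("Angela Dorothea Merkel", "Volker Kauder", "Frank-Walter Steinmeier")

# regex = FALSE: values are compared literally with %in%,
# so a character vector of length > 1 selects several values at once
literal <- speakers[speakers %in% c("Volker Kauder", "Angela Dorothea Merkel")]

# regex = TRUE: values are treated as patterns and handed to grep()
by_pattern <- speakers[grep(".*Merkel", speakers)]

# Backslashes must be doubled inside R string literals
dates <- c("2009-10-28", "2009-11-10", "2009-11-11")
november <- dates[grep("^\\d{4}-11-\\d{2}$", dates)]
```

With literal matching, a misspelled value simply selects nothing, whereas a pattern such as ".*Merkel" selects every value that matches it.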
The XML imported into the CWB may be "flat" or "nested". This needs to be indicated with the parameter xml (default is "flat"). If you generate a partition based on a flat XML structure, some performance gain may be achieved by ordering the s-attributes from the most to the least restrictive condition. If you have a nested XML, it is mandatory that the order of the s-attributes provided reflects the hierarchy of the XML: the top-level elements need to be positioned at the beginning of the list of s-attributes, the most restrictive elements at the end.
If p_attribute is not NULL, a count of tokens in the corpus will be performed and kept in the stat-slot of the partition-object. The length of the p_attribute character vector may be 1 or more. If two or more p-attributes are provided, the occurrence of combinations will be counted. A typical scenario is to combine the p-attributes "word" or "lemma" and "pos".
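What counting combinations of p-attributes means can be sketched in base R. The token table below is invented for illustration, and aggregate() merely stands in for whatever polmineR does internally; the result is not claimed to match the layout of the stat slot.

```r
# Invented token stream with two p-attributes per token (illustration only)
tokens <- data.frame(
  word = c("die", "Rede", "die", "Rede", "reden"),
  pos  = c("ART", "NN", "ART", "NN", "VVINF"),
  stringsAsFactors = FALSE
)

# Count each distinct word/pos combination rather than each word alone
counts <- aggregate(
  count ~ word + pos,
  data = cbind(tokens, count = 1L),
  FUN = sum
)
```

Each row of counts holds one word/pos pair and its frequency, so homographs with different part-of-speech tags are counted separately.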
If .Object is a length-one character vector, a subcorpus/partition for the corpus defined by .Object is generated.
If .Object is an environment (typically .GlobalEnv), the partition objects present in the environment are listed.
If .Object is a partition object, a subcorpus of the subcorpus is generated.
An object of the S4 class partition.
Andreas Blaette
To learn about the methods available for objects of the class partition, see partition_class.
use("polmineR")
spd <- partition("GERMAPARLMINI", party = "SPD", interjection = "speech")
kauder <- partition("GERMAPARLMINI", speaker = "Volker Kauder", p_attribute = "word")
merkel <- partition("GERMAPARLMINI", speaker = ".*Merkel", p_attribute = "word", regex = TRUE)
s_attributes(merkel, "date")
s_attributes(merkel, "speaker")
merkel <- partition(
  "GERMAPARLMINI", speaker = "Angela Dorothea Merkel",
  date = "2009-11-10", interjection = "speech", p_attribute = "word"
)
merkel <- subset(merkel, !word %in% punctuation)
merkel <- subset(merkel, !word %in% tm::stopwords("de"))

# a certain defined time segment
days <- seq(
  from = as.Date("2009-10-28"),
  to = as.Date("2009-11-11"),
  by = "1 day"
)
period <- partition("GERMAPARLMINI", date = days)