Description Details Slots Constructor function Note Examples
Objects of this class can contain full text corpora in a hierarchical structure. The class supports both the tm package's Corpus class and koRpus' own object classes, and stores them in separate slots.
Objects should be created using the readCorpus function.
lang
A character string, naming the language that is assumed for the tokenized texts in this object.
desc
A named list of descriptive statistics of the tagged texts.
meta
A named list. Can be used to store meta information. Currently, no particular format is defined.
raw
A list of objects of class Corpus.
tokens
A data frame as used for the tokens slot in objects of class kRp.text. In addition to the columns usually found in those objects, this data frame also has a factor column for each hierarchical category defined (if any).
features
A named logical vector, indicating which features are available in this object's feat_list slot. Common features are listed in the description of the feat_list slot.
feat_list
A named list with optional analysis results or other content as used by the defined features:
hierarchy
A named list of named character vectors describing the directory hierarchy level by level.
hyphen
A named list of objects of class kRp.hyphen.
readability
A named list of objects of class kRp.readability.
lex_div
A named list of objects of class kRp.TTR.
freq
The freq.analysis slot of a kRp.txt.freq class object after freq.analysis was called.
corp_freq
An object of class kRp.corp.freq, e.g., results of a call to read.corp.custom.
diff
A named list of diff features of a kRp.text object after a method like textTransform was called.
summary
A summary data frame for the full corpus,
including descriptive statistics on all texts, as well as
results of analyses like readability and lexical diversity, if available.
doc_term_matrix
A sparse document-term matrix, as produced by docTermMatrix.
stopwords
A numeric vector with the total number of stopwords in each text,
if stopwords were analyzed during tokenizing or POS tagging.
See the getter and setter methods for easy access to these sub-slots.
There can actually be any number of additional features; the above is just a list of those already defined by this package.
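The relationship between the features and feat_list slots can be sketched with a small stand-alone S4 class. This is only an illustration under stated assumptions: the class toyCorpus, the helper toy_feature(), and the stopword counts are all hypothetical and are not part of this package. The sketch mirrors the idea that the named logical vector records which entries of the named list are available.

```r
library(methods)

# Hypothetical toy class, NOT the package's definition of kRp.corpus;
# it only mimics the two slots described above.
setClass("toyCorpus", slots = c(features = "logical", feat_list = "list"))

toy <- new(
  "toyCorpus",
  # named logical vector: which features are available
  features = c(hierarchy = TRUE, stopwords = TRUE, readability = FALSE),
  # named list: the actual content of the available features
  feat_list = list(
    hierarchy = list(
      Topic = c(Winner = "Reality Winner", Edwards = "Natalie Edwards")
    ),
    stopwords = c(text1 = 12, text2 = 7) # made-up counts for illustration
  )
)

# Hypothetical helper: return a feature's content only if it is flagged
# as available, otherwise NULL (also for names that are not defined at all).
toy_feature <- function(obj, name) {
  if (!isTRUE(unname(obj@features[name]))) {
    return(NULL)
  }
  obj@feat_list[[name]]
}

toy_feature(toy, "stopwords")   # the stored numeric vector
toy_feature(toy, "readability") # NULL, flagged as not available
```

With real kRp.corpus objects, the package's own getter and setter methods should be preferred over direct slot access like this.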
Should you need to manually generate objects of this class (which should rarely be the case), the constructor function kRp.corpus(...) can be used instead of new("kRp.corpus", ...). Whenever possible, stick to readCorpus.
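As a minimal sketch, a manually constructed object can be inspected with the generic S4 tools from the methods package. It assumes that tm.plugin.koRpus is installed and is skipped otherwise. Only is() and slotNames() are used here, which work for any S4 object; the exact set of slot names printed may differ between package versions.

```r
# Assumption: tm.plugin.koRpus is installed; if not, nothing is run.
if (requireNamespace("tm.plugin.koRpus", quietly = TRUE)) {
  # create an empty object with the constructor function
  emptyCorpus <- tm.plugin.koRpus::kRp.corpus()

  # generic S4 introspection, not specific to this package
  print(methods::is(emptyCorpus, "kRp.corpus"))
  print(methods::slotNames(emptyCorpus))
}
```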
There are also getter and setter methods for objects of this class.
# use readCorpus() to create an object of class kRp.corpus
# code is only run when the english language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  myCorpus <- readCorpus(
    dir=file.path(path.package("tm.plugin.koRpus"), "examples", "corpus"),
    hierarchy=list(
      Topic=c(
        Winner="Reality Winner",
        Edwards="Natalie Edwards"
      ),
      Source=c(
        Wikipedia_prev="Wikipedia (old)",
        Wikipedia_new="Wikipedia (new)"
      )
    ),
    # use tokenize() so examples run without a TreeTagger installation
    tagger="tokenize",
    lang="en"
  )
} else {}

# manual creation
emptyCorpus <- kRp.corpus()