Some analysis steps require individual tagged texts rather than one large corpus object. This method splits a corpus object into one tagged text object per document.
## S4 method for signature 'kRp.corpus'
split_by_doc_id(obj, keepFeatures = TRUE)
obj: An object of class kRp.corpus.
keepFeatures: Either a logical value, whether to keep all features or drop them, or a character vector naming the features to keep if present.
A named list of objects of class kRp.text. Elements are named by their doc_id.
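The shape of the return value can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: the `tokens` data frame and its `token` column are hypothetical stand-ins for the tagged tokens of a corpus, and base `split()` is used in place of `split_by_doc_id()` only to show what a list "named by doc_id" looks like.

```r
# Hypothetical toy data standing in for tagged tokens; not a real kRp.corpus object
tokens <- data.frame(
  doc_id = c("doc1", "doc1", "doc2", "doc2", "doc2"),
  token  = c("Hello", "world", "Split", "by", "document"),
  stringsAsFactors = FALSE
)

# base split() yields one data frame per document,
# with list elements named by the doc_id values
tokensByDoc <- split(tokens, tokens$doc_id)

# element names are the document IDs
names(tokensByDoc)

# individual documents can be accessed by their ID
tokensByDoc[["doc2"]]
```

With a real corpus, the elements of the returned list would be kRp.text objects rather than data frames, but they are accessed by doc_id in the same way.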
# use readCorpus() to create an object of class kRp.corpus
# code is only run when the english language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  myCorpus <- readCorpus(
    dir=file.path(path.package("tm.plugin.koRpus"), "examples", "corpus"),
    hierarchy=list(
      Topic=c(
        Winner="Reality Winner",
        Edwards="Natalie Edwards"
      ),
      Source=c(
        Wikipedia_prev="Wikipedia (old)",
        Wikipedia_new="Wikipedia (new)"
      )
    ),
    # use tokenize() so examples run without a TreeTagger installation
    tagger="tokenize",
    lang="en"
  )
  myCorpusList <- split_by_doc_id(myCorpus)
} else {}