split_by_doc_id: Turn a kRp.corpus object into a list of kRp.text objects
In unDocUMeantIt/tm.plugin.koRpus: Full Corpus Support for the 'koRpus' Package


Some analysis steps require individual tagged texts rather than one large corpus object. This method splits a kRp.corpus object into a list with one kRp.text object per document.

## S4 method for signature 'kRp.corpus'
split_by_doc_id(obj, keepFeatures = TRUE)

`obj`	An object of class `kRp.corpus`.
`keepFeatures`	Either a logical value, indicating whether to keep all features (`TRUE`) or drop them (`FALSE`), or a character vector of names of features to keep if present.
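
The accepted forms of `keepFeatures` can be illustrated with a small base R sketch. The helper `resolve_features()` and the feature names below are hypothetical and not part of tm.plugin.koRpus. The sketch only mirrors the behaviour described above, and it assumes that requested names which are not present are simply skipped.

```r
# Hypothetical helper, not part of the package: decide which of the
# available features would be kept for a given keepFeatures value.
resolve_features <- function(keepFeatures, available) {
  if (is.logical(keepFeatures)) {
    # TRUE keeps every feature, FALSE drops them all
    if (isTRUE(keepFeatures)) available else character(0)
  } else {
    # character vector: keep only the requested names that are present
    intersect(keepFeatures, available)
  }
}

available <- c("diff", "summary", "freq")  # illustrative feature names

resolve_features(TRUE, available)                   # all three names
resolve_features(FALSE, available)                  # empty character vector
resolve_features(c("freq", "missing"), available)   # only "freq"
```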

A named list of objects of class kRp.text. Elements are named by their doc_id.
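
As a rough analogy for the shape of the return value, base R's `split()` produces a similar named list when applied to a token table with a `doc_id` column. This is only a sketch with made-up data: it is not how the method is implemented, and the elements returned by `split_by_doc_id()` are kRp.text objects rather than data frames.

```r
# Toy token table standing in for a tagged corpus (made-up data)
tokens <- data.frame(
  doc_id = c("doc_a", "doc_a", "doc_b", "doc_b", "doc_b"),
  token  = c("Hello", "world", "Another", "short", "text"),
  stringsAsFactors = FALSE
)

# split() returns a named list with one element per doc_id
by_doc <- split(tokens, tokens$doc_id)

names(by_doc)             # the doc_id values serve as element names
by_doc[["doc_b"]]$token   # tokens of a single document
```

Accessing elements by name, as in the last line, works the same way on the list returned by `split_by_doc_id()`.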

# use readCorpus() to create an object of class kRp.corpus
# code is only run when the english language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  myCorpus <- readCorpus(
    dir=file.path(path.package("tm.plugin.koRpus"), "examples", "corpus"),
    hierarchy=list(
      Topic=c(
        Winner="Reality Winner",
        Edwards="Natalie Edwards"
      ),
      Source=c(
        Wikipedia_prev="Wikipedia (old)",
        Wikipedia_new="Wikipedia (new)"
      )
    ),
    # use tokenize() so examples run without a TreeTagger installation
    tagger="tokenize",
    lang="en"
  )

  myCorpusList <- split_by_doc_id(myCorpus)
} else {}